[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149099#comment-17149099
 ] 

David Capwell commented on CASSANDRA-15579:
---

Also membership changes

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149093#comment-17149093
 ] 

David Capwell commented on CASSANDRA-15579:
---

One thing to note: Cassandra tests tend to lack failure-mode testing, so it 
would be good to start looking into where things could fail and whether we have 
tests to handle them; that's what I was doing for repair.

We also have issues with upgrades, and issues with older SSTable formats.

Another thing to look into is the interaction between different features: if 
enabling/disabling a feature interacts with something else, make sure we 
include testing for it (and for failures there).
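As a rough illustration of crossing feature toggles with failure modes, a sketch in Python (the feature and failure names are placeholders for illustration, not the actual 4.0 test plan):

```python
from itertools import combinations, product

# Hypothetical feature toggles and failure modes; the real set would
# come from the test plan, not from this sketch.
FEATURES = ["read_repair", "speculative_retry", "transient_replication"]
FAILURES = ["node_down", "network_partition", "slow_replica"]

def interaction_matrix(features, failures):
    """Cross every feature pair (in each on/off state) with every failure
    mode, so each combination gets at least one test case."""
    cases = []
    for pair in combinations(features, 2):
        for states in product([True, False], repeat=2):
            for failure in failures:
                cases.append((dict(zip(pair, states)), failure))
    return cases

cases = interaction_matrix(FEATURES, FAILURES)
print(len(cases))  # 3 pairs x 4 on/off states x 3 failures = 36
```

Even a small matrix like this surfaces combinations (e.g. a feature toggled off during a network partition) that single-feature tests never exercise.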







[jira] [Updated] (CASSANDRA-15792) test_speculative_data_request - read_repair_test.TestSpeculativeReadRepair

2020-06-30 Thread Gianluca Righetto (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianluca Righetto updated CASSANDRA-15792:
--
Status: Patch Available  (was: In Progress)

> test_speculative_data_request - read_repair_test.TestSpeculativeReadRepair
> --
>
> Key: CASSANDRA-15792
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15792
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Ekaterina Dimitrova
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Failing on the latest trunk here:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/127/workflows/dfba669d-4a5c-4553-b6a2-85647d0d8d2b/jobs/668/tests
> Failing once in 30 times as per Jenkins:
> https://jenkins-cm4.apache.org/job/Cassandra-trunk-dtest/69/testReport/dtest.read_repair_test/TestSpeculativeReadRepair/test_speculative_data_request/






[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149079#comment-17149079
 ] 

Caleb Rackliffe commented on CASSANDRA-15579:
-

It also seems like this could really leverage CASSANDRA-15348.







[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149076#comment-17149076
 ] 

Caleb Rackliffe commented on CASSANDRA-15579:
-

[~adelapena] I might have some cycles to help here if there's enough work to 
split up.







[jira] [Comment Edited] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149074#comment-17149074
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15579 at 7/1/20, 3:33 AM:
--

Yeah, there's also the {{AbstractReadRepairTest}} subclasses, two versions of 
{{ReadRepairTest}} in different packages, {{MixedModeReadRepairTest}} (which 
looks pretty sparse in terms of its version combinations?), and 
{{SimpleReadWriteTest}}. It might make more sense to work on CASSANDRA-14697 
than include transient replication in this ticket, but not sure what everyone 
else thinks...


was (Author: maedhroz):
Yeah, there's also the {{AbstractReadRepairTest}} subclasses, two versions of 
{{ReadRepairTest}} in different packages, {{MixedModeReadRepairTest}} (which 
looks pretty sparse in terms of its version combinations?), and 
{{SimpleReadWriteTest}}. It might make more sense to work on CASSANDRA-14697 
than include transient replication here, but not sure what everyone else 
thinks...







[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149074#comment-17149074
 ] 

Caleb Rackliffe commented on CASSANDRA-15579:
-

Yeah, there's also the {{AbstractReadRepairTest}} subclasses, two versions of 
{{ReadRepairTest}} in different packages, {{MixedModeReadRepairTest}} (which 
looks pretty sparse in terms of its version combinations?), and 
{{SimpleReadWriteTest}}. It might make more sense to work on CASSANDRA-14697 
than include transient replication here, but not sure what everyone else 
thinks...







[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-30 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149071#comment-17149071
 ] 

ZhaoYang commented on CASSANDRA-15900:
--

Rebased and submitted another round of CI: 
[j8|https://circleci.com/workflow-run/cdf55335-c876-450b-8bf9-1d778a2df806] and 
[j11|https://circleci.com/workflow-run/2080f225-f689-4243-ad67-288bef608640]

> Close channel and reduce buffer allocation during entire sstable streaming 
> with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15740 added the ability to stream an entire sstable by loading the 
> on-disk file into a user-space off-heap buffer when SSL is enabled, because 
> netty doesn't support zero-copy with SSL.
> But there are two issues:
>  # The file channel is not closed.
>  # A 1 MiB batch size is used. 1 MiB exceeds the buffer pool's max allocation 
> size, so the batches are all allocated outside the pool, causing a large 
> number of allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
>  # Close the file channel when the last batch is loaded into the off-heap 
> bytebuffer. I don't think we need to wait until the buffer is flushed by 
> netty.
>  # Reduce the batch size to 64 KiB, which is more buffer-pool-friendly when 
> streaming entire sstables with SSL.
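A minimal sketch of the two fixes in Python (the real patch works on Cassandra's file channel and off-heap buffer pool; plain file reads stand in for those here):

```python
import os
import tempfile

CHUNK = 64 * 1024  # the patch's pool-friendly batch size; the old 1 MiB
                   # batches exceeded the buffer pool's max allocation size

def stream_file(path, send, chunk_size=CHUNK):
    """Read a file in fixed-size batches, handing each batch to `send`.
    The file handle is closed as soon as the last batch has been read;
    there is no need to keep it open while downstream (netty, in the
    real patch) flushes the buffers."""
    batches = 0
    with open(path, "rb") as f:  # closed right after the final read
        while True:
            buf = f.read(chunk_size)
            if not buf:
                break
            send(buf)
            batches += 1
    return batches

# Demo: a 1 MiB "sstable" streamed in 64 KiB batches.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * (1024 * 1024))
sent = []
num_batches = stream_file(tmp.name, sent.append)
os.remove(tmp.name)
print(num_batches)  # 16
```

The same 1 MiB payload takes 16 pool-sized allocations instead of one oversized allocation that bypasses the pool.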






[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149066#comment-17149066
 ] 

Caleb Rackliffe commented on CASSANDRA-15900:
-

Let's see...

{{test_restart_node_localhost - 
pushed_notifications_test.TestPushedNotifications}} should have been addressed 
by CASSANDRA-15677 a few days ago.

{{test_describe - cqlsh_tests.test_cqlsh.TestCqlsh}} and its materialized view 
equivalent have a history of flakiness, and don't look directly related to this 
patch. (Is there an issue around {{read_repair}} showing up in the table DDL 
where it isn't expected?)









[jira] [Updated] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection

2020-06-30 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15907:

Description: 
CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i 
and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a 
few things we should follow up on, however, to make life a bit easier for 
operators and generally de-risk usage:

(Note: Line numbers are based on {{trunk}} as of 
{{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)

*Minor Optimizations*

* {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be 
able to use simple arrays instead of lists for {{rowsToFetch}} and 
{{originalPartitions}}. Alternatively (or also), we may be able to null out 
references in these two collections more aggressively. (ex. Using 
{{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, 
assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.)
* {{ReplicaFilteringProtection:323}} - We may be able to use 
{{EncodingStats.merge()}} and remove the custom {{stats()}} method.
* {{DataResolver:111 & 228}} - Cache an instance of 
{{UnaryOperator#identity()}} instead of creating one on the fly.
* {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather rather 
than serially querying every row that needs to be completed. This isn't a clear 
win perhaps, given it targets the latency of single queries and adds some 
complexity. (Certainly a decent candidate to kick out of this issue entirely.)
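For the scatter/gather point, the idea is just to issue the per-row completion queries concurrently and collect the results in order; a sketch in Python with a placeholder query function:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(keys, query, max_workers=8):
    """Issue one query per key concurrently instead of serially, and
    gather the results in the original key order (Executor.map
    preserves input order)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(query, keys))

# Placeholder query function standing in for a per-row completion read.
results = scatter_gather([1, 2, 3], lambda k: k * 10)
print(results)  # [10, 20, 30]
```

The win is bounded by how many rows need completing per query, which is why the ticket flags it as a marginal latency optimization rather than a clear improvement.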

*Documentation and Intelligibility*

* There are a few places (CHANGES.txt, tracing output in 
{{ReplicaFilteringProtection}}, etc.) where we mention "replica-side filtering 
protection" (which makes it seem like the coordinator doesn't filter) rather 
than "replica filtering protection" (which sounds more like what we actually 
do, which is protect ourselves against incorrect replica filtering results). 
It's a minor fix, but would avoid confusion.
* The method call chain in {{DataResolver}} might be a bit simpler if we put 
the {{repairedDataTracker}} in {{ResolveContext}}.

*Testing*

* I want to bite the bullet and get some basic tests for RFP (including any 
guardrails we might add here) onto the in-JVM dtest framework.

*Guardrails*

* As it stands, we don't have a way to enforce an upper bound on the memory 
usage of {{ReplicaFilteringProtection}} which caches row responses from the 
first round of requests. (Remember, these are later merged with the 
second round of results to complete the data for filtering.) Operators will 
likely need a way to protect themselves, i.e. simply fail queries if they hit a 
particular threshold rather than GC nodes into oblivion. (Having control over 
limits and page sizes doesn't quite get us there, because stale results 
_expand_ the number of incomplete results we must cache.) The fun question is 
how we do this, with the primary axes being scope (per-query, global, etc.) and 
granularity (per-partition, per-row, per-cell, actual heap usage, etc.). My 
starting disposition on the right trade-off between performance/complexity 
and accuracy is having something along the lines of cached rows per query. 
Prior art suggests this probably makes sense alongside things like 
{{tombstone_failure_threshold}} in {{cassandra.yaml}}.
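A sketch of what a cached-rows-per-query guardrail could look like; the class, threshold name, and exception here are hypothetical, not existing Cassandra options:

```python
class RowCacheLimitExceeded(Exception):
    """Raised to fail the query instead of letting cached rows GC the node
    into oblivion."""

class RowCacheGuardrail:
    """Hypothetical per-query cap on the rows cached by replica filtering
    protection, analogous in spirit to tombstone_failure_threshold."""
    def __init__(self, failure_threshold):
        self.failure_threshold = failure_threshold
        self.cached_rows = 0

    def on_row_cached(self):
        self.cached_rows += 1
        if self.cached_rows > self.failure_threshold:
            # Fail fast rather than exhaust the heap.
            raise RowCacheLimitExceeded(
                f"query cached {self.cached_rows} rows, "
                f"threshold is {self.failure_threshold}")

guard = RowCacheGuardrail(failure_threshold=1000)
for _ in range(1000):
    guard.on_row_cached()  # at the threshold: still fine
```

Counting rows rather than bytes trades accuracy for cheap bookkeeping, which matches the performance/complexity disposition above.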


[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149059#comment-17149059
 ] 

Caleb Rackliffe commented on CASSANDRA-15907:
-

...and of course, if we want to punt on a redesign for now, we can always 
proceed w/ the [guardrails 
approach|https://issues.apache.org/jira/browse/CASSANDRA-15907?focusedCommentId=17148207&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17148207],
 which is basically like [~adelapena]'s latest idea, but with a large N.

> Operational Improvements & Hardening for Replica Filtering Protection
> -
>
> Key: CASSANDRA-15907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Coordination, Feature/2i Index
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Labels: 2i, memory
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>






[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149000#comment-17149000
 ] 

Caleb Rackliffe commented on CASSANDRA-15907:
-

At this point, we're sitting on what appears to be 4 distinct approaches to 
addressing the problems in the current implementation. Before trying to 
contrast them all, I want to think through the kinds of usage we expect and the 
consequences of that. Future indexing implementations aside, neither filtering 
queries nor secondary index queries are currently meant to be used at scale 
(especially at CL > ONE/LOCAL_ONE) without partition restrictions. Optimizing 
for that case seems reasonable. The other big axis is how common out of sync 
replicas actually are, and how responsive we have to be from "rare" to "entire 
replica datasets are out of sync". What's currently in trunk does just fine if 
there is very little out-of-sync data, especially in the common case that we're 
limited to a partition. (i.e. The actual number of protection queries is very 
low, because we group by partition.) Its weakness is the edge case.

bq. Issue blocking RFP read immediately at {{MergeListener#onMergedRows}} when 
detecting potential outdated rows

This single-pass solution would excel in situations where there are very few 
silent replicas and put very little stress on the heap, given it could simply 
forgo caching merged rows that don't satisfy the query filter. It also appears 
to be a fairly simple change to the existing logic. The downside of this 
approach is that it would start to issue a pretty high volume of individual row 
protection queries as it came across more silent replicas, without even the 
mitigating benefit of partition grouping. It wouldn't require any new 
guardrails around memory usage, and the worst that could happen is a query 
timeout.

bq. We could try to not cache all the results but advance in blocks of a 
certain fixed number of cached results, so we limit the number of cached 
results while we can still group keys to do less queries. That is, we could 
have that pessimistic SRP read prefetching and caching N rows completed with 
extra queries to the silent replicas, plugged to another group of 
unmerged-merged counters to prefetch more results if (probably) needed

This seems to retain all the nice characteristics of the current trunk 
implementation (most importantly partition grouping for RFP queries), with the 
added benefit that it should only use heap proportional to the actual user 
limit (although not precisely, given the difference between the batch size and 
the limit). It wouldn't really require any new guardrails around memory usage, 
given the tighter coupling to the limit or page size, and the worst case is 
also a timeout. The stumbling block feels like complexity, but that might just 
be my lack of creativity. [~adelapena] Wouldn't we have to avoid SRP in the 
first phase of the query to limit the size of the result cache during batches?

I've been trying to figure out a way to merge these two ideas, i.e. to batch 
partition/completion reads in the RFP {{MergeListener}}. Combined w/ filtering, 
also in the {{MergeListener}}, we could discard (i.e. avoid caching) the rows 
that don't pass the filter. The problem is that the return value of 
{{onMergedRows()}} is what presently informs SRP/controls the counter.
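The batching idea above can be sketched roughly as follows; `fetch_missing` is a stand-in for one grouped protection query against the silent replicas, and `batch_size` bounds how many rows are cached at once:

```python
def complete_in_batches(rows, fetch_missing, batch_size):
    """Buffer at most batch_size potentially-stale rows, complete each
    buffer with a single grouped query to the silent replicas, and yield
    the completed rows before buffering more. This bounds the cache while
    keeping the key grouping that saves round trips."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from fetch_missing(batch)  # one grouped round trip
            batch = []
    if batch:
        yield from fetch_missing(batch)

# Demo with a stand-in completion function that just counts round trips.
calls = []
def fetch_missing(batch):
    calls.append(len(batch))
    return batch  # pretend every row came back completed

completed = list(complete_in_batches(range(10), fetch_missing, batch_size=4))
print(len(calls), len(completed))  # 3 10
```

Heap usage is proportional to `batch_size` rather than the full result set, while ten rows still cost only three round trips instead of ten.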


[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-30 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15905:
-
Source Control Link: 
https://github.com/apache/cassandra/commit/9251b8116ff89b528b6b9eaa43d4dc2d1bc0bbaf
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed to 3.11 (with a very small backport) and trunk, thanks!

> cqlsh not able to fetch all rows when in batch mode
> ---
>
> Key: CASSANDRA-15905
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15905
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.11.7, 4.0-alpha5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cqlsh in trunk only displays the first page when running in batch 
> mode, i.e. using the {{--execute}} or {{--file}} option. 
>   
>  This is a change of behavior: in the 3.x branches, cqlsh returns all rows. 
>   
>  It can be reproduced in 3 steps.
> {code:java}
>  1. ccm create trunk -v git:trunk -n1 && ccm start
>  2. tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"   // 
> write 1000 rows
>  3. bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"// 
> fetch all rows
> {code}
>  
>  There are 1000 rows written. But the output in step 3 will only list 100 
> rows, which is the first page. 
> {code:java}
> ➜ bin/cqlsh -e "SELECT * FROM keyspace1.standard1" | wc -l
>  105{code}
>  
>  The related change was introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py 
> script no longer fetches all rows in the {{print_result}} method when not 
> attached to a tty. 
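For reference, the non-tty path needs to keep fetching until the driver reports no more pages. A sketch of that loop in Python, with attribute names mirroring the DataStax Python driver's {{ResultSet}} and a small stub in place of a live session:

```python
def drain_all_pages(result):
    """Yield every row from a paged result set, not just the first page.
    The attribute names (current_rows / has_more_pages / fetch_next_page)
    mirror the DataStax Python driver's ResultSet, which cqlsh.py sits on
    top of."""
    while True:
        for row in result.current_rows:
            yield row
        if not result.has_more_pages:
            break
        result.fetch_next_page()

# Stub standing in for a driver ResultSet paging 250 rows 100 at a time.
class FakeResult:
    def __init__(self, rows, page_size):
        self._rows, self._page, self._pos = rows, page_size, 0
        self.fetch_next_page()

    @property
    def has_more_pages(self):
        return self._pos < len(self._rows)

    def fetch_next_page(self):
        self.current_rows = self._rows[self._pos:self._pos + self._page]
        self._pos += len(self.current_rows)

rows = list(drain_all_pages(FakeResult(list(range(250)), 100)))
print(len(rows))  # 250
```

With the stub's 100-row pages, the first-page-only behavior would stop at 100 rows; draining yields all 250.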






[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-30 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15905:
-
Reviewers: Brandon Williams
   Status: Review In Progress  (was: Patch Available)

> cqlsh not able to fetch all rows when in batch mode
> ---
>
> Key: CASSANDRA-15905
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15905
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.11.7, 4.0-alpha5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cqlsh in trunk only displays the first page when running in batch 
> mode, i.e. using the {{--execute}} or {{--file}} option. 
>   
>  This is a change of behavior: in the 3.x branches, cqlsh returns all rows. 
>   
>  It can be reproduced in 3 steps.
> {code:java}
> 1. ccm create trunk -v git:trunk -n1 && ccm start
> 2. tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"  // write 1000 rows
> 3. bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"                   // fetch all rows
> {code}
>  
>  1000 rows are written, but the output in step 3 only lists 100 rows, which 
> is the first page. 
> {code:java}
> ➜ bin/cqlsh -e "SELECT * FROM keyspace1.standard1" | wc -l
>  105{code}
>  
>  The related change was introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py 
> script no longer fetches all rows when not attached to a tty in the 
> print_result method. 
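The drain-all-pages loop the report asks for can be sketched as follows. This is a minimal, hypothetical stand-in: FakePagedResult only mimics the paging surface of the DataStax Python driver's result set (current_rows, has_more_pages, fetch_next_page()) that cqlsh relies on; it is not the real driver object.

```python
# Hypothetical stand-in for the driver's paged result set; the attribute
# names mirror the DataStax Python driver API that cqlsh uses.
class FakePagedResult:
    def __init__(self, pages):
        self._pages = pages
        self._idx = 0

    @property
    def current_rows(self):
        return self._pages[self._idx]

    @property
    def has_more_pages(self):
        return self._idx < len(self._pages) - 1

    def fetch_next_page(self):
        self._idx += 1


def drain_all_rows(result):
    # Collect rows from every page, not just the first one -- the behavior
    # batch mode lost after CASSANDRA-11534.
    rows = list(result.current_rows)
    while result.has_more_pages:
        result.fetch_next_page()
        rows.extend(result.current_rows)
    return rows


pages = [[i] for i in range(10)]  # 10 pages of one row each
assert len(drain_all_rows(FakePagedResult(pages))) == 10
```

The actual fix additionally interleaves printing with fetching so pages are not all buffered in memory, but the loop shape is the same.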






[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-30 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15905:
-
Test and Documentation Plan: dtest added
 Status: Patch Available  (was: Open)







[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-30 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15905:
-
Status: Ready to Commit  (was: Review In Progress)







[cassandra-dtest] branch master updated: Add test for CASSANDRA-15905

2020-06-30 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new f5bc21c  Add test for CASSANDRA-15905
f5bc21c is described below

commit f5bc21c40ccd4bc2b9bc118ec5888bad3cc15b16
Author: Yifan Cai 
AuthorDate: Tue Jun 30 14:23:04 2020 -0700

Add test for CASSANDRA-15905

Patch by Yifan Cai, reviewed by brandonwilliams for CASSANDRA-15905
---
 cqlsh_tests/test_cqlsh.py | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/cqlsh_tests/test_cqlsh.py b/cqlsh_tests/test_cqlsh.py
index a47f942..f261833 100644
--- a/cqlsh_tests/test_cqlsh.py
+++ b/cqlsh_tests/test_cqlsh.py
@@ -2031,6 +2031,33 @@ Tracing session:""")
 
 assert_all(session, "SELECT * FROM ks.cf", [[0]])
 
+def test_fetch_all_rows_in_batch_mode(self):
+"""
+Test: cqlsh -e "" with more rows than 1 page
+@jira_ticket CASSANDRA-15905
+"""
+self.cluster.populate(1)
+self.cluster.start(wait_for_binary_proto=True)
+node1, = self.cluster.nodelist()
+session = self.patient_cql_connection(node1)
+
+session.execute("CREATE KEYSPACE ks WITH REPLICATION={'class':'SimpleStrategy','replication_factor':1};")
+session.execute("CREATE TABLE ks.test (key uuid primary key);")
+
+num_rows = 200
+expected_lines = num_rows + 5 # 5: header + empty lines
+
+for i in range(num_rows):
+session.execute("INSERT INTO ks.test (key) VALUES (uuid())")
+
+stdout, err = self.run_cqlsh(node1, cmds="", cqlsh_options=['-e', 'SELECT * FROM ks.test;'])
+assert err == ""
+output_lines = stdout.splitlines()
+assert expected_lines == len(output_lines)
+assert output_lines[0].strip() == ''
+assert output_lines[-2].strip() == ''
+assert output_lines[-1].strip() == "({} rows)".format(num_rows)
+
 def run_cqlsh(self, node, cmds, cqlsh_options=None, env_vars=None):
 """
 Local version of run_cqlsh to open a cqlsh subprocess with


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2020-06-30 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit eebb9e02cd10cde576bcf860417ec3d011c7c165
Merge: 3b8ed1e 9251b81
Author: Brandon Williams 
AuthorDate: Tue Jun 30 17:57:22 2020 -0500

Merge branch 'cassandra-3.11' into trunk

 CHANGES.txt   |  1 +
 bin/cqlsh.py  | 42 ++
 pylib/cqlshlib/tracing.py |  2 +-
 3 files changed, 28 insertions(+), 17 deletions(-)

diff --cc CHANGES.txt
index 1c30b58,d89d22b..8fafb7d
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,57 -1,8 +1,58 @@@
 -3.11.7
 +4.0-alpha5
 + * Prune expired messages less frequently in internode messaging (CASSANDRA-15700)
 + * Fix Ec2Snitch handling of legacy mode for dc names matching both formats, eg "us-west-2" (CASSANDRA-15878)
 + * Add support for server side DESCRIBE statements (CASSANDRA-14825)
 + * Fail startup if -Xmn is set when the G1 garbage collector is used (CASSANDRA-15839)
 + * generateSplits method replaced the generateRandomTokens for ReplicationAwareTokenAllocator. (CASSANDRA-15877)
 + * Several mbeans are not unregistered when dropping a keyspace and table (CASSANDRA-14888)
 + * Update defaults for server and client TLS settings (CASSANDRA-15262)
 + * Differentiate follower/initator in StreamMessageHeader (CASSANDRA-15665)
 + * Add a startup check to detect if LZ4 uses java rather than native implementation (CASSANDRA-15884)
 + * Fix missing topology events when running multiple nodes on the same network interface (CASSANDRA-15677)
 + * Create config.yml.MIDRES (CASSANDRA-15712)
 + * Fix handling of fully purged static rows in repaired data tracking (CASSANDRA-15848)
 + * Prevent validation request submission from blocking ANTI_ENTROPY stage (CASSANDRA-15812)
 + * Add fqltool and auditlogviewer to rpm and deb packages (CASSANDRA-14712)
 + * Include DROPPED_COLUMNS in schema digest computation (CASSANDRA-15843)
 + * Fix Cassandra restart from rpm install (CASSANDRA-15830)
 + * Improve handling of 2i initialization failures (CASSANDRA-13606)
 + * Add completion_ratio column to sstable_tasks virtual table (CASANDRA-15759)
 + * Add support for adding custom Verbs (CASSANDRA-15725)
 + * Speed up entire-file-streaming file containment check and allow entire-file-streaming for all compaction strategies (CASSANDRA-15657,CASSANDRA-15783)
 + * Provide ability to configure IAuditLogger (CASSANDRA-15748)
 + * Fix nodetool enablefullquerylog blocking param parsing (CASSANDRA-15819)
 + * Add isTransient to SSTableMetadataView (CASSANDRA-15806)
 + * Fix tools/bin/fqltool for all shells (CASSANDRA-15820)
 + * Fix clearing of legacy size_estimates (CASSANDRA-15776)
 + * Update port when reconnecting to pre-4.0 SSL storage (CASSANDRA-15727)
 + * Only calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch (CASSANDRA-15798)
 + * Cleanup redundant nodetool commands added in 4.0 (CASSANDRA-15256)
 + * Update to Python driver 3.23 for cqlsh (CASSANDRA-15793)
 + * Add tunable initial size and growth factor to RangeTombstoneList (CASSANDRA-15763)
 + * Improve debug logging in SSTableReader for index summary (CASSANDRA-15755)
 + * bin/sstableverify should support user provided token ranges (CASSANDRA-15753)
 + * Improve logging when mutation passed to commit log is too large (CASSANDRA-14781)
 + * replace LZ4FastDecompressor with LZ4SafeDecompressor (CASSANDRA-15560)
 + * Fix buffer pool NPE with concurrent release due to in-progress tiny pool eviction (CASSANDRA-15726)
 + * Avoid race condition when completing stream sessions (CASSANDRA-15666)
 + * Flush with fast compressors by default (CASSANDRA-15379)
 + * Fix CqlInputFormat regression from the switch to system.size_estimates (CASSANDRA-15637)
 + * Allow sending Entire SSTables over SSL (CASSANDRA-15740)
 + * Fix CQLSH UTF-8 encoding issue for Python 2/3 compatibility (CASSANDRA-15739)
 + * Fix batch statement preparation when multiple tables and parameters are used (CASSANDRA-15730)
 + * Fix regression with traceOutgoingMessage printing message size (CASSANDRA-15687)
 + * Ensure repaired data tracking reads a consistent amount of data across replicas (CASSANDRA-15601)
 + * Fix CQLSH to avoid arguments being evaluated (CASSANDRA-15660)
 + * Correct Visibility and Improve Safety of Methods in LatencyMetrics (CASSANDRA-15597)
 + * Allow cqlsh to run with Python2.7/Python3.6+ (CASSANDRA-15659,CASSANDRA-15573)
 + * Improve logging around incremental repair (CASSANDRA-15599)
 + * Do not check cdc_raw_directory filesystem space if CDC disabled (CASSANDRA-15688)
 + * Replace array iterators with get by index (CASSANDRA-15394)
 + * Minimize BTree iterator allocations (CASSANDRA-15389)
 +Merged from 3.11:
+  * Fix cqlsh output when fetching all rows in batch mode (CASSANDRA-15905)
   * Upgrade Jackson to 2.9.10 (CASSANDRA-15867)
   * Fix CQL formatting of read command res

[cassandra] branch cassandra-3.11 updated: Fix cqlsh output when fetching all rows in batch mode

2020-06-30 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.11 by this push:
 new 9251b81  Fix cqlsh output when fetching all rows in batch mode
9251b81 is described below

commit 9251b8116ff89b528b6b9eaa43d4dc2d1bc0bbaf
Author: yifan-c 
AuthorDate: Tue Jun 30 00:15:18 2020 -0700

Fix cqlsh output when fetching all rows in batch mode

Patch by Yifan Cai, reviewed by brandonwilliams for CASSANDRA-15905
---
 CHANGES.txt   |  1 +
 bin/cqlsh.py  | 44 
 pylib/cqlshlib/tracing.py |  2 +-
 3 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 9b4cf55..d89d22b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.11.7
+ * Fix cqlsh output when fetching all rows in batch mode (CASSANDRA-15905)
  * Upgrade Jackson to 2.9.10 (CASSANDRA-15867)
  * Fix CQL formatting of read command restrictions for slow query log (CASSANDRA-15503)
  * Allow sstableloader to use SSL on the native port (CASSANDRA-14904)
diff --git a/bin/cqlsh.py b/bin/cqlsh.py
index 2e5e2d4..44d4d50 100644
--- a/bin/cqlsh.py
+++ b/bin/cqlsh.py
@@ -1079,7 +1079,7 @@ class Shell(cmd.Cmd):
 elif result:
 # CAS INSERT/UPDATE
 self.writeresult("")
-self.print_static_result(result, self.parse_for_update_meta(statement.query_string))
+self.print_static_result(result, self.parse_for_update_meta(statement.query_string), with_header=True, tty=self.tty)
 self.flush_output()
 return True, future
 
@@ -1087,20 +1087,30 @@ class Shell(cmd.Cmd):
 self.decoding_errors = []
 
 self.writeresult("")
-if result.has_more_pages and self.tty:
+
+def print_all(result, table_meta, tty):
+# Return the number of rows in total
 num_rows = 0
+isFirst = True
 while True:
-if result.current_rows:
+# Always print for the first page even it is empty
+if result.current_rows or isFirst:
 num_rows += len(result.current_rows)
-self.print_static_result(result, table_meta)
+with_header = isFirst or tty
+self.print_static_result(result, table_meta, with_header, tty)
 if result.has_more_pages:
-raw_input("---MORE---")
+if self.shunted_query_out is None and tty:
+# Only pause when not capturing.
+raw_input("---MORE---")
 result.fetch_next_page()
 else:
+if not tty:
+self.writeresult("")
 break
-else:
-num_rows = len(result.current_rows)
-self.print_static_result(result, table_meta)
+isFirst = False
+return num_rows
+
+num_rows = print_all(result, table_meta, self.tty)
 self.writeresult("(%d rows)" % num_rows)
 
 if self.decoding_errors:
@@ -1110,7 +1120,7 @@ class Shell(cmd.Cmd):
 self.writeresult('%d more decoding errors suppressed.'
  % (len(self.decoding_errors) - 2), color=RED)
 
-def print_static_result(self, result, table_meta):
+def print_static_result(self, result, table_meta, with_header, tty):
 if not result.column_names and not table_meta:
 return
 
@@ -1118,7 +1128,7 @@ class Shell(cmd.Cmd):
 formatted_names = [self.myformat_colname(name, table_meta) for name in column_names]
 if not result.current_rows:
 # print header only
-self.print_formatted_result(formatted_names, None)
+self.print_formatted_result(formatted_names, None, with_header=True, tty=tty)
 return
 
 cql_types = []
@@ -1132,9 +1142,9 @@ class Shell(cmd.Cmd):
 if self.expand_enabled:
 self.print_formatted_result_vertically(formatted_names, formatted_values)
 else:
-self.print_formatted_result(formatted_names, formatted_values)
+self.print_formatted_result(formatted_names, formatted_values, with_header, tty)
 
-def print_formatted_result(self, formatted_names, formatted_values):
+def print_formatted_result(self, formatted_names, formatted_values, with_header, tty):
 # determine column widths
 widths = [n.displaywidth for n in formatted_names]
 if formatted_values is not None:
@@ -1143,9 +1153,10 @@ class Shell(cmd.Cmd):
 widths[num] = max(widths[num], col.displaywidth)
 
 # print header
-header = ' | '.join(hdr.ljust(w, color=self.color) for (hdr, w) in 
zip(formatted_n

[cassandra] branch trunk updated (3b8ed1e -> eebb9e0)

2020-06-30 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 3b8ed1e  Fix a log message typo in StartupChecks
 new 9251b81  Fix cqlsh output when fetching all rows in batch mode
 new eebb9e0  Merge branch 'cassandra-3.11' into trunk

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt   |  1 +
 bin/cqlsh.py  | 42 ++
 pylib/cqlshlib/tracing.py |  2 +-
 3 files changed, 28 insertions(+), 17 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-30 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148988#comment-17148988
 ] 

Yifan Cai commented on CASSANDRA-15905:
---

||Cassandra||Dtest||
|[PR|https://github.com/apache/cassandra/pull/661]|[PR|https://github.com/apache/cassandra-dtest/pull/82]|
|[Code|https://github.com/yifan-c/cassandra/tree/CASSANDRA-15905-cqlsh-fetch-all-rows-in-batch-mode]|[Code|https://github.com/yifan-c/cassandra-dtest]|

Test: 
[https://app.circleci.com/pipelines/github/yifan-c/cassandra/66/workflows/2b590ea0-2b4a-4d79-8abc-347cecded0cc]

The dtest failures should not be related to the change. The errors can be 
reproduced by running the dtest against trunk.

There is no failure from tests in {{test_cqlsh.py}}.

Briefly, the changes are:
 * Fetch and print all pages iteratively ({{cqlsh.py::Shell::print_result}}).
 * Print compactly when in batch mode.
 * In tty mode, always print the header and a trailing empty line for each 
page, so the per-page behavior stays consistent.







[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-06-30 Thread Olivier Michallat (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148948#comment-17148948
 ] 

Olivier Michallat commented on CASSANDRA-15299:
---

About the name change, I would like to advocate one last time for renaming the 
new outer type, not the legacy inner type.

I know you'd prefer the other way, and in a vacuum I would agree. But I think 
that in this case maintaining continuity is more important than perfect naming. 
For example:
 * the mere size of the patch. This will affect hundreds of unrelated lines.
 * those changes will get in the way later: create more conflicts when you 
backport something to a legacy branch, obscure {{git blame}} output, etc.
 * old commits still use the old naming. If you need to look at something in 
git history, you'll have to make the mental switch constantly. Not the end of 
the world, but it's just one more little thing.

 

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client
>Reporter: Aleksey Yeschenko
>Assignee: Sam Tunnicliffe
>Priority: Normal
>  Labels: protocolv5
> Fix For: 4.0-alpha
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a crc, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be smaller than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
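A small sketch of the sequencing point 1 above argues for: compute the checksum over the bytes as actually written on the wire (the compressed form), and verify it before attempting decompression, so a corrupt stream never reaches the decompressor. This is illustrative only: zlib stands in for LZ4, and the little-endian "<I" packing is an assumption for the example, not the protocol's actual byte order.

```python
import struct
import zlib

payload = b"example message body" * 10
compressed = zlib.compress(payload)  # zlib as a stand-in for LZ4

# Correct order: checksum the wire bytes (compressed), then append the CRC
# in an explicitly agreed byte order.
crc_wire = zlib.crc32(compressed) & 0xFFFFFFFF
frame = compressed + struct.pack("<I", crc_wire)

# Receiver side: verify the CRC *before* decompressing, so corruption is
# caught without ever handing a corrupt buffer to the decompressor.
body, (crc_read,) = frame[:-4], struct.unpack("<I", frame[-4:])
assert zlib.crc32(body) & 0xFFFFFFFF == crc_read
assert zlib.decompress(body) == payload
```

Checksumming the decompressed value instead would both lose the CRC's error-detection guarantees over the transmitted bytes and require decompressing untrusted data first, which is the segfault risk described above.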






[jira] [Updated] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection

2020-06-30 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15907:

Fix Version/s: 3.11.x
   3.0.x

> Operational Improvements & Hardening for Replica Filtering Protection
> -
>
> Key: CASSANDRA-15907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Coordination, Feature/2i Index
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Labels: 2i, memory
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i 
> and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a 
> few things we should follow up on, however, to make life a bit easier for 
> operators and generally de-risk usage:
> (Note: Line numbers are based on {{trunk}} as of 
> {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)
> *Minor Optimizations*
> * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be 
> able to use simple arrays instead of lists for {{rowsToFetch}} and 
> {{originalPartitions}}. Alternatively (or also), we may be able to null out 
> references in these two collections more aggressively. (ex. Using 
> {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, 
> assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.)
> * {{ReplicaFilteringProtection:323}} - We may be able to use 
> {{EncodingStats.merge()}} and remove the custom {{stats()}} method.
> * {{DataResolver:111 & 228}} - Cache an instance of 
> {{UnaryOperator#identity()}} instead of creating one on the fly.
> * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather 
> rather than serially querying every row that needs to be completed. This 
> isn't a clear win perhaps, given it targets the latency of single queries and 
> adds some complexity. (Certainly a decent candidate to kick even out of this 
> issue.)
> *Documentation and Intelligibility*
> * There are a few places (CHANGES.txt, tracing output in 
> {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side 
> filtering protection" (which makes it seem like the coordinator doesn't 
> filter) rather than "replica filtering protection" (which sounds more like 
> what we actually do, which is protect ourselves against incorrect replica 
> filtering results). It's a minor fix, but would avoid confusion.
> * The method call chain in {{DataResolver}} might be a bit simpler if we put 
> the {{repairedDataTracker}} in {{ResolveContext}}.
> *Guardrails*
> * As it stands, we don't have a way to enforce an upper bound on the memory 
> usage of {{ReplicaFilteringProtection}} which caches row responses from the 
> first round of requests. (Remember, these are later used to merged with the 
> second round of results to complete the data for filtering.) Operators will 
> likely need a way to protect themselves, i.e. simply fail queries if they hit 
> a particular threshold rather than GC nodes into oblivion. (Having control 
> over limits and page sizes doesn't quite get us there, because stale results 
> _expand_ the number of incomplete results we must cache.) The fun question is 
> how we do this, with the primary axes being scope (per-query, global, etc.) 
> and granularity (per-partition, per-row, per-cell, actual heap usage, etc.). 
> My starting disposition on the right trade-off between 
> performance/complexity and accuracy is having something along the lines of 
> cached rows per query. Prior art suggests this probably makes sense alongside 
> things like {{tombstone_failure_threshold}} in {{cassandra.yaml}}.
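As a rough illustration of the "cached rows per query" disposition above, a per-query guardrail could look like the following. All class, method, and threshold names here are hypothetical, not an existing Cassandra API; the warn/fail split mirrors the tombstone_warn_threshold / tombstone_failure_threshold pattern.

```python
class CachedRowsGuardrail:
    """Hypothetical per-query cap on rows cached by replica filtering
    protection, analogous to tombstone_failure_threshold."""

    def __init__(self, warn_threshold=2000, failure_threshold=32000):
        self.warn_threshold = warn_threshold        # log a warning past this
        self.failure_threshold = failure_threshold  # fail the query past this
        self.cached_rows = 0
        self.warned = False

    def on_row_cached(self):
        self.cached_rows += 1
        if self.cached_rows > self.failure_threshold:
            # Fail the query instead of GC-ing the node into oblivion.
            raise RuntimeError(
                "replica filtering protection cached %d rows, aborting query"
                % self.cached_rows)
        if not self.warned and self.cached_rows > self.warn_threshold:
            self.warned = True
            print("WARN: replica filtering protection cached %d rows"
                  % self.cached_rows)


# Small thresholds just to demonstrate the behavior.
g = CachedRowsGuardrail(warn_threshold=2, failure_threshold=5)
for _ in range(5):
    g.on_row_cached()  # warns after the 3rd row, still succeeds
```

Counting rows is the cheapest granularity; counting actual heap bytes would be more accurate but pulls in per-cell measurement cost, which is exactly the trade-off the paragraph above describes.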






[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148893#comment-17148893
 ] 

Caleb Rackliffe commented on CASSANDRA-15861:
-

[~jasonstack] I thought a bit more about our earlier chat (and had a quick chat 
w/ [~bdeggleston]), and it seems like the simplest thing might be handling the 
stats and index summary in slightly different ways.

The STATS component is small. We could just buffer it up, use that buffered 
size in the manifest, and stream that buffer. It special-cases this component, 
but we more or less avoid having to reason about the risk of blocking 
compactions, a repair completing, etc.

For the SUMMARY, we take advantage of the fact that occasionally delaying the 
redistribution task isn't a particularly bad outcome. We have a 
simple lock that protects it (on {{SSTableReader}}, similar to what you've 
already mentioned or as a threadsafe set of readers in a central location), 
i.e. streaming acquires it when the manifest is created and releases it when 
the index summary completes streaming (where that "completion" happens in the 
non-SSL case isn't 100% clear to me)...and index redistribution acquires it 
_before_ it creates a transaction in {{getRestributionTransactions()}}, then 
releases it when the redistribution is complete (so we never have to block a 
compaction). Streaming might have to deal with a short delay if a 
redistribution is running, but a.) that doesn't happen that often and b.) the 
summary (I think) is usually not very large. ({{getRestributionTransactions()}} 
can ignore streaming SSTables just like it ignores compacting ones.)
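Reduced to a toy model, the mutual exclusion proposed above looks like this. The class and method names are hypothetical; in Cassandra the lock would live around SSTableReader and the redistribution transaction, and streaming would hold it from manifest creation until the SUMMARY component finishes transferring.

```python
import threading


class SummaryStreamingGuard:
    """Hypothetical guard: entire-sstable streaming and index summary
    redistribution never run concurrently for the same reader."""

    def __init__(self):
        self._lock = threading.Lock()

    # Streaming side: held from manifest creation until the SUMMARY
    # component has finished streaming.
    def begin_stream(self):
        self._lock.acquire()

    def finish_stream(self):
        self._lock.release()

    # Redistribution side: acquire *before* creating the transaction and
    # release when redistribution completes, so compaction is never blocked.
    def run_redistribution(self, redistribute):
        with self._lock:
            return redistribute()


guard = SummaryStreamingGuard()
guard.begin_stream()
# ... a concurrent redistribution attempt would block here ...
guard.finish_stream()
result = guard.run_redistribution(lambda: "redistributed")
```

The asymmetry matches the comment: streaming only rarely waits on an in-flight redistribution (and summaries are small), while redistribution waiting on streaming just delays a background task.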

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, "nodetool repair" is executed on node1 and node2 is killed 
> during repair. At the end, node3 reports a checksum validation failure on an 
> sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starte

[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-30 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148877#comment-17148877
 ] 

Dinesh Joshi commented on CASSANDRA-15900:
--

[~maedhroz] [~jasonstack] looks like there are a few failures. They're likely 
unrelated but it would be great to double check and make sure.

> Close channel and reduce buffer allocation during entire sstable streaming 
> with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15740 added the ability to stream an entire sstable by loading the 
> on-disk file into a user-space off-heap buffer when SSL is enabled, because 
> netty doesn't support zero-copy with SSL.
> But there are two issues:
>  # the file channel is not closed.
>  # a 1 MiB batch size is used. 1 MiB exceeds the buffer pool's max allocation 
> size, so every batch is allocated outside the pool, causing a large number of 
> allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
>  # close the file channel once the last batch is loaded into the off-heap 
> bytebuffer. I don't think we need to wait until the buffer is flushed by netty.
>  # reduce the batch size to 64 KiB, which is more buffer-pool friendly when 
> streaming an entire sstable with SSL.
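The two fixes described in the ticket (close the channel as soon as the last batch is read, and use a 64 KiB batch size) can be sketched roughly as follows. This is an illustrative sketch only, not the actual patch; the class and method names (`ChunkedFileReader`, `readInBatches`) are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedFileReader
{
    // 64 KiB batches stay within typical buffer-pool chunk sizes,
    // unlike 1 MiB batches that would be allocated outside the pool.
    static final int BATCH_SIZE = 64 * 1024;

    // Reads the whole file in BATCH_SIZE chunks and returns the byte count.
    public static long readInBatches(Path file) throws IOException
    {
        long total = 0;
        // try-with-resources closes the channel as soon as the last batch
        // is read, without waiting for a downstream consumer to flush it
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
        {
            ByteBuffer buffer = ByteBuffer.allocateDirect(BATCH_SIZE);
            while (channel.read(buffer) != -1)
            {
                buffer.flip();
                total += buffer.remaining();
                // ... hand the batch to the consumer (e.g. a netty pipeline) ...
                buffer.clear();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException
    {
        Path tmp = Files.createTempFile("chunked", ".bin");
        Files.write(tmp, new byte[200_000]); // spans several 64 KiB batches
        System.out.println("bytes read: " + readInBatches(tmp));
        Files.delete(tmp);
    }
}
```

Closing the channel when the final batch is buffered, rather than when netty finishes flushing, releases the file descriptor as early as possible.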



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15538:
--
  Authors: Ekaterina Dimitrova, Sylvain Lebresne  (was: Sylvain Lebresne)
Reviewers: Blake Eggleston, Sam Tunnicliffe  (was: Blake Eggleston, 
Ekaterina Dimitrova, Sam Tunnicliffe)

> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, 
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
> finding numerous bugs and issues with the 3.0 storage engine rewrite 
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the 
> local read/write path with techniques such as property-based testing, fuzzing 
> ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
>  and a source audit.






[jira] [Assigned] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie reassigned CASSANDRA-15580:
-

Assignee: Benjamin Lerer  (was: Berenguer Blasi)

> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> We aim for 4.0 to have the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of 
> repair (full range, sub range, incremental) function as expected, as well as 
> ensure that community tools such as Reaper work. CASSANDRA-3200 adds an 
> experimental option to reduce the amount of data streamed during repair; we 
> should write more tests and see how it works with big nodes.






[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15580:
--
  Authors: Benjamin Lerer, Stephen Mallette  (was: Berenguer Blasi)
Reviewers: Marcus Eriksson, Vinay Chella  (was: Marcus Eriksson, Stephen 
Mallette, Vinay Chella)

> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> We aim for 4.0 to have the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of 
> repair (full range, sub range, incremental) function as expected, as well as 
> ensure that community tools such as Reaper work. CASSANDRA-3200 adds an 
> experimental option to reduce the amount of data streamed during repair; we 
> should write more tests and see how it works with big nodes.






[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15579:
--
  Authors: Andres de la Peña, Sylvain Lebresne  (was: Andres de la Peña)
Reviewers:   (was: Sylvain Lebresne)

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.






[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-30 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148858#comment-17148858
 ] 

Benedict Elliott Smith commented on CASSANDRA-15234:


bq. Given the framework makes it trivial to support old names, having no 
properties marked for removal of 5.0 works for me

+1

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both in the yaml and the JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.
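For illustration, the proposed temporal-suffix grammar could be parsed along these lines. This is a hedged sketch, not Cassandra's actual config parser; the class name, the longest-match alternation ordering, and the 30-day month convention are all assumptions made here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationSuffixes
{
    // Longer alternatives come first so "ms" wins over "m", "mo" over "m", etc.
    private static final Pattern DURATION = Pattern.compile(
        "(\\d+)\\s*(micros(?:econds?)?|u|millis(?:econds?)?|ms|mo(?:nths?)?" +
        "|s(?:econds?)?|m(?:inutes?)?|h(?:ours?)?|d(?:ays?)?)");

    public static long toMillis(String value)
    {
        Matcher m = DURATION.matcher(value.trim());
        if (!m.matches())
            throw new IllegalArgumentException("unparseable duration: " + value);
        long n = Long.parseLong(m.group(1));
        String unit = m.group(2);
        // Check "mo" before the bare "m"/"s" prefixes to avoid ambiguity.
        if (unit.startsWith("mo")) return n * 30L * 24 * 60 * 60 * 1000; // assume 30-day months
        if (unit.startsWith("u") || unit.startsWith("micro")) return n / 1000; // truncates sub-ms
        if (unit.startsWith("ms") || unit.startsWith("milli")) return n;
        if (unit.startsWith("s")) return n * 1000;
        if (unit.startsWith("m")) return n * 60_000;
        if (unit.startsWith("h")) return n * 3_600_000;
        return n * 86_400_000; // days
    }

    public static void main(String[] args)
    {
        System.out.println(toMillis("10s"));   // 10000
        System.out.println(toMillis("5m"));    // 300000
        System.out.println(toMillis("500ms")); // 500
    }
}
```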






[jira] [Commented] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other

2020-06-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148856#comment-17148856
 ] 

David Capwell commented on CASSANDRA-15909:
---

As long as this is done without breaking the old names, it sounds good to me.

> Make Table/Keyspace Metric Names Consistent With Each Other
> ---
>
> Key: CASSANDRA-15909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15909
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
> Fix For: 4.0-beta
>
>
> As part of CASSANDRA-15821 it became apparent that certain keyspace and table 
> metrics had different names but were in fact the same metric - they are as 
> follows:
> * Table.SyncTime == Keyspace.RepairSyncTime
> * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
> * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
> * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
> * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
> * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
> * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize
> Also, client metrics are the only metrics to start with a lower case letter. 
> Change those to upper case to match all the other metrics.
> Unifying this naming would help make metrics more consistent as part of 
> CASSANDRA-15582






[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15580:
--
Reviewers: Marcus Eriksson, Stephen Mallette, Vinay Chella  (was: Marcus 
Eriksson, Vinay Chella)

> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> We aim for 4.0 to have the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of 
> repair (full range, sub range, incremental) function as expected, as well as 
> ensure that community tools such as Reaper work. CASSANDRA-3200 adds an 
> experimental option to reduce the amount of data streamed during repair; we 
> should write more tests and see how it works with big nodes.






[jira] [Updated] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15538:
--
Reviewers: Blake Eggleston, Ekaterina Dimitrova, Sam Tunnicliffe  (was: 
Blake Eggleston, Sam Tunnicliffe)

> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, 
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
> finding numerous bugs and issues with the 3.0 storage engine rewrite 
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the 
> local read/write path with techniques such as property-based testing, fuzzing 
> ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
>  and a source audit.






[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15579:
--
Reviewers: Sylvain Lebresne

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.






[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148852#comment-17148852
 ] 

David Capwell commented on CASSANDRA-15234:
---

Given the framework provided by this patch, the following wouldn't be hard to 
support (all, not just 1 or 2)

1) no warning and no plan to remove
2) a warning that it will be removed some day
3) a warning naming the specific version that will remove it

So, we could do the following

{code}
// provide a warning that this will no longer be supported after 5.0
@Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = 
Converter.MillisDurationConverter.class, scheduledRemoveBy = "5.0")
public volatile Duration native_transport_idle_timeout = new Duration("0ms");

// provide a warning that the property is deprecated and will be removed one day
@Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = 
Converter.MillisDurationConverter.class, deprecated = true)
public volatile Duration native_transport_idle_timeout = new Duration("0ms");

// no warning, both properties are fully supported
@Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = 
Converter.MillisDurationConverter.class)
public volatile Duration native_transport_idle_timeout = new Duration("0ms");
{code}

Given the framework makes it trivial to support old names, having no properties 
marked for removal in 5.0 works for me. If we really want to migrate usage to a 
new name, then mark it to be removed one day; naming that is purely personal 
preference (such as {{enable}} at the start or end of the name) can go without a 
warning. Does this make sense?
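A rough sketch of how the three levels above could hang off the annotation; the {{@Replaces}} name and the {{oldName}}/{{scheduledRemoveBy}}/{{deprecated}} fields come from the comment, but the warning wording and helper names here are hypothetical, not necessarily what the patch implements.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class ReplacesSketch
{
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Replaces
    {
        String oldName();
        boolean deprecated() default false;     // case 2: removed "some day"
        String scheduledRemoveBy() default "";  // case 3: removed in a named version
    }

    public static class Config
    {
        @Replaces(oldName = "native_transport_idle_timeout_in_ms", scheduledRemoveBy = "5.0")
        public volatile long native_transport_idle_timeout = 0;
    }

    // Builds the warning (if any) to log when the old property name is used.
    public static String warningFor(Field field, Replaces r)
    {
        if (!r.scheduledRemoveBy().isEmpty())
            return r.oldName() + " is deprecated; use " + field.getName()
                   + " (support will be removed in " + r.scheduledRemoveBy() + ")";
        if (r.deprecated())
            return r.oldName() + " is deprecated; use " + field.getName();
        return null; // case 1: both names fully supported, no warning
    }

    public static void main(String[] args) throws Exception
    {
        Field f = Config.class.getField("native_transport_idle_timeout");
        System.out.println(warningFor(f, f.getAnnotation(Replaces.class)));
    }
}
```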

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both in the yaml and the JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.






[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148852#comment-17148852
 ] 

David Capwell edited comment on CASSANDRA-15234 at 6/30/20, 5:33 PM:
-

Given the framework provided by this patch, the following wouldn't be hard to 
support (all, not just 1 or 2)

1) no warning and no plan to remove
2) a warning that it will be removed some day
3) a warning naming the specific version that will remove it

So, we could do the following

{code}
// provide a warning that this will no longer be supported after 5.0
@Replaces(oldName = "native_transport_idle_timeout_in_ms", scheduledRemoveBy = 
"5.0")
public volatile Duration native_transport_idle_timeout = new Duration("0ms");

// provide a warning that the property is deprecated and will be removed one day
@Replaces(oldName = "native_transport_idle_timeout_in_ms", deprecated = true)
public volatile Duration native_transport_idle_timeout = new Duration("0ms");

// no warning, both properties are fully supported
@Replaces(oldName = "native_transport_idle_timeout_in_ms")
public volatile Duration native_transport_idle_timeout = new Duration("0ms");
{code}

Given the framework makes it trivial to support old names, having no properties 
marked for removal in 5.0 works for me. If we really want to migrate usage to a 
new name, then mark it to be removed one day; naming that is purely personal 
preference (such as {{enable}} at the start or end of the name) can go without a 
warning. Does this make sense?


was (Author: dcapwell):
Given the framework provided by this patch, the following wouldn't be hard to 
support (all, not just 1 or 2)

1) no warning and no plan to remove
2) a warning that it will be removed some day
3) a warning naming the specific version that will remove it

So, we could do the following

{code}
// provide a warning that this will no longer be supported after 5.0
@Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = 
Converter.MillisDurationConverter.class, scheduledRemoveBy = "5.0")
public volatile Duration native_transport_idle_timeout = new Duration("0ms");

// provide a warning that the property is deprecated and will be removed one day
@Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = 
Converter.MillisDurationConverter.class, deprecated = true)
public volatile Duration native_transport_idle_timeout = new Duration("0ms");

// no warning, both properties are fully supported
@Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = 
Converter.MillisDurationConverter.class)
public volatile Duration native_transport_idle_timeout = new Duration("0ms");
{code}

Given the framework makes it trivial to support old names, having no properties 
marked for removal in 5.0 works for me. If we really want to migrate usage to a 
new name, then mark it to be removed one day; naming that is purely personal 
preference (such as {{enable}} at the start or end of the name) can go without a 
warning. Does this make sense?

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both in the yaml and the JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.






[jira] [Updated] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other

2020-06-30 Thread Stephen Mallette (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Mallette updated CASSANDRA-15909:
-
Description: 
As part of CASSANDRA-15821 it became apparent that certain keyspace and table 
metrics had different names but were in fact the same metric - they are as 
follows:

* Table.SyncTime == Keyspace.RepairSyncTime
* Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
* Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
* Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
* Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
* Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
* Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize

Also, client metrics are the only metrics to start with a lower case letter. 
Change those to upper case to match all the other metrics.

Unifying this naming would help make metrics more consistent as part of 
CASSANDRA-15582

  was:
As part of CASSANDRA-15821 it became apparent that certain metric names found 
in keyspace and tables had different names but were in fact the same metric - 
they are as follows:

* Table.SyncTime == Keyspace.RepairSyncTime
* Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
* Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
* Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
* Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
* Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
* Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize

Unifying this naming would help make metrics more consistent as part of 
CASSANDRA-15582


> Make Table/Keyspace Metric Names Consistent With Each Other
> ---
>
> Key: CASSANDRA-15909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15909
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
> Fix For: 4.0-beta
>
>
> As part of CASSANDRA-15821 it became apparent that certain keyspace and table 
> metrics had different names but were in fact the same metric - they are as 
> follows:
> * Table.SyncTime == Keyspace.RepairSyncTime
> * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
> * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
> * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
> * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
> * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
> * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize
> Also, client metrics are the only metrics to start with a lower case letter. 
> Change those to upper case to match all the other metrics.
> Unifying this naming would help make metrics more consistent as part of 
> CASSANDRA-15582






[jira] [Commented] (CASSANDRA-15870) When 3.0 reads 2.1 data with a regular column set it expects the cellName to contain a element and fails if not true

2020-06-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148832#comment-17148832
 ] 

David Capwell commented on CASSANDRA-15870:
---

I have been bad about updating this; sorry.  I have a patch, but I have 4 other 
corruption issues on my plate, so I'm prioritizing those over submitting this 
patch.  If anyone thinks they are bitten by this issue, I can try to give higher 
priority to open-sourcing this patch.

> When 3.0 reads 2.1 data with a regular column set it expects the 
> cellName to contain a element and fails if not true
> --
>
> Key: CASSANDRA-15870
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15870
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema, Local/SSTable
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> {code}
> java.lang.AssertionError
>   at org.apache.cassandra.db.rows.BufferCell.(BufferCell.java:48)
>   at 
> org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1461)
>   at 
> org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1380)
>   at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:549)
>   at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:523)
>   at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336)
>   at 
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:133)
>   at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:59)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:364)
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:132)
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:123)
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:207)
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:160)
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:174)
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
>   at 
> org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:240)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:191)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:345)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is similar to other JIRAs such as CASSANDRA-14113, but after 
> root-causing both exceptions, they turn out to share only the same symptom, 
> not the same root cause; hence a new JIRA.
> This happened when a frozen collection was found where a multi-cell 
> collection was expected.  When this happens, LegacyCellName#collectionElement 
> comes back as null, which eventually trips an assertion in BufferCell 
> (a complex cell needs a path).
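The invariant behind that assertion can be illustrated with a minimal sketch. The types here are hypothetical, not the real BufferCell/LegacyCellName classes: a cell of a multi-cell ("complex") column must carry a path identifying its collection element, while a null path is only valid for simple or frozen columns.

```java
// Simplified model of the assertion "complex cell needs a path".
// Illustrative only; these are not the actual Cassandra classes.
public class CellPathSketch
{
    // A (columnIsComplex, path) pair satisfies the invariant when complex
    // columns carry a non-null path and simple/frozen columns carry none.
    public static boolean isValid(boolean columnIsComplex, Object path)
    {
        return columnIsComplex ? path != null : path == null;
    }
}
```

The failing case in the report corresponds to isValid(true, null): a multi-cell column was expected, but the frozen data yielded no collection element, so the path came back null.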



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-06-30 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148831#comment-17148831
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-15299 at 6/30/20, 5:08 PM:
---

Thanks for the comments [~ifesdjeen] & [~omichallat], I've pushed a few commits 
to the [branch|https://github.com/beobal/cassandra/commits/15299-trunk].

{quote} 
There are several things that I wanted to bring to your attention:
{quote}
I've handled most of these in a refactor of Flusher. As you suggested, for 
framed items we now collate the frames and only allocate the payloads when we 
flush to the netty channel. So now, we allocate the payload based on the actual 
number of bytes required for the specific channel.
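The allocate-at-flush pattern described above can be sketched roughly as follows (illustrative names only, not the actual {{Flusher}}/Netty code): frames are collated in a queue, and the payload is sized and allocated only when we flush, so it matches the exact number of bytes accumulated for the channel.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: defer payload allocation until flush time so the
// buffer is sized to the actual bytes queued, rather than a guess up front.
public class LazyFlushSketch
{
    private final List<byte[]> queued = new ArrayList<>();

    public void enqueue(byte[] frame)
    {
        queued.add(frame);
    }

    // Collate the queued frames into a single payload allocated with the
    // exact total size, then clear the queue.
    public byte[] flush()
    {
        int total = 0;
        for (byte[] frame : queued)
            total += frame.length;

        byte[] payload = new byte[total];
        int offset = 0;
        for (byte[] frame : queued)
        {
            System.arraycopy(frame, 0, payload, offset, frame.length);
            offset += frame.length;
        }
        queued.clear();
        return payload;
    }
}
```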
{quote}{{ExceptionHandlers$PostV5ExceptionHandler#exceptionCaught}}: when 
flushing an exception, we don't call release on the payload.
{quote}
Included in the {{minor cleanups}} commit
{quote}There are several places in {{SimpleClient}} where 
{{largePayload#release}} isn't called.
{quote}
I've refactored the flushing large messages in {{SimpleClient}} to match 
{{Flusher}}, so this is working properly now.
{quote}Other things...
{quote}
{quote}{{Dispatcher#processRequest}}, we don't need to cast error to 
{{Message.Response}} if we change its type to {{ErrorMessage}}.
{quote}
{quote}In {{CqlMessageHandler#releaseAfterFlush}}, we can call 
{{sourceFrame#release()}} instead of {{sourceFrame.body.release()}} for 
consistency with other calls.
{quote}

Both in {{minor cleanups}}
{quote}{{Server#requestPayloadInFlightPerEndpoint}} can be a non-static 
{{Server}} member.
{quote}
If you don't mind I'd prefer to leave this as it is for now as it's 
pre-existing and changing would require reworking CASSANDRA-15519 (changing 
limits at runtime).
{quote}Should we hide {{flusher.queued.add()}} behind a method to disallow 
accessing queue directly?
{quote}
I've done this, but I'm not 100% convinced of its utility. As the two 
{{Flusher}} subclasses need access to the queue, we have to provide package 
private methods {{poll}} and {{isEmpty}} as well as one to {{enqueue}}. So 
unless we move {{Flusher}} to its own subpackage, the queue is effectively 
visible to everything else in {{o.a.c.transport}}
{quote}We can change the code a bit to make {{FlushItemConverter}} instances 
explicit. Right now, we basically have two converters both called 
{{#toFlushItem}} in {{CQLMessageHandler}} and {{LegacyDispatchHandler}}. We 
could have them as inner classes. It's somewhat useful since if you change the 
signature of this method, or stop using it, it'll be hard to find that it is 
actually an implementation of converter.
{quote}
I've left this as it is just for the moment. I'm working on some tests which 
supply a lambda to act as the converter, so I'll come back to this when those 
have solidified a bit more.
{quote}Looks like {{MessageConsumer}} could be generic, since we cast it to 
either request or response.
{quote}
I've parameterised {{MessageConsumer}} & {{CQLMessageHandler}} according to the 
subclass of Message they expect and extended this a bit by moving the logic out 
of {{Message$ProtocolEncoder}} to an abstract {{Message$Decoder}} with concrete subclasses for {{Request}} and {{Response}}.
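The parameterisation can be sketched like this (simplified, hypothetical shapes rather than the real {{Message}} hierarchy): once the consumer and handler are generic in the message subtype, the cast to request or response disappears.

```java
// Illustrative-only sketch of a consumer/handler pair made generic in the
// Message subtype it expects, so no cast to request or response is needed.
public class GenericsSketch
{
    static abstract class Message {}
    static class Request extends Message {}
    static class Response extends Message {}

    interface MessageConsumer<M extends Message>
    {
        void accept(M message);
    }

    static class CQLMessageHandler<M extends Message>
    {
        private final MessageConsumer<M> consumer;

        CQLMessageHandler(MessageConsumer<M> consumer)
        {
            this.consumer = consumer;
        }

        void process(M message)
        {
            consumer.accept(message); // statically typed; no cast required
        }
    }
}
```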

{quote}Looks like {{CQLMessageHandler#processCorruptFrame}}, initially had an 
intention of handling recovery, but now just throws a CRC exception regardless. 
This does match description, but usage of {{isRecoverable}} seems to be 
redundant here, unless we change semantics of recovery.
{quote}
It is somewhat redundant here, except that it logs a slightly different message 
to indicate whether the CRC mismatch was found in the frame header or body. 
I'll leave it as it is for now as it's technically possible to recover from a 
corrupt body, but would be problematic for clients just now.

I still have some comments to address, as well as those from [~omichallat] ...
{quote}{{Frame$Decoder}} and other classes that are related to legacy path can 
be extracted to a separate class, since {{Frame}} itself is still useful, but 
classes that facilitate legacy encoding/decoding/etc can be extracted.
{quote}
{quote}{{Frame#encodeHeaderInto}} seems to be duplicating the logic we have in 
{{Frame$Encoder#encodeHeader}}, should we unify the two? Maybe we can have 
encoding/decoding methods shared for both legacy and new paths, for example, as 
static methods?
{quote}
{quote}As you have mentioned, it would be great to rename {{Frame}} to 
something different, like {{Envelope}}, since right now we have 
{{FrameDecoder#Frame}} and {{Frame$Decoder}} and variable names that correspond 
with class names, which makes it all hard to follow.
{quote}


was (Author: beobal):
Thanks for the comments [~ifesdjeen] & [~omichallat], I've pushed a few commits 
to the [branch|https://github.com/beobal/cassandra/commits/15299-trunk].

{qu

[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-06-30 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148831#comment-17148831
 ] 

Sam Tunnicliffe commented on CASSANDRA-15299:
-

Thanks for the comments [~ifesdjeen] & [~omichallat].

{quote} 
There are several things that I wanted to bring to your attention:
{quote}
I've handled most of these in a refactor of Flusher. As you suggested, for 
framed items we now collate the frames and only allocate the payloads when we 
flush to the netty channel. So now, we allocate the payload based on the actual 
number of bytes required for the specific channel.
{quote}{{ExceptionHandlers$PostV5ExceptionHandler#exceptionCaught}}: when 
flushing an exception, we don't call release on the payload.
{quote}
Included in the {{minor cleanups}} commit
{quote}There are several places in {{SimpleClient}} where 
{{largePayload#release}} isn't called.
{quote}
I've refactored the flushing large messages in {{SimpleClient}} to match 
{{Flusher}}, so this is working properly now.
{quote}Other things...
{quote}
{quote}{{Dispatcher#processRequest}}, we don't need to cast error to 
{{Message.Response}} if we change its type to {{ErrorMessage}}.
{quote}
{quote}In {{CqlMessageHandler#releaseAfterFlush}}, we can call 
{{sourceFrame#release()}} instead of {{sourceFrame.body.release()}} for 
consistency with other calls.
{quote}

Both in {{minor cleanups}}
{quote}{{Server#requestPayloadInFlightPerEndpoint}} can be a non-static 
{{Server}} member.
{quote}
If you don't mind I'd prefer to leave this as it is for now as it's 
pre-existing and changing would require reworking CASSANDRA-15519 (changing 
limits at runtime).
{quote}Should we hide {{flusher.queued.add()}} behind a method to disallow 
accessing queue directly?
{quote}
I've done this, but I'm not 100% convinced of its utility. As the two 
{{Flusher}} subclasses need access to the queue, we have to provide package 
private methods {{poll}} and {{isEmpty}} as well as one to {{enqueue}}. So 
unless we move {{Flusher}} to its own subpackage, the queue is effectively 
visible to everything else in {{o.a.c.transport}}
{quote}We can change the code a bit to make {{FlushItemConverter}} instances 
explicit. Right now, we basically have two converters both called 
{{#toFlushItem}} in {{CQLMessageHandler}} and {{LegacyDispatchHandler}}. We 
could have them as inner classes. It's somewhat useful since if you change the 
signature of this method, or stop using it, it'll be hard to find that it is 
actually an implementation of converter.
{quote}
> I've left this as it is just for the moment. I'm working on some tests which 
> supply a lambda to act as the converter, so I'll come back to this when those 
> have solidified a bit more.
{quote}Looks like {{MessageConsumer}} could be generic, since we cast it to 
either request or response.
{quote}
I've parameterised {{MessageConsumer}} & {{CQLMessageHandler}} according to the 
subclass of Message they expect and extended this a bit by moving the logic out 
of {{Message$ProtocolEncoder}} to an abstract {{Message$Decoder}} with concrete subclasses for {{Request}} and {{Response}}.

>bq.Looks like {{CQLMessageHandler#processCorruptFrame}}, initially had an 
>intention of handling recovery, but now just throws a CRC exception 
>regardless. This does match description, but usage of {{isRecoverable}} seems 
>to be redundant here, unless we change semantics of recovery.

It is somewhat redundant here, except that it logs a slightly different message 
to indicate whether the CRC mismatch was found in the frame header or body. 
I'll leave it as it is for now as it's technically possible to recover from a 
corrupt body, but would be problematic for clients just now.

I still have some comments to address, as well as those from [~omichallat] ...
{quote}Frame$Decoder and other classes that are related to legacy path can be 
extracted to a separate class, since Frame itself is still useful, but classes 
that facilitate legacy encoding/decoding/etc can be extracted.
{quote}
{quote}Frame#encodeHeaderInto seems to be duplicating the logic we have in 
Frame$Encoder#encodeHeader, should we unify the two? Maybe we can have 
encoding/decoding methods shared for both legacy and new paths, for example, as 
static methods?
{quote}
{quote}As you have mentioned, it would be great to rename Frame to something 
different, like Envelope, since right now we have FrameDecoder#Frame and 
Frame$Decoder and variable names that correspond with class names, which makes 
it all hard to follow.
{quote}

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  

[jira] [Comment Edited] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-06-30 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148831#comment-17148831
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-15299 at 6/30/20, 5:06 PM:
---

Thanks for the comments [~ifesdjeen] & [~omichallat], I've pushed a few commits 
to the [branch|https://github.com/beobal/cassandra/commits/15299-trunk].

{quote} 
There are several things that I wanted to bring to your attention:
{quote}
I've handled most of these in a refactor of Flusher. As you suggested, for 
framed items we now collate the frames and only allocate the payloads when we 
flush to the netty channel. So now, we allocate the payload based on the actual 
number of bytes required for the specific channel.
{quote}{{ExceptionHandlers$PostV5ExceptionHandler#exceptionCaught}}: when 
flushing an exception, we don't call release on the payload.
{quote}
Included in the {{minor cleanups}} commit
{quote}There are several places in {{SimpleClient}} where 
{{largePayload#release}} isn't called.
{quote}
I've refactored the flushing large messages in {{SimpleClient}} to match 
{{Flusher}}, so this is working properly now.
{quote}Other things...
{quote}
{quote}{{Dispatcher#processRequest}}, we don't need to cast error to 
{{Message.Response}} if we change its type to {{ErrorMessage}}.
{quote}
{quote}In {{CqlMessageHandler#releaseAfterFlush}}, we can call 
{{sourceFrame#release()}} instead of {{sourceFrame.body.release()}} for 
consistency with other calls.
{quote}

Both in {{minor cleanups}}
{quote}{{Server#requestPayloadInFlightPerEndpoint}} can be a non-static 
{{Server}} member.
{quote}
If you don't mind I'd prefer to leave this as it is for now as it's 
pre-existing and changing would require reworking CASSANDRA-15519 (changing 
limits at runtime).
{quote}Should we hide {{flusher.queued.add()}} behind a method to disallow 
accessing queue directly?
{quote}
I've done this, but I'm not 100% convinced of its utility. As the two 
{{Flusher}} subclasses need access to the queue, we have to provide package 
private methods {{poll}} and {{isEmpty}} as well as one to {{enqueue}}. So 
unless we move {{Flusher}} to its own subpackage, the queue is effectively 
visible to everything else in {{o.a.c.transport}}
{quote}We can change the code a bit to make {{FlushItemConverter}} instances 
explicit. Right now, we basically have two converters both called 
{{#toFlushItem}} in {{CQLMessageHandler}} and {{LegacyDispatchHandler}}. We 
could have them as inner classes. It's somewhat useful since if you change the 
signature of this method, or stop using it, it'll be hard to find that it is 
actually an implementation of converter.
{quote}
> I've left this as it is just for the moment. I'm working on some tests which 
> supply a lambda to act as the converter, so I'll come back to this when those 
> have solidified a bit more.
{quote}Looks like {{MessageConsumer}} could be generic, since we cast it to 
either request or response.
{quote}
I've parameterised {{MessageConsumer}} & {{CQLMessageHandler}} according to the 
subclass of Message they expect and extended this a bit by moving the logic out 
of {{Message$ProtocolEncoder}} to an abstract {{Message$Decoder}} with concrete subclasses for {{Request}} and {{Response}}.

>bq.Looks like {{CQLMessageHandler#processCorruptFrame}}, initially had an 
>intention of handling recovery, but now just throws a CRC exception 
>regardless. This does match description, but usage of {{isRecoverable}} seems 
>to be redundant here, unless we change semantics of recovery.

It is somewhat redundant here, except that it logs a slightly different message 
to indicate whether the CRC mismatch was found in the frame header or body. 
I'll leave it as it is for now as it's technically possible to recover from a 
corrupt body, but would be problematic for clients just now.

I still have some comments to address, as well as those from [~omichallat] ...
{quote}Frame$Decoder and other classes that are related to legacy path can be 
extracted to a separate class, since Frame itself is still useful, but classes 
that facilitate legacy encoding/decoding/etc can be extracted.
{quote}
{quote}Frame#encodeHeaderInto seems to be duplicating the logic we have in 
Frame$Encoder#encodeHeader, should we unify the two? Maybe we can have 
encoding/decoding methods shared for both legacy and new paths, for example, as 
static methods?
{quote}
{quote}As you have mentioned, it would be great to rename Frame to something 
different, like Envelope, since right now we have FrameDecoder#Frame and 
Frame$Decoder and variable names that correspond with class names, which makes 
it all hard to follow.
{quote}


was (Author: beobal):
Thanks for the comments [~ifesdjeen] & [~omichallat].

{quote} 
There are several things that I wanted to bring to your attention:
{quote}
I've handled most of these in a refactor of Flusher

[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148804#comment-17148804
 ] 

Andres de la Peña commented on CASSANDRA-15579:
---

I'm keen to start work on unblocking this, but I don't know what the scope of 
this ticket should be or where to start.

We have a fair number of specific dtests around this area, at least:
 * 
[consistency_test|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py]
 * 
[replication_test|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py]
 * 
[read_repair_test|https://github.com/apache/cassandra-dtest/blob/master/read_repair_test.py]
 * 
[replica_side_filtering_test|https://github.com/apache/cassandra-dtest/blob/master/replica_side_filtering_test.py]

We also have some related in-jvm distributed tests, and things like 
coordination are also implicitly included in some other tests.

[~bdeggleston] Do we have a more specific list of what things need testing, 
or what cases are missed by the existing tests? Have we identified especially 
suspicious components or use cases that can be prioritized?

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.




-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148785#comment-17148785
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15861 at 6/30/20, 3:59 PM:
---

bq. if the sstables are already in compacting state, does it mean 
entire-sstable-streaming will be blocked until compaction is finished?

[~jasonstack] What if we just abort the ongoing compaction involving the 
SSTable we want to stream? (Then we can mark it ourselves for the period 
including manifest generation, stats streaming, and index summary streaming?)

The danger, I guess, is aborting compactions that are almost done. Two ways 
around that I can see. One is to try to prioritize ZCS for non-compacting 
SSTables first. The other is just to fall back to legacy streaming if the 
SSTable is already compacting. Or we can do both of those things.
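The fallback idea above can be sketched as a small decision helper (names are illustrative, not Cassandra APIs): prefer zero-copy entire-sstable streaming when the sstable is not currently compacting, and drop back to legacy streaming when it is, rather than blocking on or aborting the compaction.

```java
// Illustrative decision helper, not an actual Cassandra API: choose the
// streaming strategy based on whether the sstable is currently compacting.
public class StreamStrategySketch
{
    public enum Strategy { ENTIRE_SSTABLE, LEGACY }

    public static Strategy choose(boolean isCompacting, boolean zcsEnabled)
    {
        if (!zcsEnabled)
            return Strategy.LEGACY;
        // A compacting sstable's components may be mutated underneath the
        // transfer, risking a checksum mismatch, so take the safe path.
        return isCompacting ? Strategy.LEGACY : Strategy.ENTIRE_SSTABLE;
    }
}
```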


was (Author: maedhroz):
bq. if the sstables are already in compacting state, does it mean 
entire-sstable-streaming will be blocked until compaction is finished?

[~jasonstack] What if we just abort the ongoing compaction involving the 
SSTable we want to stream? (Then we can mark it ourselves for the period 
including manifest generation, stats streaming, and index summary streaming?)

The danger, I guess, is aborting compactions that are almost done. Two ways 
around that I can see. One is to try to prioritize ZCS for non-compacting 
SSTables first. The other is just to fall back to legacy streaming if the 
SSTable is already compacting. Or we can combine them :)

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, it executes "nodetool repair" on node1 and kills node2 
> during repair. At the end, node3 reports checksum validation failure on 
> sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair started on node1, it performs anti-compaction which modifies 
> sstable's repairAt to 0 a

[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148785#comment-17148785
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15861 at 6/30/20, 3:57 PM:
---

bq. if the sstables are already in compacting state, does it mean 
entire-sstable-streaming will be blocked until compaction is finished?

[~jasonstack] What if we just abort the ongoing compaction involving the 
SSTable we want to stream? (Then we can mark it ourselves for the period 
including manifest generation, stats streaming, and index summary streaming?)

The danger, I guess, is aborting compactions that are almost done. Two ways 
around that I can see. One is to try to prioritize ZCS for non-compacting 
SSTables first. The other is just to fall back to legacy streaming if the 
SSTable is already compacting. Or we can combine them :)


was (Author: maedhroz):
bq. if the sstables are already in compacting state, does it mean 
entire-sstable-streaming will be blocked until compaction is finished?

[~jasonstack] What if we just abort the ongoing compaction involving the 
SSTable we want to stream? (Then we can mark it ourselves for the period 
including manifest generation, stats streaming, and index summary streaming?)

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, it executes "nodetool repair" on node1 and kills node2 
> during repair. At the end, node3 reports checksum validation failure on 
> sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair started on node1, it performs anti-compaction which modifies 
> sstable's repairAt to 0 and pending repair id to session-id.
> 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be 
> transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts to broadcast repair-failure-message to all participants in 
> {{

[cassandra] branch trunk updated: Fix a log message typo in StartupChecks

2020-06-30 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 3b8ed1e  Fix a log message typo in StartupChecks
3b8ed1e is described below

commit 3b8ed1eb4000119779e618935e60f46f80bad42f
Author: Aleksey Yeshchenko 
AuthorDate: Tue Jun 30 16:53:20 2020 +0100

Fix a log message typo in StartupChecks
---
 src/java/org/apache/cassandra/service/StartupChecks.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/java/org/apache/cassandra/service/StartupChecks.java 
b/src/java/org/apache/cassandra/service/StartupChecks.java
index e8a60f4..12bb309 100644
--- a/src/java/org/apache/cassandra/service/StartupChecks.java
+++ b/src/java/org/apache/cassandra/service/StartupChecks.java
@@ -150,7 +150,7 @@ public class StartupChecks
 }
 catch (AssertionError e)
 {
-logger.warn("lz4-java was unable to load native librarires; this 
will lower the performance of lz4 (network/sstables/etc.): {}", 
Throwables.getRootCause(e).getMessage());
+logger.warn("lz4-java was unable to load native libraries; this 
will lower the performance of lz4 (network/sstables/etc.): {}", 
Throwables.getRootCause(e).getMessage());
 }
 };
 





[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148785#comment-17148785
 ] 

Caleb Rackliffe commented on CASSANDRA-15861:
-

bq. if the sstables are already in compacting state, does it mean 
entire-sstable-streaming will be blocked until compaction is finished?

[~jasonstack] What if we just abort the ongoing compaction involving the 
SSTable we want to stream? (Then we can mark it ourselves for the period 
including manifest generation, stats streaming, and index summary streaming?)

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, "nodetool repair" is executed on node1 and node2 is killed 
> during the repair. At the end, node3 reports a checksum validation failure on 
> an sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which mutates the 
> sstable's repairedAt to 0 and its pending repair id to the session id.
> 2. Node1 then creates a {{ComponentManifest}} containing the file lengths to be 
> transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed, and node1 
> starts to broadcast a repair-failure message to all participants in 
> {{CoordinatorSession#fail}}.
> 4. Node1 receives its own repair-failure message and fails its local repair 
> sessions in {{LocalSessions#failSession}}, which triggers an async background 
> compaction.
> 5. Node1's background compaction mutates the sstable's repairedAt to 0 and its 
> pending repair id to null via 
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there are no more 
> in-progress repairs.
> 6. Node1 then actually sends the sstable to node3, where the sstable's STATS 
> component size differs from the original size recorded in the manifest.
> 7. At the end, node3 reports a checksum validation failure when it tries to 
> mutate the sstable level and "isTransient" attribute in 
> {{CassandraEntireSSTableStreamReader#
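The race above boils down to "manifest captured, then component mutated, then bytes sent". A minimal standalone sketch of why the receiver's validation must then fail — illustrative names and byte contents only, not Cassandra's actual streaming code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Illustrative sketch only: a "manifest" records a component's length before
// transfer; if the component is mutated afterwards, the receiver's
// length/checksum validation against that manifest fails.
public class ManifestRaceSketch {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path stats = Files.createTempFile("na-2-big-Statistics", ".db");
        Files.write(stats, new byte[] {1, 2, 3, 4});

        // Step 2: the sender records the component length in the manifest.
        long manifestLength = Files.size(stats);
        long manifestCrc = crc(Files.readAllBytes(stats));

        // Steps 4-5: a concurrent compaction rewrites the STATS component
        // (e.g. clearing the pending-repair id) before the bytes are sent.
        Files.write(stats, new byte[] {1, 2, 3, 4, 5, 6});

        // Step 7: the receiver validates against the stale manifest and fails.
        byte[] received = Files.readAllBytes(stats);
        System.out.println(received.length == manifestLength);
        System.out.println(crc(received) == manifestCrc);
        Files.delete(stats);
    }
}
```

Conceptually, a fix has to make steps 2 and 6 see the same bytes (for example by snapshotting or hardlinking components when the manifest is built), though the actual patch may take a different route.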

[jira] [Commented] (CASSANDRA-15901) Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip)

2020-06-30 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148764#comment-17148764
 ] 

Michael Semb Wever commented on CASSANDRA-15901:


Agreed!
New run 
[here|https://ci-cassandra.apache.org/job/Cassandra-devbranch-test/157/] (on 
cassandra35)

> Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip)
> 
>
> Key: CASSANDRA-15901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15901
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Many of the ci-cassandra jenkins runs fail on {{ip-10-0-5-5: Name or service 
> not known}}. CASSANDRA-15622 addressed some of these, but many still remain. 
> Currently, test C* nodes either fail or listen on a public ip depending on 
> which agent they end up on.
> The idea behind this ticket is to make ant force the private VPC ip into the 
> cassandra yaml when building; this will force the nodes to listen on the 
> correct ip.






[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-06-30 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-13994:

Status: Patch Available  (was: Review In Progress)

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since the Compact Storage flags are now disabled, all of 
> the related functionality is dead code.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added a {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes would be defunct. We could 
> log a warning and ask users to migrate off those for now, and completely 
> remove it in a future release. It's just a couple of classes 
> though.






[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-06-30 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-13994:

Status: In Progress  (was: Patch Available)

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>






[jira] [Commented] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-06-30 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148762#comment-17148762
 ] 

Ekaterina Dimitrova commented on CASSANDRA-13994:
-

Thank you [~slebresne] and [~aleksey] for your input. I just moved it to beta 
so we can concentrate now on the final outstanding alpha tickets.

I will rebase and cut the scope of the patch to the removal of the dead code, as 
agreed, and will also take into consideration the points [~slebresne] made in 
his initial review.

Moving it back to Open to show that there is still work to be done, but that I 
am not working on it at this very moment.

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>






[jira] [Updated] (CASSANDRA-15891) provide a configuration option such as endpoint_verification_method

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15891:
---
Fix Version/s: 4.x

> provide a configuration option such as endpoint_verification_method
> ---
>
> Key: CASSANDRA-15891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15891
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Thanh
>Priority: Normal
> Fix For: 4.x
>
>
> With CASSANDRA-9220, it's possible to configure endpoint/hostname 
> verification when enabling internode encryption. However, you don't have any 
> control over which endpoint is used for the verification; instead, 
> cassandra will automatically try to use the node IP (not the node hostname) for 
> endpoint verification, so if your node certificates don't include the IP in 
> the ssl certificate's SAN list, then you'll get an error like:
> {code:java}
> ERROR [MessagingService-Outgoing-/10.10.88.194-Gossip] 2018-11-13 
> 10:20:26,903 OutboundTcpConnection.java:606 - SSL handshake error for 
> outbound connection to 50cc97c1[SSL_NULL_WITH_NULL_NULL: 
> Socket[addr=/,port=7001,localport=47684]] 
> javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
> No subject alternative names matching IP address  found 
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) {code}
> From what I've seen, most orgs will not have node IPs in their certs.
> So, it would be best if cassandra provided another configuration option 
> such as *{{endpoint_verification_method}}*, which you could set to "ip" or 
> "fqdn" or something else (eg "hostname_alias" if for whatever reason the org 
> doesn't want to use fqdn for endpoint verification).
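For context on what such an option would ultimately control, this is roughly the JSSE knob involved: the identity passed to the SSLEngine decides whether the cert's SAN dNSName or iPAddress entries are matched. A hedged sketch only — the host name and port are made up, and Cassandra's actual SSL wiring differs:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustrative sketch: JSSE endpoint verification matches the cert against
// whatever peer identity the engine was created with.
public class EndpointVerificationSketch {
    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        // Passing a hostname (rather than an IP literal) as the peer host
        // makes JSSE match the cert's SAN dNSName entries instead of iPAddress.
        SSLEngine engine = ctx.createSSLEngine("node1.example.com", 7001);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS"); // RFC 2818 matching
        engine.setSSLParameters(params);
        System.out.println(engine.getPeerHost());
    }
}
```

An {{endpoint_verification_method}} setting would, in effect, choose which identity (IP, fqdn, or an alias) gets handed to the engine here.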






[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-06-30 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-13994:

Fix Version/s: (was: 4.0-alpha)
   4.0-beta

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>






[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15850:
---
Fix Version/s: 4.x

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Startup and Shutdown
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
> Fix For: 4.x
>
>
> Hello,
> When I am bootstrapping/restarting a Cassandra node, there is a delay between 
> gossip settling and the CQL port opening. Can someone please explain to me 
> where this delay is configured and whether it can be changed? I don't see any 
> information in the logs.
> In my case there is a ~3 minute delay, and it increases as I increase the 
> number of tables, nodes, and DCs.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also during this 3-10 minute delay, I see that the
> {noformat}
> nodetool compactionstats
> {noformat}
>  command hangs and never responds until the CQL port is up and running.
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue is easily reproducible with around 300 tables and 100 nodes in 
> a cluster.






[jira] [Updated] (CASSANDRA-15880) Memory leak in CompressedChunkReader

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15880:
---
Fix Version/s: 3.11.x
   4.0

> Memory leak in CompressedChunkReader
> 
>
> Key: CASSANDRA-15880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15880
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Compression
>Reporter: Jaroslaw Grabowski
>Priority: Normal
> Fix For: 4.0, 3.11.x
>
>
> CompressedChunkReader uses java.lang.ThreadLocal to reuse a ByteBuffer for 
> compressed data. The ByteBuffers leak due to a peculiarity of ThreadLocal.
> ThreadLocals are stored in a map where the key is a weak reference to the 
> ThreadLocal and the value is the user's object (a ByteBuffer in this case). 
> When the last strong reference to a ThreadLocal is lost, the weak reference to 
> the ThreadLocal (the key) is cleared, but the value (the ByteBuffer) is kept 
> until cleaned by ThreadLocal's heuristic expunge mechanism. See ThreadLocal's 
> "stale entries" for details.
> When the number of long-lived threads is high enough, this results in thousands 
> of ByteBuffers stored as stale entries in ThreadLocals. In a not-so-lucky 
> scenario, we get an OutOfMemoryError.
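The leak-prone pattern and the usual mitigation can be sketched as follows; `ThreadLocalLeakSketch` and `readChunk` are illustrative names, not Cassandra's actual CompressedChunkReader API:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the leak-prone ThreadLocal-buffer pattern and a
// defensive variant that clears the entry eagerly.
public class ThreadLocalLeakSketch {
    // Leak-prone: if the last strong reference to this ThreadLocal is lost,
    // each long-lived thread's ThreadLocalMap still holds its ByteBuffer as a
    // stale entry until the map's heuristic expunge happens to run.
    static ThreadLocal<ByteBuffer> reusable =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1 << 16));

    // Defensive: remove the entry when the buffer is no longer needed, so the
    // value does not linger on long-lived threads.
    static int readChunk() {
        ByteBuffer buf = reusable.get();
        try {
            buf.clear();
            buf.putInt(42).flip();
            return buf.getInt();
        } finally {
            reusable.remove(); // eager cleanup instead of relying on expunge
        }
    }

    public static void main(String[] args) {
        System.out.println(readChunk());
    }
}
```

An alternative, the route Netty takes with FastThreadLocal, is to store per-thread values in an indexed array so cleanup is deterministic rather than heuristic.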






[jira] [Updated] (CASSANDRA-15856) Security vulnerabilities with dependency jars of Cassandra 3.11.6

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15856:
---
Fix Version/s: 3.11.x

> Security vulnerabilities with dependency jars  of Cassandra 3.11.6
> --
>
> Key: CASSANDRA-15856
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15856
> Project: Cassandra
>  Issue Type: Task
>Reporter: Kshitiz Saxena
>Priority: Normal
> Fix For: 3.11.x
>
>
> The latest release of Cassandra, 3.11.6, has a few dependency jars with 
> reported security vulnerabilities.
>  
> Apache Thrift (org.apache.thrift:libthrift:0.9.2) has the below-mentioned 
> security vulnerabilities reported:
> |+[https://nvd.nist.gov/vuln/detail/CVE-2016-5397]+|
> |+[https://nvd.nist.gov/vuln/detail/CVE-2018-1320]+|
> |+[https://nvd.nist.gov/vuln/detail/CVE-2019-0205]+|
>  
> Netty Project (io.netty:netty-all:4.0.44.Final) has the below-mentioned 
> security vulnerabilities reported:
> |+[https://nvd.nist.gov/vuln/detail/CVE-2019-16869]+|
> |+[https://nvd.nist.gov/vuln/detail/CVE-2019-20444]+|
> |+[https://nvd.nist.gov/vuln/detail/CVE-2019-20445]+|
>  
> Is there a plan to upgrade these jars in any upcoming release?






[jira] [Updated] (CASSANDRA-15903) Doc update: stream-entire-sstable supports all compaction strategies and internode encryption

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15903:
---
Fix Version/s: 4.0

> Doc update: stream-entire-sstable supports all compaction strategies and 
> internode encryption
> -
>
> Key: CASSANDRA-15903
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15903
> Project: Cassandra
>  Issue Type: Task
>Reporter: ZhaoYang
>Priority: Normal
> Fix For: 4.0
>
>
> As [~mck] points out, the docs need to be updated for CASSANDRA-15657 and 
> CASSANDRA-15740.






[jira] [Updated] (CASSANDRA-15866) stream sstable attached index files entirely with data file

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15866:
---
Fix Version/s: 4.x

> stream sstable attached index files entirely with data file
> ---
>
> Key: CASSANDRA-15866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Streaming
>Reporter: ZhaoYang
>Priority: Normal
> Fix For: 4.x
>
>
> When an sstable is streamed entirely, there is no need to rebuild the 
> sstable-attached index on the receiver if the index files can also be 
> streamed entirely.






[jira] [Updated] (CASSANDRA-15898) cassandra 3.11.4 deadlock

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15898:
---
Fix Version/s: 3.11.x

> cassandra 3.11.4 deadlock
> -
>
> Key: CASSANDRA-15898
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15898
> Project: Cassandra
>  Issue Type: Bug
>Reporter: john doe
>Priority: Normal
> Fix For: 3.11.x
>
>
> We are running apache-cassandra-3.11.4 in a 10-node cluster with -Xms32G -Xmx32G 
> -Xmn8G using CMS.
> After running for a couple of days, one of the nodes becomes unresponsive, and 
> a thread dump (jstack -F) shows a deadlock.
> Found one Java-level deadlock:
> =
> "Native-Transport-Requests-144": waiting to lock Monitor@0x7cd5142e4d08 
> (Object@0x7f6e00348268, a java/io/ExpiringCache),
>  which is held by "CompactionExecutor:115134"
> "CompactionExecutor:115134": waiting to lock Monitor@0x7f6bcaf130f8 
> (Object@0x7f6dff31faa0, a 
> ch/qos/logback/core/joran/spi/ConfigurationWatchList),
>  which is held by "Native-Transport-Requests-144"
> Found a total of 1 deadlock.
> I have seen this a couple of times now on different nodes, with the following 
> in system.log:
> IndexSummaryRedistribution.java:77 - Redistributing index summaries
>  NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot 
> allocate chunk of 1048576
> Also, looking in the GC log, there has not been a ParNew collection for the 
> last 10 hours, only CMS collections.
> 1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)]
> 1739842.375: [Rescan (parallel) , 0.0614157 secs]
> 1739842.437: [weak refs processing, 0.994 secs]
> 1739842.437: [class unloading, 0.0231076 secs]
> 1739842.460: [scrub symbol table, 0.0061049 secs]
> 1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: 
> 17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: user=2.95 
> sys=0.00, real=0.09 secs]
> 1739842.471: [CMS-concurrent-sweep-start]
> 1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 
> sys=0.00, real=6.10 secs]
> 1739848.573: [CMS-concurrent-reset-start]
> 1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 
> sys=0.00, real=0.08 secs]
> 1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: 
> 17696837K(25165824K)] 
> 20409111K(32715584K), 0.0584838 secs] [Times: user=2.68 sys=0.00, real=0.06 
> secs]
> 1739858.713: [CMS-concurrent-mark-start]
> 1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 
> sys=0.00, real=1.79 secs]
> 1739860.497: [CMS-concurrent-preclean-start]
> 1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 
> sys=0.00, real=0.07 secs]
> 1739860.567: [CMS-concurrent-abortable-preclean-start]CMS: abort preclean due 
> to time
> 1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: 
> user=5.80 sys=0.00, real=5.76 secs]
> Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE 
> (1.8.0_162-b12)
> Memory: 4k page, physical 792290076k(2780032k free), swap 16777212k(16693756k 
> free)
> CommandLine flags:
> -XX:+AlwaysPreTouch
> -XX:CICompilerCount=15
> -XX:+CMSClassUnloadingEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSInitiatingOccupancyFraction=40
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSParallelRemarkEnabled
> -XX:CMSWaitDuration=1
> -XX:ConcGCThreads=50
> -XX:+CrashOnOutOfMemoryError
> -XX:GCLogFileSize=10485760
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:InitialHeapSize=34359738368
> -XX:InitialTenuringThreshold=1
> -XX:+ManagementServer
> -XX:MaxHeapSize=34359738368
> -XX:MaxNewSize=8589934592
> -XX:MaxTenuringThreshold=1
> -XX:MinHeapDeltaBytes=196608
> -XX:NewSize=8589934592
> -XX:NumberOfGCLogFiles=10
> -XX:OldPLABSize=16
> -XX:OldSize=25769803776
> -XX:OnOutOfMemoryError=kill -9 %p
> -XX:ParallelGCThreads=50
> -XX:+PerfDisableSharedMem
> -XX:+PrintGC
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+ResizeTLAB
> -XX:StringTableSize=103
> -XX:SurvivorRatio=8
> -XX:ThreadPriorityPolicy=42
> -XX:ThreadStackSize=256
> -XX:-UseBiasedLocking
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseCondCardMark
> -XX:+UseFastUnorderedTimeStamps
> -XX:+UseGCLogFileRotation
> -XX:+UseNUMA
> -XX:+UseNUMAInterleaving
> -XX:+UseParNewGC
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
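The deadlock reported above is a classic two-monitor cycle: one thread holds the ExpiringCache monitor while waiting for the ConfigurationWatchList monitor, and another does the reverse. The standard remedy is a global lock order, sketched below with stand-in locks (not the actual Cassandra or logback objects):

```java
// Illustrative sketch: if every thread acquires the two monitors in the same
// order, the A-holds-while-waiting-for-B / B-holds-while-waiting-for-A cycle
// from the thread dump cannot form.
public class LockOrderSketch {
    static final Object cacheLock = new Object();   // stands in for ExpiringCache
    static final Object configLock = new Object();  // stands in for ConfigurationWatchList

    static int doWork() {
        synchronized (cacheLock) {      // always acquire cacheLock first...
            synchronized (configLock) { // ...then configLock, in every thread
                return 1;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> doWork());
        Thread t2 = new Thread(() -> doWork());
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("no deadlock");
    }
}
```

In the reported case the cycle crosses library boundaries (Cassandra and logback), so the practical fix is usually to avoid holding one monitor while calling into code that takes the other.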






[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15908:
---
Fix Version/s: 4.x

> Improve messaging on indexing frozen collections
> 
>
> Key: CASSANDRA-15908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Rocco Varela
>Assignee: Rocco Varela
>Priority: Low
> Fix For: 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When attempting to create an index on a frozen collection, the error message 
> produced can be improved to provide more detail about the problem and 
> possible workarounds. Currently, a user will receive a message indicating 
> "...Frozen collections only support full() indexes", which is not immediately 
> clear to users new to Cassandra indexing and datatype compatibility.
> Here is an example:
> {code:java}
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses 
> frozen> );
> cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses);
>  InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
> create values() index on frozen column addresses. Frozen collections only 
> support full() indexes"{code}
>  
> I'm proposing possibly enhancing the messaging to something like this.
> {quote}Cannot create values() index on frozen column addresses. Frozen 
> collections only support indexes on the entire data structure due to 
> immutability constraints of being frozen, wrap your frozen column with the 
> full() target type to index properly.
> {quote}






[jira] [Updated] (CASSANDRA-15887) Document how to run Cassandra on Windows

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15887:
---
Fix Version/s: (was: 4.x)
   4.0

> Document how to run Cassandra on Windows
> 
>
> Key: CASSANDRA-15887
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15887
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: João Reis
>Assignee: Berenguer Blasi
>Priority: Low
> Fix For: 4.0
>
>
> The "Getting Started" section on the website only has instructions about 
> installing Cassandra on Linux.
> It would help us drive Cassandra adoption if we had instructions for 
> developers that want to run Cassandra on their Windows development 
> environment.
> We should include instructions on how to use the existing powershell scripts 
> to run Cassandra on native Windows, but in my opinion the docs should 
> recommend that users prefer WSL2/Docker before attempting to run it natively.






[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table

2020-06-30 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148745#comment-17148745
 ] 

Caleb Rackliffe commented on CASSANDRA-14888:
-

Hi [~Ryangdotson]. Does the file you attached ({{ExtendedDictionary.java}}) 
relate to this issue? It doesn't look like it, but just making sure...

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Ariel Weisberg
>Assignee: Alex Deparvu
>Priority: Urgent
>  Labels: patch-available
> Fix For: 4.0-beta
>
> Attachments: CASSANDRA-14888.patch, ExtendedDictionary.java
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, 
> ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, 
> PartitionsValidated, RepairPrepareTime, RepairSyncTime, 
> RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, 
> WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the 
> entire thing is kind of obscure. Fix it and also add a dtest that detects if 
> any mbeans are left behind after dropping a table and keyspace.
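For reference, the cleanup being asked for is plain JMX unregistration: every mbean registered for a table must be removed when the table or keyspace is dropped. The sketch below uses an illustrative dummy MBean and ObjectName, not Cassandra's metric classes:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative sketch: register a metric-style MBean, then unregister it as a
// DROP TABLE/KEYSPACE handler must, so nothing lingers on the platform server.
public class MBeanCleanupSketch {
    public interface DummyMBean { int getValue(); }
    public static class Dummy implements DummyMBean {
        public int getValue() { return 1; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
                "org.example.metrics:type=Table,keyspace=ks,scope=t,name=CasCommit");
        mbs.registerMBean(new Dummy(), name);
        System.out.println(mbs.isRegistered(name));
        // The fix amounts to doing this for every metric of the dropped table:
        mbs.unregisterMBean(name);
        System.out.println(mbs.isRegistered(name));
    }
}
```

A dtest can then assert, after dropping the table and keyspace, that no ObjectName matching the table's metric domain remains registered.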






[jira] [Updated] (CASSANDRA-15887) Document how to run Cassandra on Windows

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15887:
---
Fix Version/s: (was: 4.0)
   4.x

> Document how to run Cassandra on Windows
> 
>
> Key: CASSANDRA-15887
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15887
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: João Reis
>Assignee: Berenguer Blasi
>Priority: Low
> Fix For: 4.x
>
>
> The "Getting Started" section on the website only has instructions about 
> installing Cassandra on Linux.
> It would help us drive Cassandra adoption if we had instructions for 
> developers that want to run Cassandra on their Windows development 
> environment.
> We should include instructions on how to use the existing powershell scripts 
> to run Cassandra on native Windows but the docs should recommend users to 
> prefer using WSL2/Docker before attempting to run it natively in my opinion.






[jira] [Updated] (CASSANDRA-15887) Document how to run Cassandra on Windows

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15887:
---
Fix Version/s: 4.0

> Document how to run Cassandra on Windows
> 
>
> Key: CASSANDRA-15887
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15887
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: João Reis
>Assignee: Berenguer Blasi
>Priority: Low
> Fix For: 4.0
>
>
> The "Getting Started" section on the website only has instructions about 
> installing Cassandra on Linux.
> It would help us drive Cassandra adoption if we had instructions for 
> developers that want to run Cassandra on their Windows development 
> environment.
> We should include instructions on how to use the existing powershell scripts 
> to run Cassandra on native Windows but the docs should recommend users to 
> prefer using WSL2/Docker before attempting to run it natively in my opinion.






[jira] [Updated] (CASSANDRA-15857) Frozen RawTuple is not annotated with frozen in the toString method

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15857:
---
Fix Version/s: 3.11.x
   4.0

> Frozen RawTuple is not annotated with frozen in the toString method
> ---
>
> Key: CASSANDRA-15857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15857
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0, 3.11.x
>
>
> All raw types (e.g. RawCollection, RawUT) that support freezing wrap the 
> type name in 'frozen<>' in their toString method, except RawTuple.
> Therefore, the RawTuple::toString output is missing the frozen wrapper.
> Tuples are always frozen. However, since CASSANDRA-15035, an exception is 
> thrown when an inner tuple within a collection is not explicitly wrapped 
> in frozen.
> The method CQL3Type.Raw::toString is referenced in multiple places in the 
> source: for example, it is used by CreateTypeStatement.Raw and by 
> CQLSSTableWriter, and it is called to produce the SchemaChange in several 
> AlterSchemaStatement implementations.
> A test can prove that the missing frozen wrapper causes an exception when 
> building a CQLSSTableWriter for user types defined like the one below. Note 
> that the inner tuple is wrapped with frozen in the initial CQL statement.
> {code:java}
> CREATE TYPE ks.fooType ( f list>> )
> {code}
> {code:java}
> org.apache.cassandra.exceptions.InvalidRequestException: Non-frozen tuples 
> are not allowed inside collections: list>
>   at 
> org.apache.cassandra.cql3.CQL3Type$Raw$RawCollection.throwNestedNonFrozenError(CQL3Type.java:710)
>   at 
> org.apache.cassandra.cql3.CQL3Type$Raw$RawCollection.prepare(CQL3Type.java:669)
>   at 
> org.apache.cassandra.cql3.CQL3Type$Raw$RawCollection.prepareInternal(CQL3Type.java:661)
>   at 
> org.apache.cassandra.schema.Types$RawBuilder$RawUDT.lambda$prepare$1(Types.java:341)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.cassandra.schema.Types$RawBuilder$RawUDT.prepare(Types.java:342)
>   at org.apache.cassandra.schema.Types$RawBuilder.build(Types.java:291)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.createTypes(CQLSSTableWriter.java:551)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.build(CQLSSTableWriter.java:527)
> {code}
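As a rough illustration of the fix described above (this is not the actual Cassandra source; the class and field names here are hypothetical stand-ins for CQL3Type.Raw.RawTuple), the change amounts to making the tuple's toString emit the same frozen<> wrapper the other freezable raw types already produce:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for CQL3Type.Raw.RawTuple; only the toString
// behaviour discussed in the ticket is modelled here.
class RawTupleSketch {
    private final List<String> componentTypes;

    RawTupleSketch(List<String> componentTypes) {
        this.componentTypes = componentTypes;
    }

    @Override
    public String toString() {
        // Tuples are always frozen, so always emit the wrapper, matching
        // what RawCollection and RawUT do for their frozen forms.
        return componentTypes.stream()
                .collect(Collectors.joining(", ", "frozen<tuple<", ">>"));
    }

    public static void main(String[] args) {
        System.out.println(new RawTupleSketch(Arrays.asList("int", "text")));
        // prints: frozen<tuple<int, text>>
    }
}
```

With the wrapper always present, round-tripping the type name through parsing no longer produces a non-frozen inner tuple.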






[jira] [Updated] (CASSANDRA-15896) NullPointerException in SELECT JSON statement when a UUID field contains an empty string

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15896:
---
Fix Version/s: 3.0.x

> NullPointerException in SELECT JSON statement when a UUID field contains an 
> empty string
> 
>
> Key: CASSANDRA-15896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15896
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter, CQL/Semantics
>Reporter: Ostico
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> It seems that Cassandra accepts empty strings "" ( from a JSON string ) for UUID 
> fields but crashes when asked for the JSON serialization of those fields.
>  
> Cassandra version 3.11.6 running in Docker from the official Docker Hub image.
> Java driver:
> {code:java}
> <dependency>
> <groupId>com.datastax.oss</groupId>
> <artifactId>java-driver-core</artifactId>
> <version>4.7.0</version>
> </dependency>
> {code}
> The following code reproduces the bug:
> {code:java}
> package com.foo.bar;
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.CqlSessionBuilder;
> import com.datastax.oss.driver.api.core.cql.PreparedStatement;
> import com.datastax.oss.driver.api.core.cql.ResultSet;
> import com.datastax.oss.driver.api.core.cql.Row;
> import com.fasterxml.jackson.databind.ObjectMapper;
> import org.junit.After;
> import org.junit.Before;
> import org.junit.Test;
> import java.net.InetSocketAddress;
> import java.net.URI;
> import java.util.*;
> import static org.junit.Assert.assertFalse;
> import static org.junit.Assert.assertNotNull;
> /**
>  * @author Domenico Lupinetti  - 23/06/2020
>  */
> public class NullPointerExceptionTest {
> protected String uuid;
> protected CqlSession cqlSession;
> @Before
> public void setUp() throws Exception {
> URI node = new URI( "tcp://localhost:9042" );
> final CqlSessionBuilder builder = CqlSession.builder();
> cqlSession = builder.addContactPoint( new InetSocketAddress(
> node.getHost(),
> node.getPort()
> ) ).withLocalDatacenter( "datacenter1" ).build();
> cqlSession.execute( "CREATE KEYSPACE IF NOT EXISTS test_suite WITH 
> replication = {'class':'SimpleStrategy','replication_factor':1};" );
> String sb = "CREATE TABLE IF NOT EXISTS test_suite.test ( id uuid 
> PRIMARY KEY, another_id uuid, subject text );";
> cqlSession.execute( sb );
> PreparedStatement stm = cqlSession.prepare( "INSERT INTO 
> test_suite.test JSON :payload" );
> this.uuid = UUID.randomUUID().toString();
> HashMap<String, String> payload = new HashMap<>();
> payload.put( "id", this.uuid );
> // *** This exception does not happen if the field is set to NULL
> payload.put( "another_id", "" );  //<-- EMPTY STRING AS UUID
> payload.put( "subject", "Alighieri, Dante. Divina Commedia" );
> ObjectMapper objM = new ObjectMapper();
> cqlSession.execute(
> stm.bind().setString( "payload", objM.writeValueAsString( 
> payload ) )
> );  //<-- serialize as JSON
> }
> @After
> public void tearDown() throws Exception {
> cqlSession.execute( "DROP TABLE IF EXISTS test_suite.test;" );
> cqlSession.execute( "DROP KEYSPACE test_suite;" );
> cqlSession.close();
> }
> @Test
> public void testNullPointer() {
> PreparedStatement stmt   = cqlSession.prepare( "SELECT JSON id, 
> another_id FROM test_suite.test where id = :id;" );
> ResultSet resultSet  = cqlSession.execute( 
> stmt.bind().setUuid( "id", UUID.fromString( this.uuid ) ) ); // <-- 
> EXCEPTION
> Row   r  = resultSet.one();
> assertNotNull( r );
> assertNotNull( r.getString( "[json]" ) );
> assertFalse( Objects.requireNonNull( r.getString( "[json]" ) 
> ).isEmpty() );
> }
> }
> {code}
> Client stack Trace:
> {code:java}
> com.datastax.oss.driver.api.core.servererrors.ServerError: 
> java.lang.NullPointerException
>  at 
> com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54)
>  at 
> com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
>  at 
> com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
>  at 
> com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
>  at 
> com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
>  at 
> com.datastax.oss.d
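The root of the failure above is that an empty string is stored for a uuid column even though it can never be parsed back into a UUID. A minimal sketch of the mismatch follows; the parseOrNull guard is hypothetical and only illustrates the mitigation, it is not the Cassandra fix:

```java
import java.util.UUID;

public class UuidFromJsonSketch {
    // Hypothetical guard: treat "" like an absent/NULL field instead of
    // storing a value that cannot be deserialized later.
    static UUID parseOrNull(String raw) {
        if (raw == null || raw.isEmpty())
            return null;
        return UUID.fromString(raw); // throws IllegalArgumentException if malformed
    }

    public static void main(String[] args) {
        // "" is not a valid UUID textual form, so parsing it always fails.
        boolean emptyRejected;
        try {
            UUID.fromString("");
            emptyRejected = false;
        } catch (IllegalArgumentException e) {
            emptyRejected = true;
        }
        System.out.println(emptyRejected);           // true
        System.out.println(parseOrNull("") == null); // true
    }
}
```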

[jira] [Updated] (CASSANDRA-15896) NullPointerException in SELECT JSON statement when a UUID field contains an empty string

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15896:
---
Fix Version/s: (was: 3.0.x)

> NullPointerException in SELECT JSON statement when a UUID field contains an 
> empty string
> 
>
> Key: CASSANDRA-15896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15896
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter, CQL/Semantics
>Reporter: Ostico
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0, 3.11.x
>
>
> It seems that Cassandra accepts empty strings "" ( from a JSON string ) for UUID 
> fields but crashes when asked for the JSON serialization of those fields.
>  
> Cassandra version 3.11.6 running in Docker from the official Docker Hub image.
> Java driver:
> {code:java}
> <dependency>
> <groupId>com.datastax.oss</groupId>
> <artifactId>java-driver-core</artifactId>
> <version>4.7.0</version>
> </dependency>
> {code}
> The following code reproduces the bug:
> {code:java}
> package com.foo.bar;
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.CqlSessionBuilder;
> import com.datastax.oss.driver.api.core.cql.PreparedStatement;
> import com.datastax.oss.driver.api.core.cql.ResultSet;
> import com.datastax.oss.driver.api.core.cql.Row;
> import com.fasterxml.jackson.databind.ObjectMapper;
> import org.junit.After;
> import org.junit.Before;
> import org.junit.Test;
> import java.net.InetSocketAddress;
> import java.net.URI;
> import java.util.*;
> import static org.junit.Assert.assertFalse;
> import static org.junit.Assert.assertNotNull;
> /**
>  * @author Domenico Lupinetti  - 23/06/2020
>  */
> public class NullPointerExceptionTest {
> protected String uuid;
> protected CqlSession cqlSession;
> @Before
> public void setUp() throws Exception {
> URI node = new URI( "tcp://localhost:9042" );
> final CqlSessionBuilder builder = CqlSession.builder();
> cqlSession = builder.addContactPoint( new InetSocketAddress(
> node.getHost(),
> node.getPort()
> ) ).withLocalDatacenter( "datacenter1" ).build();
> cqlSession.execute( "CREATE KEYSPACE IF NOT EXISTS test_suite WITH 
> replication = {'class':'SimpleStrategy','replication_factor':1};" );
> String sb = "CREATE TABLE IF NOT EXISTS test_suite.test ( id uuid 
> PRIMARY KEY, another_id uuid, subject text );";
> cqlSession.execute( sb );
> PreparedStatement stm = cqlSession.prepare( "INSERT INTO 
> test_suite.test JSON :payload" );
> this.uuid = UUID.randomUUID().toString();
> HashMap<String, String> payload = new HashMap<>();
> payload.put( "id", this.uuid );
> // *** This exception does not happen if the field is set to NULL
> payload.put( "another_id", "" );  //<-- EMPTY STRING AS UUID
> payload.put( "subject", "Alighieri, Dante. Divina Commedia" );
> ObjectMapper objM = new ObjectMapper();
> cqlSession.execute(
> stm.bind().setString( "payload", objM.writeValueAsString( 
> payload ) )
> );  //<-- serialize as JSON
> }
> @After
> public void tearDown() throws Exception {
> cqlSession.execute( "DROP TABLE IF EXISTS test_suite.test;" );
> cqlSession.execute( "DROP KEYSPACE test_suite;" );
> cqlSession.close();
> }
> @Test
> public void testNullPointer() {
> PreparedStatement stmt   = cqlSession.prepare( "SELECT JSON id, 
> another_id FROM test_suite.test where id = :id;" );
> ResultSet resultSet  = cqlSession.execute( 
> stmt.bind().setUuid( "id", UUID.fromString( this.uuid ) ) ); // <-- 
> EXCEPTION
> Row   r  = resultSet.one();
> assertNotNull( r );
> assertNotNull( r.getString( "[json]" ) );
> assertFalse( Objects.requireNonNull( r.getString( "[json]" ) 
> ).isEmpty() );
> }
> }
> {code}
> Client stack Trace:
> {code:java}
> com.datastax.oss.driver.api.core.servererrors.ServerError: 
> java.lang.NullPointerException
>  at 
> com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54)
>  at 
> com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
>  at 
> com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
>  at 
> com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
>  at 
> com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
>  at 
> com.datastax.o

[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15902:
---
Fix Version/s: 3.11.x

> OOM because repair session thread not closed when terminating repair
> 
>
> Key: CASSANDRA-15902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15902
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Swen Fuhrmann
>Assignee: Swen Fuhrmann
>Priority: Normal
> Fix For: 3.11.x
>
> Attachments: heap-mem-histo.txt, repair-terminated.txt
>
>
> In our cluster, some nodes slowly run out of memory after a while. On 
> those nodes we observed that Cassandra Reaper terminates repairs with a JMX 
> call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} because 
> the timeout of 30 min is reached.
> In the heap dump we see that many instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}} occupy most of the memory:
> {noformat}
> 119 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x51a80" occupy 8.445.684.480 (93,96 
> %) bytes. {noformat}
> In the thread dump we see a lot of repair threads:
> {noformat}
> grep "Repair#" threaddump.txt | wc -l
>   50 {noformat}
>  
> The repair jobs are waiting for the validation to finish:
> {noformat}
> "Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 
> nid=0x542a waiting on condition [0x7f81ee414000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007939bcfc8> (a 
> com.google.common.util.concurrent.AbstractFuture$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> at 
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown
>  Source)
> at java.lang.Thread.run(Thread.java:748) {noformat}
>  
> That's the line where the threads are stuck:
> {noformat}
> // Wait for validation to complete
> Futures.getUnchecked(validations); {noformat}
>  
> The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops 
> the thread pool executor. It looks like futures which are in progress will 
> therefore never be completed, so the repair thread waits forever and never 
> finishes.
>  
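The hang can be reproduced in miniature: shutting down an executor does not complete futures for tasks still sitting in its queue, so an unbounded get() on such a future never returns. The following is only a sketch of the mechanism, not Cassandra code; a bounded wait is used so the snippet terminates:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RepairWaitSketch {
    // Returns true when the queued "validation" future never completes after
    // the executor is force-terminated, mirroring the stuck
    // Futures.getUnchecked(validations) call in RepairJob.
    static boolean validationNeverCompletes() throws Exception {
        ExecutorService validations = Executors.newSingleThreadExecutor();
        validations.submit(() -> sleepQuietly(60_000));   // long-running validation
        Future<?> queued = validations.submit(() -> {});  // still in the queue
        validations.shutdownNow();                        // like forceTerminateAllRepairSessions
        try {
            queued.get(1, TimeUnit.SECONDS);              // an unbounded get() would hang forever
            return false;
        } catch (TimeoutException e) {
            return true;                                  // the future will never complete
        }
    }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validationNeverCompletes());   // true
    }
}
```

This is why the repair threads pile up: each one is parked on a future that the terminated executor can no longer complete.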
> Environment:
> Cassandra version: 3.11.4 and 3.11.6
> Cassandra Reaper: 1.4.0
> JVM memory settings:
> {noformat}
> -Xms11771M -Xmx11771M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
> -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
> on another cluster with same issue:
> {noformat}
> -Xms31744M -Xmx31744M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
> -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
> Java Runtime:
> {noformat}
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) 
> {noformat}
>  
> The same issue described in this comment: 
> https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973
> As suggested in the comments I created this new specific ticket.






[jira] [Updated] (CASSANDRA-15851) Add bytebuddy support for in-jvm dtests

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15851:
---
Fix Version/s: 4.0

> Add bytebuddy support for in-jvm dtests
> ---
>
> Key: CASSANDRA-15851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15851
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>
> The old Python dtests support byteman, but that is quite horrible to work 
> with. [bytebuddy|https://bytebuddy.net/#/] is much better, so we should add 
> support for it in the in-jvm dtests.






[jira] [Updated] (CASSANDRA-15870) When 3.0 reads 2.1 data with a regular column set it expects the cellName to contain an element and fails if not true

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15870:
---
Fix Version/s: 3.11.x
   3.0.x

> When 3.0 reads 2.1 data with a regular column set it expects the 
> cellName to contain an element and fails if not true
> --
>
> Key: CASSANDRA-15870
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15870
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema, Local/SSTable
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> {code}
> java.lang.AssertionError
>   at org.apache.cassandra.db.rows.BufferCell.(BufferCell.java:48)
>   at 
> org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1461)
>   at 
> org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1380)
>   at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:549)
>   at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:523)
>   at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336)
>   at 
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:133)
>   at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:59)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:364)
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:132)
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:123)
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:207)
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:160)
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:174)
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
>   at 
> org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:240)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:191)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:345)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is similar to other JIRAs such as CASSANDRA-14113, but after 
> root-causing both exceptions, they only share the same symptom and not the 
> same root cause; hence a new JIRA.
> This was triggered when a frozen collection was found where a multi-cell 
> collection was expected.  When this happens, LegacyCellName#collectionElement 
> comes back as null, which eventually trips the assertion in BufferCell 
> (a complex cell needs a path).
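The assertion being tripped can be sketched as an invariant check (the names below are hypothetical; the real check lives in the BufferCell constructor): a cell of a multi-cell ("complex") column must carry a path such as the collection element key, and that path is exactly what comes back null when a frozen collection is mis-read as multi-cell:

```java
// Hypothetical sketch of the invariant from the stack trace above.
public class CellPathInvariant {
    // A multi-cell ("complex") column stores one cell per element, so each
    // cell needs a path (the element key); a frozen collection is a single
    // blob and has no per-element path.
    static void checkCell(boolean columnIsComplex, Object cellPath) {
        if (columnIsComplex && cellPath == null)
            throw new AssertionError("complex cell needs a path");
    }

    public static void main(String[] args) {
        checkCell(false, null);         // frozen collection: fine
        checkCell(true, "element-key"); // multi-cell with a path: fine
        try {
            checkCell(true, null);      // the mis-read case from the ticket
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```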






[jira] [Updated] (CASSANDRA-15904) nodetool getendpoints man page improvements

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15904:
---
Fix Version/s: 4.x

> nodetool getendpoints man page improvements
> ---
>
> Key: CASSANDRA-15904
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15904
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Arvinder Singh
>Assignee: Erick Ramirez
>Priority: Normal
> Fix For: 4.x
>
>
> Please include support for compound primary keys. Ex:
> nodetool getendpoints keyspace1 table1 pk1:pk2:pk2
> Thanks.






[jira] [Updated] (CASSANDRA-15896) NullPointerException in SELECT JSON statement when a UUID field contains an empty string

2020-06-30 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15896:
---
Fix Version/s: 3.11.x
   3.0.x
   4.0

> NullPointerException in SELECT JSON statement when a UUID field contains an 
> empty string
> 
>
> Key: CASSANDRA-15896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15896
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter, CQL/Semantics
>Reporter: Ostico
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> It seems that Cassandra accepts empty strings "" ( from a JSON string ) for UUID 
> fields but crashes when asked for the JSON serialization of those fields.
>  
> Cassandra version 3.11.6 running in Docker from the official Docker Hub image.
> Java driver:
> {code:java}
> <dependency>
> <groupId>com.datastax.oss</groupId>
> <artifactId>java-driver-core</artifactId>
> <version>4.7.0</version>
> </dependency>
> {code}
> The following code reproduces the bug:
> {code:java}
> package com.foo.bar;
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.CqlSessionBuilder;
> import com.datastax.oss.driver.api.core.cql.PreparedStatement;
> import com.datastax.oss.driver.api.core.cql.ResultSet;
> import com.datastax.oss.driver.api.core.cql.Row;
> import com.fasterxml.jackson.databind.ObjectMapper;
> import org.junit.After;
> import org.junit.Before;
> import org.junit.Test;
> import java.net.InetSocketAddress;
> import java.net.URI;
> import java.util.*;
> import static org.junit.Assert.assertFalse;
> import static org.junit.Assert.assertNotNull;
> /**
>  * @author Domenico Lupinetti  - 23/06/2020
>  */
> public class NullPointerExceptionTest {
> protected String uuid;
> protected CqlSession cqlSession;
> @Before
> public void setUp() throws Exception {
> URI node = new URI( "tcp://localhost:9042" );
> final CqlSessionBuilder builder = CqlSession.builder();
> cqlSession = builder.addContactPoint( new InetSocketAddress(
> node.getHost(),
> node.getPort()
> ) ).withLocalDatacenter( "datacenter1" ).build();
> cqlSession.execute( "CREATE KEYSPACE IF NOT EXISTS test_suite WITH 
> replication = {'class':'SimpleStrategy','replication_factor':1};" );
> String sb = "CREATE TABLE IF NOT EXISTS test_suite.test ( id uuid 
> PRIMARY KEY, another_id uuid, subject text );";
> cqlSession.execute( sb );
> PreparedStatement stm = cqlSession.prepare( "INSERT INTO 
> test_suite.test JSON :payload" );
> this.uuid = UUID.randomUUID().toString();
> HashMap<String, String> payload = new HashMap<>();
> payload.put( "id", this.uuid );
> // *** This exception does not happen if the field is set to NULL
> payload.put( "another_id", "" );  //<-- EMPTY STRING AS UUID
> payload.put( "subject", "Alighieri, Dante. Divina Commedia" );
> ObjectMapper objM = new ObjectMapper();
> cqlSession.execute(
> stm.bind().setString( "payload", objM.writeValueAsString( 
> payload ) )
> );  //<-- serialize as JSON
> }
> @After
> public void tearDown() throws Exception {
> cqlSession.execute( "DROP TABLE IF EXISTS test_suite.test;" );
> cqlSession.execute( "DROP KEYSPACE test_suite;" );
> cqlSession.close();
> }
> @Test
> public void testNullPointer() {
> PreparedStatement stmt   = cqlSession.prepare( "SELECT JSON id, 
> another_id FROM test_suite.test where id = :id;" );
> ResultSet resultSet  = cqlSession.execute( 
> stmt.bind().setUuid( "id", UUID.fromString( this.uuid ) ) ); // <-- 
> EXCEPTION
> Row   r  = resultSet.one();
> assertNotNull( r );
> assertNotNull( r.getString( "[json]" ) );
> assertFalse( Objects.requireNonNull( r.getString( "[json]" ) 
> ).isEmpty() );
> }
> }
> {code}
> Client stack Trace:
> {code:java}
> com.datastax.oss.driver.api.core.servererrors.ServerError: 
> java.lang.NullPointerException
>  at 
> com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54)
>  at 
> com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
>  at 
> com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
>  at 
> com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
>  at 
> com.datastax.oss.driver.internal.core.session.DefaultSession.execute(Def

[jira] [Commented] (CASSANDRA-15821) Metrics Documentation Enhancements

2020-06-30 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148734#comment-17148734
 ] 

Stephen Mallette commented on CASSANDRA-15821:
--

This issue now depends on CASSANDRA-15909 given the expected metric name 
consistency changes on that ticket.

> Metrics Documentation Enhancements
> --
>
> Key: CASSANDRA-15821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15821
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15582 involves quality around metrics and it was mentioned that 
> reviewing and [improving 
> documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst]
>  around metrics would fall into that scope. Please consider some of this 
> analysis in determining what improvements to make here:
> Please see [this 
> spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing]
>  that itemizes almost all of cassandra's metrics and whether they are 
> documented or not (and other notes).  That spreadsheet is "almost all" 
> because there are some metrics that don't seem to initialize as part of 
> Cassandra startup (I was able to trigger some to initialize, but not all were 
> immediately obvious). The missing metrics seem to be related to the following:
> * ThreadPool metrics - only some initialize at startup; the list of those 
> that do follows below
> * Streaming Metrics
> * HintedHandoff Metrics
> * HintsService Metrics
> Here are the ThreadPool scopes that get listed:
> {code}
> AntiEntropyStage
> CacheCleanupExecutor
> CompactionExecutor
> GossipStage
> HintsDispatcher
> MemtableFlushWriter
> MemtablePostFlush
> MemtableReclaimMemory
> MigrationStage
> MutationStage
> Native-Transport-Requests
> PendingRangeCalculator
> PerDiskMemtableFlushWriter_0
> ReadStage
> Repair-Task
> RequestResponseStage
> Sampler
> SecondaryIndexManagement
> ValidationExecutor
> ViewBuildExecutor
> {code}
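One way to check which scopes actually registered (useful for keeping lists like the one above current) is to query the MBean server for Cassandra's metric name pattern. The sketch below queries the local platform MBeanServer so it runs anywhere; against a live node you would instead connect over JMX (default port 7199) and query the org.apache.cassandra.metrics domain:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MetricScopesSketch {
    // Lists MBean names matching a pattern; on a Cassandra node, a pattern
    // like "org.apache.cassandra.metrics:type=ThreadPools,*" would return
    // one entry per registered ThreadPool scope/metric.
    static Set<ObjectName> query(MBeanServer server, String pattern) throws Exception {
        return server.queryNames(new ObjectName(pattern), null);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer local = ManagementFactory.getPlatformMBeanServer();
        // Demonstrated against the JVM's own MBeans so the snippet runs
        // without a Cassandra node.
        for (ObjectName name : query(local, "java.lang:type=Memory"))
            System.out.println(name);
    }
}
```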
> I noticed that Keyspace Metrics have this note: "Most of these metrics are 
> the same as the Table Metrics above, only they are aggregated at the Keyspace 
> level." I think I've isolated those metrics on table that are not on keyspace 
> to specifically be:
> {code}
> BloomFilterFalsePositives
> BloomFilterFalseRatio
> BytesAnticompacted
> BytesFlushed
> BytesMutatedAnticompaction
> BytesPendingRepair
> BytesRepaired
> BytesUnrepaired
> CompactionBytesWritten
> CompressionRatio
> CoordinatorReadLatency
> CoordinatorScanLatency
> CoordinatorWriteLatency
> EstimatedColumnCountHistogram
> EstimatedPartitionCount
> EstimatedPartitionSizeHistogram
> KeyCacheHitRate
> LiveSSTableCount
> MaxPartitionSize
> MeanPartitionSize
> MinPartitionSize
> MutatedAnticompactionGauge
> PercentRepaired
> RowCacheHitOutOfRange
> RowCacheHit
> RowCacheMiss
> SpeculativeSampleLatencyNanos
> SyncTime
> WaitingOnFreeMemtableSpace
> DroppedMutations
> {code}
> Someone with greater knowledge of this area might consider it worth the 
> effort to see whether any of these metrics should be aggregated to the 
> keyspace level, in case they were inadvertently missed. In any case, the 
> documentation could now easily reflect which metric names to expect 
> on Keyspace.
> The DroppedMessage metrics have a much larger set of scopes than what 
> was documented:
> {code}
> ASYMMETRIC_SYNC_REQ
> BATCH_REMOVE_REQ
> BATCH_REMOVE_RSP
> BATCH_STORE_REQ
> BATCH_STORE_RSP
> CLEANUP_MSG
> COUNTER_MUTATION_REQ
> COUNTER_MUTATION_RSP
> ECHO_REQ
> ECHO_RSP
> FAILED_SESSION_MSG
> FAILURE_RSP
> FINALIZE_COMMIT_MSG
> FINALIZE_PROMISE_MSG
> FINALIZE_PROPOSE_MSG
> GOSSIP_DIGEST_ACK
> GOSSIP_DIGEST_ACK2
> GOSSIP_DIGEST_SYN
> GOSSIP_SHUTDOWN
> HINT_REQ
> HINT_RSP
> INTERNAL_RSP
> MUTATION_REQ
> MUTATION_RSP
> PAXOS_COMMIT_REQ
> PAXOS_COMMIT_RSP
> PAXOS_PREPARE_REQ
> PAXOS_PREPARE_RSP
> PAXOS_PROPOSE_REQ
> PAXOS_PROPOSE_RSP
> PING_REQ
> PING_RSP
> PREPARE_CONSISTENT_REQ
> PREPARE_CONSISTENT_RSP
> PREPARE_MSG
> RANGE_REQ
> RANGE_RSP
> READ_REPAIR_REQ
> READ_REPAIR_RSP
> READ_REQ
> READ_RSP
> REPAIR_RSP
> REPLICATION_DONE_REQ
> REPLICATION_DONE_RSP
> REQUEST_RSP
> SCHEMA_PULL_REQ
> SCHEMA_PULL_RSP
> SCHEMA_PUSH_REQ
> SCHEMA_PUSH_RSP
> SCHEMA_VERSION_REQ
> SCHEMA_VERSION_RSP
> SNAPSHOT_MSG
> SNAPSHOT_REQ
> SNAPSHOT_RSP
> STATUS_REQ
> STATUS_RSP
> SYNC_REQ
> SYNC_RSP
> TRUNCATE_REQ
> TRUNCATE_RSP
> VALIDATION_REQ
> VALIDATION_RSP
> _SAMPLE
> _TEST_1
> _TEST_2
> _TRACE
> {code}
> I suppose I may yet be missing some metrics, as my knowledge of what's 
> available is limited to what I can get from JMX after Cassandra 
> initialization (and some initial starting c

[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-06-30 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13994:
--
Summary: Remove dead compact storage code before 4.0 release  (was: Remove 
COMPACT STORAGE internals before 4.0 release)

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning and ask users to migrate off those for now, then completely 
> remove it in a future release. It's just a couple of classes though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-30 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148732#comment-17148732
 ] 

Aleksey Yeschenko commented on CASSANDRA-13994:
---

Agree with Sylvain on all points here. This is just cleanup of dead, 
unreachable code that doesn't change any API. It can go in an alpha, a beta, or 
even an RC if needed.

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning and ask users to migrate off those for now, then completely 
> remove it in a future release. It's just a couple of classes though.






[jira] [Assigned] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-15897:


Assignee: Sylvain Lebresne

> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage






[jira] [Updated] (CASSANDRA-15891) provide a configuration option such as endpoint_verification_method

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15891:
-
Change Category: Operability
 Complexity: Normal
Component/s: Messaging/Internode
 Status: Open  (was: Triage Needed)

> provide a configuration option such as endpoint_verification_method
> ---
>
> Key: CASSANDRA-15891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15891
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Thanh
>Priority: Normal
>
> With CASSANDRA-9220, it's possible to configure endpoint/hostname 
> verification when enabling internode encryption. However, you don't have any 
> control over which endpoint is used for the verification; instead, 
> Cassandra will automatically try to use the node IP (not the node hostname) 
> for endpoint verification, so if your node certificates don't include the IP 
> in the SSL certificate's SAN list, then you'll get an error like:
> {code:java}
> ERROR [MessagingService-Outgoing-/10.10.88.194-Gossip] 2018-11-13 
> 10:20:26,903 OutboundTcpConnection.java:606 - SSL handshake error for 
> outbound connection to 50cc97c1[SSL_NULL_WITH_NULL_NULL: 
> Socket[addr=/,port=7001,localport=47684]] 
> javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
> No subject alternative names matching IP address  found 
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) {code}
> From what I've seen, most orgs will not have node IPs in their certs.
> So, it would be best if Cassandra provided another configuration option 
> such as *{{endpoint_verification_method}}*, which you could set to "ip" or 
> "fqdn" or something else (e.g. "hostname_alias" if for whatever reason the org 
> doesn't want to use the FQDN for endpoint verification).






[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15850:
-
Change Category: Performance
 Complexity: Normal
Component/s: Local/Startup and Shutdown
 Status: Open  (was: Triage Needed)

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Startup and Shutdown
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> Hello,
> When I am bootstrapping/restarting a Cassandra node, there is a delay between 
> gossip settling and the CQL port opening. Can someone please explain where 
> this delay is configured, and whether it can be changed? I don't see any 
> information in the logs.
> In my case there is a ~3 minute delay, and it increases as I increase the 
> number of tables, nodes, and DCs.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also, during this 3-10 minute delay, the 
> {noformat}
> nodetool compactionstats
> {noformat}
> command hangs and never responds until the CQL port is up and running.
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue can be easily reproducible with around 300 Tables and 100 nodes in 
> a cluster.






[jira] [Updated] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other

2020-06-30 Thread Stephen Mallette (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Mallette updated CASSANDRA-15909:
-
Fix Version/s: 4.0-beta

> Make Table/Keyspace Metric Names Consistent With Each Other
> ---
>
> Key: CASSANDRA-15909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15909
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
> Fix For: 4.0-beta
>
>
> As part of CASSANDRA-15821 it became apparent that certain metric names found 
> in keyspace and tables had different names but were in fact the same metric - 
> they are as follows:
> * Table.SyncTime == Keyspace.RepairSyncTime
> * Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
> * Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
> * Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
> * Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
> * Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
> * Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize
> Unifying this naming would help make metrics more consistent as part of 
> CASSANDRA-15582






[jira] [Created] (CASSANDRA-15909) Make Table/Keyspace Metric Names Consistent With Each Other

2020-06-30 Thread Stephen Mallette (Jira)
Stephen Mallette created CASSANDRA-15909:


 Summary: Make Table/Keyspace Metric Names Consistent With Each 
Other
 Key: CASSANDRA-15909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15909
 Project: Cassandra
  Issue Type: Improvement
  Components: Observability/Metrics
Reporter: Stephen Mallette
Assignee: Stephen Mallette


As part of CASSANDRA-15821 it became apparent that certain metric names found 
in keyspace and tables had different names but were in fact the same metric - 
they are as follows:

* Table.SyncTime == Keyspace.RepairSyncTime
* Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
* Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
* Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
* Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
* Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
* Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize

Unifying this naming would help make metrics more consistent as part of 
CASSANDRA-15582
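
The pairs listed above amount to a simple rename map. As a rough illustration (not actual Cassandra code) of how a metrics consumer could normalize the keyspace-level spellings onto the table-level ones until the naming is unified, with the map itself taken verbatim from the list above:

```python
# Keyspace metric name -> equivalent table metric name (pairs from the list above).
KEYSPACE_TO_TABLE = {
    "RepairSyncTime": "SyncTime",
    "RepairedOverreadRows": "RepairedDataTrackingOverreadRows",
    "RepairedOverreadTime": "RepairedDataTrackingOverreadTime",
    "AllMemtablesOnHeapDataSize": "AllMemtablesHeapSize",
    "AllMemtablesOffHeapDataSize": "AllMemtablesOffHeapSize",
    "MemtableOnHeapDataSize": "MemtableOnHeapSize",
    "MemtableOffHeapDataSize": "MemtableOffHeapSize",
}

def normalize(keyspace_metric_name):
    """Map a keyspace-level metric name onto its table-level spelling;
    names that already match are returned unchanged."""
    return KEYSPACE_TO_TABLE.get(keyspace_metric_name, keyspace_metric_name)
```

Unifying the names in Cassandra itself would make this kind of client-side mapping unnecessary.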






[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15850:
-
  Workflow: Cassandra Default Workflow  (was: Cassandra Bug Workflow)
Issue Type: Improvement  (was: Bug)

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> Hello,
> When I am bootstrapping/restarting a Cassandra node, there is a delay between 
> gossip settling and the CQL port opening. Can someone please explain where 
> this delay is configured, and whether it can be changed? I don't see any 
> information in the logs.
> In my case there is a ~3 minute delay, and it increases as I increase the 
> number of tables, nodes, and DCs.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also, during this 3-10 minute delay, the 
> {noformat}
> nodetool compactionstats
> {noformat}
> command hangs and never responds until the CQL port is up and running.
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue can be easily reproducible with around 300 Tables and 100 nodes in 
> a cluster.






[jira] [Commented] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148717#comment-17148717
 ] 

Sylvain Lebresne commented on CASSANDRA-15850:
--

From a look at the code, between gossip settling and starting the CQL server, 
the only thing that happens is that all the tables are "reloaded" (which 
involves a number of steps) to account for changes that could have happened 
once Gossip settled, and compactions are started.

None of that should be super long for a given table, but it's not the most 
optimized thing ever either, and we do reload all tables sequentially, so this 
may well be the culprit for the delay you are seeing.

Assuming I'm correct (I'm only going from a quick read of the code here), I 
don't think any configuration option will help reduce that delay (but it does 
make sense that the number of tables is a main factor).

It's not a bug; the server is doing work, albeit maybe inefficiently.

I'm sure this could be improved, though. At a minimum, it would be more 
user-friendly to add a log message explaining what is being done, so users are 
not left wondering what is going on.

I'm sure we can also make this faster. Two things come to mind in particular:
 - it seems the only reason to do this reloading is for the compaction 
strategy(ies) to take any disk boundary changes into account, but reloading 
does other things, and a bit of benchmarking could probably tell us whether we 
could save meaningful time by doing a more targeted reload.
 - parallelizing the work might yield benefits.
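
The parallelization idea could be sketched roughly as follows (pure illustration, not actual Cassandra code: `reload_table` is a stand-in for the per-table reload work, and real reloads may share state that makes this non-trivial):

```python
from concurrent.futures import ThreadPoolExecutor

def reload_table(table_name):
    # Stand-in for the per-table reload work (disk boundary recalculation,
    # compaction strategy refresh, etc.); returns the table name when done.
    return table_name

def reload_all(tables, workers=8):
    # Fan the independent per-table reloads out over a bounded thread pool
    # instead of looping over them sequentially.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(reload_table, tables))
```

With hundreds of tables, even a modest worker count should cut the wall-clock startup cost of this phase, assuming the per-table work really is independent.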

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> Hello,
> When I am bootstrapping/restarting a Cassandra Node, there is a delay between 
> gossip settle and CQL port opening. Can someone please explain me where this 
> delay is configured and can this be changed? I don't see any information in 
> the logs
> In my case if you see there is  a ~3 minutes delay and this increases if I 
> increase the #of tables and #of nodes and DC.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also during this 3-10 minutes delay, I see 
> {noformat}
> nodetool compactionstats
> {noformat}
>  command is hung and never respond, until the CQL port is up and running.
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue can be easily reproducible with around 300 Tables and 100 nodes in 
> a cluster.






[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection

2020-06-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148685#comment-17148685
 ] 

Andres de la Peña commented on CASSANDRA-15907:
---

{quote}
the second approach will execute RFP requests in two places:
 # at the beginning of 2nd phase, based on the collected outdated rows from 1st 
phase. These RFP requests can run in parallel and the number can be large.
 # at merge-listener, for additional rows requested by SRP. These RFP requests 
have to run in serial, but the number is usually small.
{quote}
I understand that this would limit the number of cached results, at the expense 
of producing more queries during the second phase. As for parallelizing, that 
would help us a bit but I think it's not going to save us from the degenerate 
cases that worry us, which are those where everything is so out of sync that we 
have to read the entire database.

Perhaps we might consider a more sophisticated way of finding a balance between 
the number of cached rows and grouped queries. We could try not to cache all 
the results but advance in blocks of a certain fixed number of cached results, 
so we limit the number of cached results while still grouping keys to issue 
fewer queries. That is, we could have that pessimistic SRP read prefetch and 
cache N rows, completed with extra queries to the silent replicas, plugged into 
another group of unmerged-merged counters to prefetch more results if 
(probably) needed, if that makes sense.
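
A rough shape of that block-by-block idea (pure illustration; `fetch_block` stands in for the grouped per-replica queries and is not an actual API):

```python
def prefetch_in_blocks(keys, fetch_block, block_size=100):
    """Resolve keys in fixed-size blocks so that at most `block_size`
    results need to be cached at once, while keys within a block are
    still grouped into a single query."""
    results = {}
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        results.update(fetch_block(block))  # one grouped query per block
    return results
```

The block size then becomes the knob trading cached-row memory against the number of round trips.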

Regarding the guardrails, a very reasonable threshold for in-memory cached 
results, for example 100 rows, can produce 100 internal queries if they are all 
in different partitions, which is definitely too many. Thus, we could also 
consider having another guardrail to limit the number of additional SRP/RFP 
internal queries per user query, so we can fail before getting to a timeout. 
That guardrail could, however, become obsolete for RFP if we implement 
multi-key queries and can do the current second phase with a single query per 
replica.
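
Such a per-query guardrail could be as simple as a counter that trips before the coordinator drifts into a timeout. A minimal sketch (the class name and the threshold are assumptions for illustration, not actual Cassandra configuration):

```python
class InternalQueryGuardrail:
    """Fail fast once a single user query has issued too many internal
    SRP/RFP follow-up queries (threshold here is purely illustrative)."""

    def __init__(self, max_internal_queries=100):
        self.max_internal_queries = max_internal_queries
        self.issued = 0

    def record(self):
        # Called each time the coordinator issues one more internal query.
        self.issued += 1
        if self.issued > self.max_internal_queries:
            raise RuntimeError(
                f"query aborted: {self.issued} internal queries exceeds "
                f"guardrail of {self.max_internal_queries}")
```

Failing with an explicit error is more actionable for operators than letting the degenerate case surface as a generic read timeout.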

> Operational Improvements & Hardening for Replica Filtering Protection
> -
>
> Key: CASSANDRA-15907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Coordination, Feature/2i Index
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Labels: 2i, memory
> Fix For: 4.0-beta
>
>
> CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i 
> and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a 
> few things we should follow up on, however, to make life a bit easier for 
> operators and generally de-risk usage:
> (Note: Line numbers are based on {{trunk}} as of 
> {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)
> *Minor Optimizations*
> * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be 
> able to use simple arrays instead of lists for {{rowsToFetch}} and 
> {{originalPartitions}}. Alternatively (or also), we may be able to null out 
> references in these two collections more aggressively. (ex. Using 
> {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, 
> assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.)
> * {{ReplicaFilteringProtection:323}} - We may be able to use 
> {{EncodingStats.merge()}} and remove the custom {{stats()}} method.
> * {{DataResolver:111 & 228}} - Cache an instance of 
> {{UnaryOperator#identity()}} instead of creating one on the fly.
> * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather 
> rather than serially querying every row that needs to be completed. This 
> isn't a clear win perhaps, given it targets the latency of single queries and 
> adds some complexity. (Certainly a decent candidate to kick even out of this 
> issue.)
> *Documentation and Intelligibility*
> * There are a few places (CHANGES.txt, tracing output in 
> {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side 
> filtering protection" (which makes it seem like the coordinator doesn't 
> filter) rather than "replica filtering protection" (which sounds more like 
> what we actually do, which is protect ourselves against incorrect replica 
> filtering results). It's a minor fix, but would avoid confusion.
> * The method call chain in {{DataResolver}} might be a bit simpler if we put 
> the {{repairedDataTracker}} in {{ResolveContext}}.
> *Guardrails*
> * As it stands, we don't have a way to enforce an upper bound on the memory 
> usage of {{ReplicaFilteringProtection}} which caches row responses from the 
> first round of requests. (Remember, these are later used 

[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15908:
-
Change Category: Operability
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Improve messaging on indexing frozen collections
> 
>
> Key: CASSANDRA-15908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Rocco Varela
>Assignee: Rocco Varela
>Priority: Low
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When attempting to create an index on a frozen collection, the error message 
> produced can be improved to provide more detail about the problem and 
> possible workarounds. Currently, a user will receive a message indicating 
> "...Frozen collections only support full() indexes", which is not immediately 
> clear to users new to Cassandra indexing and datatype compatibility.
> Here is an example:
> {code:java}
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses 
> frozen> );
> cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses);
>  InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
> create values() index on frozen column addresses. Frozen collections only 
> support full() indexes"{code}
>  
> I'm proposing possibly enhancing the messaging to something like this.
> {quote}Cannot create values() index on frozen column addresses. Frozen 
> collections only support indexes on the entire data structure due to 
> immutability constraints of being frozen; wrap your frozen column with the 
> full() target type to index properly.
> {quote}






[jira] [Updated] (CASSANDRA-15847) High Local read latency for few tables

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15847:
-
Resolution: Invalid
Status: Resolved  (was: Triage Needed)

The user mailing list (u...@cassandra.apache.org) is the appropriate venue for 
getting such help. JIRA is for reporting bugs and documenting ideas for new 
improvements and features.


> High Local read latency for few tables
> --
>
> Key: CASSANDRA-15847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15847
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Ananda Babu Velupala
>Priority: Normal
>
> Hi Team,
> I am seeing high local read latency for 3 tables on a node (it's a 5-node 
> cluster). The keyspace has 16 SSTables in total, and reads from that table 
> hit 10 SSTables. Can you please suggest a path forward to fix the read 
> latency? Appreciate your help. Thanks
> Cassandra version : 3.11.3
> SSTable Hitratio:
> ==
> k2view_usp/service_network_element_relation histograms
> Percentile SSTables Write Latency Read Latency Partition Size Cell Count
> (micros) (micros) (bytes)
> 50% 3.00 0.00 219.34 179 10
> 75% 7.00 0.00 315.85 179 10
> 95% 10.00 0.00 454.83 179 10
> 98% 10.00 0.00 545.79 215 10
> 99% 10.00 0.00 545.79 310 20
> Min 0.00 0.00 51.01 43 0
> Max 10.00 0.00 545.79 89970660 8409007
>  
> TABLE STATS:
> ==
> Table: service_network_element_relation_mir
>   SSTable count: 3
>   Space used (live): 283698097
>   Space used (total): 283698097
>   Space used by snapshots (total): 0
>   Off heap memory used (total): 5335824
>   SSTable Compression Ratio: 0.39563345719027554
>   Number of partitions (estimate): 2194136
>   Memtable cell count: 0
>   Memtable data size: 0
>   Memtable off heap memory used: 0
>   Memtable switch count: 0
>   Local read count: 0
>   Local read latency: NaN ms
>   Local write count: 0
>   Local write latency: NaN ms
>   Pending flushes: 0
>   Percent repaired: 100.0
>   Bloom filter false positives: 0
>   Bloom filter false ratio: 0.0
>   Bloom filter space used: 4567016
>   Bloom filter off heap memory used: 4566992
>   Index summary off heap memory used: 705208
>   Compression metadata off heap memory used: 63624
>   Compacted partition minimum bytes: 104
>   Compacted partition maximum bytes: 310
>   Compacted partition mean bytes: 154
>   Average live cells per slice (last five minutes): NaN
>   Maximum live cells per slice (last five minutes): 0
>   Average tombstones per slice (last five minutes): NaN
>   Maximum tombstones per slice (last five minutes): 0
>   Dropped Mutations: 0
>  
>  
> Table: service_network_element_relation
>   SSTable count: 11
>   Space used (live): 8067239427
>   Space used (total): 8067239427
>   Space used by snapshots (total): 0
>   Off heap memory used (total): 143032693
>   SSTable Compression Ratio: 0.21558247949161227
>   Number of partitions (estimate): 29357598
>   Memtable cell count: 2714
>   Memtable data size: 691617
>   Memtable off heap memory used: 0
>   Memtable switch count: 9
>   Local read count: 6369399
>   Local read latency: 0.311 ms
>   Local write count: 161229
>   Local write latency: NaN ms
>   Pending flushes: 0
>   Percent repaired: 99.91
>   Bloom filter false positives: 1508
>   Bloom filter false ratio: 0.00012
>   Bloom filter space used: 113071680
>   Bloom filter off heap memory used: 113071592
>   Index summary off heap memory used: 27244541
>   Compression metadata off heap memory used: 2716560
>   Compacted partition minimum bytes: 43
>   Compacted partition maximum bytes: 89970660
>   Compacted partition mean bytes: 265
>   Average live cells per slice (last five minutes): 1.1779891304347827
>   Maximum live cells per slice (last five minutes): 103
>   Average tombstones per slice (last five minutes): 1.0
>   Maximum tombstones per slice (last five minutes): 1
>   Dropped Mutations: 0
>  
> Table: service_relationTable: service_relation SSTable count: 7 Space used 
> (live): 281354042 Space used (total): 281354042 Space used by snapshots 
> (total): 35695068 Off heap memory used (total): 6423276 SSTable Compression 
> Ratio: 0.17685515178431085 Number of partitions (estimate): 1719400 Memtable 
> cell count: 1150 Memtable data size: 67482 Memtable off heap memory used: 0 
> Memtable switch count: 3 Local read count: 5506327 Local read latency: 0.182 
> ms Local write count: 5237 Local write latency: 0.084 ms Pending flushes: 0 
> Percent repaired: 55.48 Bloom filter false positives: 17 Bloom filter false 
> ratio: 0.0 Bloom filter space used: 5549664 Bloom filter off heap memory 
> used: 5549608 Index summary off heap memory used: 737348 Compression metadata 
> off heap memory used: 136320 Compacted partition minimum bytes: 87 Compacted 
> partition maximum bytes: 4055269 Compacted partition mean bytes
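As an editorial aside on the thread above: the histogram shows reads at p95 and beyond touching 10 of the 16 SSTables, which is the usual driver of the reported latency. A minimal, hypothetical sketch of flagging such tables from histogram data (the function name and the threshold of 4 are illustrative, not part of nodetool):

```python
# Hypothetical helper: flag percentiles where a read touches "too many"
# SSTables. The threshold of 4 is an illustrative tuning heuristic.
def sstables_per_read_alerts(histogram, threshold=4.0):
    """histogram maps a percentile label to SSTables touched per read."""
    return {pct: n for pct, n in histogram.items() if n > threshold}

# Values taken from the tablehistograms output quoted above.
hist = {"50%": 3.0, "75%": 7.0, "95%": 10.0, "98%": 10.0, "99%": 10.0}
alerts = sstables_per_read_alerts(hist)
print(alerts)  # every percentile from 75% up exceeds the threshold
```

When most reads touch nearly every SSTable of a table, the compaction strategy and its tuning are the usual places to look first.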

[jira] [Commented] (CASSANDRA-15901) Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip)

2020-06-30 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148621#comment-17148621
 ] 

Berenguer Blasi commented on CASSANDRA-15901:
-

The latest run with the latest commit looks ok imo:
 * [CI 
j11|https://app.circleci.com/pipelines/github/bereng/cassandra/52/workflows/573ad5be-e34d-4668-a0af-2726d4b35568]
 The failure seems unrelated, and the test passes locally
 * [CI 
j8|https://app.circleci.com/pipelines/github/bereng/cassandra/52/workflows/16e15155-7dce-4877-86f5-315c6a837d36]
 Seems to be a new flaky test, but unrelated to the PR imo. It passes when run 
locally but failed once locally on {{ant test}}
 * The 
[latest|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-test/156/]
 ci-cassandra run looks much better but:
 ** ActiveRepairServiceTest could be a 
[legit|https://ci-cassandra.apache.org/job/Cassandra-trunk/199/testReport/org.apache.cassandra.service/ActiveRepairServiceTest/testQueueWhenPoolFullStrategy_cdc/history/]
 flaky
 ** ClearSnapshotTest passes locally, and here it failed with some weird VM 
error :shrug:
 ** Connection tests have given timeouts 
[before|https://ci-cassandra.apache.org/job/Cassandra-trunk/199/testReport/org.apache.cassandra.net/ConnectionTest/testMessageDeliveryOnReconnect_cdc/history/]

It would be good to have a second opinion here, but I think the failures we are 
hitting are legit flaky tests now that we've removed much of the noise. [~mck], 
would you be so kind as to run the tests again, but not on cassandra13, to see 
what happens? I think we can then move this to review if nothing weird happens. 
Wdyt?

> Fix unit tests to load test/conf/cassandra.yaml (so to listen on a valid ip)
> 
>
> Key: CASSANDRA-15901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15901
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Many of the ci-cassandra jenkins runs fail on {{ip-10-0-5-5: Name or service 
> not known}}. CASSANDRA-15622 addressed some of these, but many still remain. 
> Currently, test C* nodes are either failing or listening on a public ip, 
> depending on which agent they end up on.
> The idea behind this ticket is to make ant force the private VPC ip into the 
> cassandra.yaml when building; this will make the nodes listen on the 
> correct ip.
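A minimal sketch of the yaml rewrite the ticket describes, assuming a simple line-based substitution (the function name is illustrative; the real change lives in the ant build):

```python
# Illustrative only: force a given ip into a cassandra.yaml so test nodes
# listen on the private VPC address instead of a public or unresolvable one.
def force_listen_address(yaml_text, ip):
    lines = []
    for line in yaml_text.splitlines():
        if line.strip().startswith("listen_address:"):
            lines.append(f"listen_address: {ip}")  # substitute the VPC ip
        else:
            lines.append(line)                     # leave other keys alone
    return "\n".join(lines)

conf = "cluster_name: Test\nlisten_address: 127.0.0.1\nnum_tokens: 16"
print(force_listen_address(conf, "10.0.5.5"))
```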



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15580:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Blake Eggleston*

We aim for 4.0 to have the first fully functioning incremental repair solution 
(CASSANDRA-9143)! Furthermore, we aim to verify that all types of repair (full 
range, subrange, incremental) function as expected, as well as ensuring that 
community tools such as Reaper work. CASSANDRA-3200 adds an experimental option 
to reduce the amount of data streamed during repair; we should write more tests 
and see how it works with big nodes.

  was:
*Shepherd: Blake Eggleston*

We aim for 4.0 to have the first fully functioning incremental repair solution 
(CASSANDRA-9143)! Furthermore we aim to verify that all types of repair: (full 
range, sub range, incremental) function as expected as well as ensuring 
community tools such as Reaper work. CASSANDRA-3200 adds an experimental option 
to reduce the amount of data streamed during repair, we should write more tests 
and see how it works with big nodes.


> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> We aim for 4.0 to have the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of 
> repair (full range, subrange, incremental) function as expected, as well as 
> ensuring that community tools such as Reaper work. CASSANDRA-3200 adds an 
> experimental option to reduce the amount of data streamed during repair; we 
> should write more tests and see how it works with big nodes.
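To make the full range / subrange distinction concrete, here is a small sketch of how a scheduler such as Reaper splits the full token range into subranges before repairing each slice (the function name and the toy integer token space are illustrative):

```python
# Split a token range into equal subranges, the way subrange repair tools
# divide the ring before issuing one repair per slice. Toy integer tokens.
def split_range(start, end, parts):
    step = (end - start) // parts
    # interior bounds every `step` tokens, with the exact end appended so
    # rounding never drops the tail of the range
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

print(split_range(0, 100, 4))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```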



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15584) 4.0 quality testing: Tooling - External Ecosystem

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15584:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Sam Tunnicliffe*

Many users of Apache Cassandra employ open source tooling to automate Cassandra 
configuration, runtime management, and repair scheduling. Prior to release, we 
need to confirm that popular third-party tools such as Reaper, Priam, etc. 
function properly.

  was:
*Shepherd: Sam Tunnicliffe*

Many users of Apache Cassandra employ open source tooling to automate Cassandra 
configuration, runtime management, and repair scheduling. Prior to release, we 
need to confirm that popular third-party tools such as Reaper, Priam, etc. 
function properly.


> 4.0 quality testing: Tooling - External Ecosystem
> -
>
> Key: CASSANDRA-15584
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15584
> Project: Cassandra
>  Issue Type: Task
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Sam Tunnicliffe*
> Many users of Apache Cassandra employ open source tooling to automate 
> Cassandra configuration, runtime management, and repair scheduling. Prior to 
> release, we need to confirm that popular third-party tools such as Reaper, 
> Priam, etc. function properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15585) 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15585:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Jordan West*

This area refers to contributions to test frameworks/tooling (e.g., dtests, 
QuickTheories, CASSANDRA-14821), and automation enabling those tools to be 
applied at scale (e.g., replay testing via Spark-based replay of captured FQL 
logs).

  was:
*Shepherd: Jordan West*

This area refers to contributions to test frameworks/tooling (e.g., dtests, 
QuickTheories, CASSANDRA-14821), and automation enabling those tools to be 
applied at scale (e.g., replay testing via Spark-based replay of captured FQL 
logs).


> 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation
> -
>
> Key: CASSANDRA-15585
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15585
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Jordan West*
> This area refers to contributions to test frameworks/tooling (e.g., dtests, 
> QuickTheories, CASSANDRA-14821), and automation enabling those tools to be 
> applied at scale (e.g., replay testing via Spark-based replay of captured FQL 
> logs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15581) 4.0 quality testing: Compaction

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15581:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Marcus Eriksson*

Alongside the local and distributed read/write paths, we'll also want to 
validate compaction. CASSANDRA-6696 introduced substantial changes/improvements 
that require testing (esp. JBOD).

  was:
*Shepherd: Marcus Eriksson*

Alongside the local and distributed read/write paths, we'll also want to 
validate compaction. CASSANDRA-6696 introduced substantial changes/improvements 
that require testing (esp. JBOD).


> 4.0 quality testing: Compaction
> ---
>
> Key: CASSANDRA-15581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15581
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Marcus Eriksson*
> Alongside the local and distributed read/write paths, we'll also want to 
> validate compaction. CASSANDRA-6696 introduced substantial 
> changes/improvements that require testing (esp. JBOD).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15583) 4.0 quality testing: Tooling, Bundled and First Party

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15583:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Sam Tunnicliffe*

Test plans should cover bundled first-party tooling and CLIs such as nodetool, 
cqlsh, and new tools supporting full query and audit logging (CASSANDRA-13983, 
CASSANDRA-12151).

  was:
*Shepherd: Sam Tunnicliffe*

Test plans should cover bundled first-party tooling and CLIs such as nodetool, 
cqlsh, and new tools supporting full query and audit logging (CASSANDRA-13983, 
CASSANDRA-12151).


> 4.0 quality testing: Tooling, Bundled and First Party
> -
>
> Key: CASSANDRA-15583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15583
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Sam Tunnicliffe*
> Test plans should cover bundled first-party tooling and CLIs such as 
> nodetool, cqlsh, and new tools supporting full query and audit logging 
> (CASSANDRA-13983, CASSANDRA-12151).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15587) 4.0 quality testing: Platforms and Runtimes

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15587:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: {color:#ff}NONE{color}*

CASSANDRA-9608 introduces support for Java 11. We'll want to verify that 
Cassandra under Java 11 meets expectations of stability.

  was:
*Shepherd: {color:#FF}NONE{color}*

CASSANDRA-9608 introduces support for Java 11. We'll want to verify that 
Cassandra under Java 11 meets expectations of stability.


> 4.0 quality testing: Platforms and Runtimes
> ---
>
> Key: CASSANDRA-15587
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15587
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: {color:#ff}NONE{color}*
> CASSANDRA-9608 introduces support for Java 11. We'll want to verify that 
> Cassandra under Java 11 meets expectations of stability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15537) 4.0 quality testing: Local Read/Write Path: Upgrade and Diff Test

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15537:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

Execution of upgrade and diff tests via cassandra-diff have proven to be one of 
the most effective approaches toward identifying issues with the local 
read/write path. These include instances of data loss, data corruption, data 
resurrection, incorrect responses to queries, incomplete responses, and others. 
Upgrade and diff tests can be executed concurrent with fault injection (such as 
host or network failure); as well as during mixed-version scenarios (such as 
upgrading half of the instances in a cluster, and running upgradesstables on 
only half of the upgraded instances).

Upgrade and diff tests are expected to continue through the release cycle, and 
are a great way for contributors to gain confidence in the correctness of the 
database under their own workloads.

  was:
Execution of upgrade and diff tests via cassandra-diff have proven to be one of 
the most effective approaches toward identifying issues with the local 
read/write path. These include instances of data loss, data corruption, data 
resurrection, incorrect responses to queries, incomplete responses, and others. 
Upgrade and diff tests can be executed concurrent with fault injection (such as 
host or network failure); as well as during mixed-version scenarios (such as 
upgrading half of the instances in a cluster, and running upgradesstables on 
only half of the upgraded instances).

Upgrade and diff tests are expected to continue through the release cycle, and 
are a great way for contributors to gain confidence in the correctness of the 
database under their own workloads.


> 4.0 quality testing: Local Read/Write Path: Upgrade and Diff Test
> -
>
> Key: CASSANDRA-15537
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15537
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> Execution of upgrade and diff tests via cassandra-diff have proven to be one 
> of the most effective approaches toward identifying issues with the local 
> read/write path. These include instances of data loss, data corruption, data 
> resurrection, incorrect responses to queries, incomplete responses, and 
> others. Upgrade and diff tests can be executed concurrent with fault 
> injection (such as host or network failure); as well as during mixed-version 
> scenarios (such as upgrading half of the instances in a cluster, and running 
> upgradesstables on only half of the upgraded instances).
> Upgrade and diff tests are expected to continue through the release cycle, 
> and are a great way for contributors to gain confidence in the correctness of 
> the database under their own workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15579:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Blake Eggleston*

Testing in this area focuses on non-node-local aspects of the read-write path: 
coordination, replication, read repair, etc.

  was:
*Shepherd: Blake Eggleston*

Testing in this area focuses on non-node-local aspects of the read-write path: 
coordination, replication, read repair, etc.


> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15538:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Aleksey Yeschenko*

Testing in this area refers to the local read/write path (StorageProxy, 
ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
finding numerous bugs and issues with the 3.0 storage engine rewrite 
(CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the local 
read/write path with techniques such as property-based testing, fuzzing 
([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
 and a source audit.

  was:
*Shepherd: Aleksey Yeschenko*

Testing in this area refers to the local read/write path (StorageProxy, 
ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
finding numerous bugs and issues with the 3.0 storage engine rewrite 
(CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the local 
read/write path with techniques such as property-based testing, fuzzing 
([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
 and a source audit.


> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, 
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
> finding numerous bugs and issues with the 3.0 storage engine rewrite 
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the 
> local read/write path with techniques such as property-based testing, fuzzing 
> ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
>  and a source audit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15536) 4.0 Quality: Components and Test Plans

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15536:
--
Description: 
[Source doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#].

Jira migrated from 
[cwiki|https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality:+Components+and+Test+Plans]

 The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a 
state where major users would run it in production when it is cut. To gain this 
confidence there are various ongoing testing efforts involving correctness, 
performance, and ease of use. On this page we try to coordinate and identify 
blockers for subsystems before we can release 4.0.

For each component we strive to have shepherds and contributors involved. 
Shepherds should be committers or knowledgeable component owners and are 
responsible for driving their blocking tickets to completion and ensuring 
quality in their claimed area, while contributors have signed up to help verify 
that subsystem by running tests or contributing fixes. Shepherds also ideally 
help set testing standards and ensure that we meet a high standard of quality 
in their claimed area.

If you are interested in contributing to testing 4.0, please add your name as 
assignee if you want to drive things, or as reviewer if you just want to 
participate and review, and get involved in the tracking ticket and the dev 
list/IRC discussions involving that component.
h3. Targeted Components / Subsystems

We've tried to collect some of the major components or subsystems that we want 
to ensure work properly towards having a great 4.0 release. If you think 
something is missing, please add it. Better yet, volunteer to contribute to 
testing it!
h4. Internode Messaging

In 4.0 we're getting a new Netty based inter-node communication system 
(CASSANDRA-8457). As internode messaging is vital to the correctness and 
performance of the database we should make sure that all forms (TLS, 
compressed, low latency, high latency, etc ...) of internode messaging function 
correctly.
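The combinations called out above (TLS, compression, latency profiles) form a small test matrix; a sketch of enumerating it exhaustively (the axis values are illustrative, not an exact list from the project):

```python
from itertools import product

# Enumerate every internode-messaging configuration worth exercising.
# The point is exhaustive coverage of the cartesian product of axes.
tls_modes = ["plaintext", "tls"]
compression = ["none", "lz4"]
latency_profiles = ["low", "high"]

matrix = list(product(tls_modes, compression, latency_profiles))
print(len(matrix))  # 8 configurations to test
```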
h4. Test Infrastructure / Automation: Diff Testing

Diff testing is a form of model-based testing in which two clusters are 
exhaustively compared to assert identity. To support Apache Cassandra 4.0 
validation, contributors have developed cassandra-diff, a Spark application 
that distributes the token range over a configurable number of Spark 
executors, then parallelizes randomized forward and reverse reads with varying 
paging sizes to compare every row present in the cluster, persisting a record 
of mismatches for investigation. This methodology has been instrumental in 
identifying data loss, data corruption, and incorrect-response issues 
introduced in early Cassandra 3.0 releases.

cassandra-diff and associated documentation can be found at: 
[https://github.com/apache/cassandra-diff]. Contributors are encouraged to run 
diff tests against clusters they manage and report issues to ensure workload 
diversity across the project.
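The core comparison that cassandra-diff distributes can be sketched in miniature. Real runs read live rows over CQL per token split; here the clusters are dicts and all names are illustrative:

```python
# Miniature model of diff testing: compare every key read from a source and
# a target cluster and record mismatches (corruption, loss, resurrection).
def diff_partitions(source, target):
    mismatches = []
    for key in sorted(set(source) | set(target)):
        if source.get(key) != target.get(key):
            mismatches.append(key)
    return mismatches

before = {"k1": "v1", "k2": "v2", "k3": "v3"}
after = {"k1": "v1", "k2": "CORRUPT", "k4": "resurrected"}
print(diff_partitions(before, after))  # ['k2', 'k3', 'k4']
```

The three mismatch kinds surface together: k2 is corrupted, k3 was lost, and k4 was resurrected.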
h4. System Tables and Internal Schema

This task covers a review of and minor bug fixes to local and distributed 
system keyspaces. Planned work in this area is now complete.
h4. Source Audit and Performance Testing: Streaming

This task covers an audit of the streaming implementation in Apache Cassandra 
4.0. In this release, contributors have implemented full-SSTable streaming to 
improve performance and reduce memory pressure. Internode messaging changes 
implemented in CASSANDRA-15066, adjacent to streaming, suggested that a review 
of the streaming implementation itself may be desirable. Prior work also 
covered performance testing of full-SSTable streaming.
h4. Test Infrastructure / Automation: "Harry"

CASSANDRA-15348 - Harry: generator library and extensible framework for fuzz 
testing Apache Cassandra

Harry is a component for fuzz testing and verification of Apache Cassandra 
clusters at scale. Harry makes it possible to run tests that validate the state 
of both dense nodes (to test the local read-write path) and large clusters (to 
test the distributed read-write path), and to do so efficiently. Harry defines 
a model that holds the state of the database, generators that produce 
reproducible, pseudo-random schemas, mutations, and queries, and a validator 
that asserts the correctness of the model following execution of generated 
traffic. See CASSANDRA-15348 for additional details.
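The model/generator/validator split described above can be sketched as a toy model-based fuzzer; the dict stands in for a cluster, and the fixed seed gives Harry-style reproducibility (all names are illustrative, not Harry's API):

```python
import random

# Toy model-based fuzzer: apply a reproducible pseudo-random stream of
# writes and deletes to both a model and a "database" (a dict standing in
# for the cluster), then validate that the two agree.
def fuzz_round(seed, n_ops):
    rng = random.Random(seed)  # fixed seed => fully reproducible run
    model, database = {}, {}
    for _ in range(n_ops):
        key = rng.randrange(8)
        if rng.random() < 0.3:
            model.pop(key, None)           # generated delete
            database.pop(key, None)
        else:
            value = rng.randrange(100)     # generated write
            model[key] = value
            database[key] = value
    return model == database               # the validator step

print(fuzz_round(seed=42, n_ops=1000))
```

In a real system the "database" side would be a cluster mutated over CQL, and a disagreement with the model would pinpoint a reproducible failing operation stream via the seed.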
h4. Local Read/Write Path: IndexInfo (CASSANDRA-11206)

Users upgrading from Cassandra 3.0.x to trunk will pick up CASSANDRA-11206 in 
the process. Contributors to 4.0 testing and validation have allocated time to 
testing and validation of these changes via a source audit and the 
implementation of property-based tests (currently underway). The majority of 
planned work here is complete, with a final set of perf tests in progress.

[jira] [Assigned] (CASSANDRA-15581) 4.0 quality testing: Compaction

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie reassigned CASSANDRA-15581:
-

Assignee: Benjamin Lerer  (was: Stephen Mallette)

> 4.0 quality testing: Compaction
> ---
>
> Key: CASSANDRA-15581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15581
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0-beta
>
>
> *Shepherd: Marcus Eriksson*
> Alongside the local and distributed read/write paths, we'll also want to 
> validate compaction. CASSANDRA-6696 introduced substantial 
> changes/improvements that require testing (esp. JBOD).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15585) 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation

2020-06-30 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148610#comment-17148610
 ] 

Josh McKenzie commented on CASSANDRA-15585:
---

This ticket has been blocked with no assignee for quite some time, with no 
movement. Could you clarify the status of Harry and what we should do with 
this ticket, [~ifesdjeen] / [~cscotta]?

> 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation
> -
>
> Key: CASSANDRA-15585
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15585
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> *Shepherd: Jordan West*
> This area refers to contributions to test frameworks/tooling (e.g., dtests, 
> QuickTheories, CASSANDRA-14821), and automation enabling those tools to be 
> applied at scale (e.g., replay testing via Spark-based replay of captured FQL 
> logs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15583) 4.0 quality testing: Tooling, Bundled and First Party

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie reassigned CASSANDRA-15583:
-

Assignee: Gianluca Righetto

> 4.0 quality testing: Tooling, Bundled and First Party
> -
>
> Key: CASSANDRA-15583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15583
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-beta
>
>
> *Shepherd: Sam Tunnicliffe*
> Test plans should cover bundled first-party tooling and CLIs such as 
> nodetool, cqlsh, and new tools supporting full query and audit logging 
> (CASSANDRA-13983, CASSANDRA-12151).






[jira] [Assigned] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie reassigned CASSANDRA-15579:
-

Assignee: Andres de la Peña

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.






[jira] [Assigned] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-06-30 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie reassigned CASSANDRA-15580:
-

Assignee: Berenguer Blasi

> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>
> *Shepherd: Blake Eggleston*
> We aim for 4.0 to ship the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of 
> repair (full range, sub-range, incremental) function as expected, as well as 
> to ensure that community tools such as Reaper continue to work. 
> CASSANDRA-3200 adds an experimental option to reduce the amount of data 
> streamed during repair; we should write more tests and see how it behaves 
> with large nodes.





