[jira] [Commented] (CASSANDRA-16850) Add client warnings and abort to tombstone and coordinator reads which go past a low/high watermark
[ https://issues.apache.org/jira/browse/CASSANDRA-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404790#comment-17404790 ]

David Capwell commented on CASSANDRA-16850:
-------------------------------------------

bq. The only remaining thing is that DefaultTrackWarnings doesn't need the enabled flag, since it's implicitly enabled

Fixed.

> Add client warnings and abort to tombstone and coordinator reads which go
> past a low/high watermark
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability/Logging
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.1
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We currently abort queries if we hit too many tombstones, but it's common
> that we would also want to warn clients (client warnings) before we reach
> that point; it's also common for different logic to want to warn/abort on
> client options (such as reading a large partition). To allow this we should
> add a concept of low/high watermarks (warn/abort) to tombstones and
> coordinator reads.
> Another issue is that current aborts look the same as a random failure, so
> from an SLA point of view it would be good to differentiate between user
> behavior being rejected and an unexplained issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
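The low/high watermark behavior the ticket describes can be sketched as a simple two-threshold check. The class and enum names below are illustrative, not the actual Cassandra API from the patch:

```java
// Sketch of a warn/abort watermark check as described in CASSANDRA-16850.
// TombstoneWatermarks and Outcome are hypothetical names, not Cassandra's API.
public class TombstoneWatermarks
{
    public enum Outcome { OK, WARN, ABORT }

    private final long warnThreshold;   // low watermark: emit a client warning
    private final long abortThreshold;  // high watermark: reject the read

    public TombstoneWatermarks(long warnThreshold, long abortThreshold)
    {
        this.warnThreshold = warnThreshold;
        this.abortThreshold = abortThreshold;
    }

    public Outcome check(long tombstonesScanned)
    {
        if (tombstonesScanned >= abortThreshold)
            return Outcome.ABORT; // surfaced as a distinct rejection, not a generic failure
        if (tombstonesScanned >= warnThreshold)
            return Outcome.WARN;  // sent back as a native-protocol client warning
        return Outcome.OK;
    }

    public static void main(String[] args)
    {
        TombstoneWatermarks w = new TombstoneWatermarks(1000, 100000);
        System.out.println(w.check(50));      // OK
        System.out.println(w.check(5000));    // WARN
        System.out.println(w.check(200000));  // ABORT
    }
}
```

The same shape applies to coordinator-read sizes: one knob warns, a higher knob aborts, and the abort is distinguishable from an internal failure.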
[jira] [Commented] (CASSANDRA-16666) Make SSLContext creation pluggable/extensible
[ https://issues.apache.org/jira/browse/CASSANDRA-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404786#comment-17404786 ]

Maulin Vasavada commented on CASSANDRA-16666:
---------------------------------------------

And I am still going to address two comments by Jon (I've not forgotten them :) ).

> Make SSLContext creation pluggable/extensible
> ---------------------------------------------
>
>                 Key: CASSANDRA-16666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16666
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Maulin Vasavada
>            Assignee: Maulin Vasavada
>            Priority: Normal
>             Fix For: 4.x
>
> Currently Cassandra creates the SSLContext via SSLFactory.java. SSLFactory is
> a final class with static methods and cannot be overridden. SSLFactory loads
> the keys and certs from file-based artifacts. While this works for many, in
> industries where security is stricter and contextual, this approach falls
> short. Many large organizations need the flexibility to load SSL artifacts
> from a custom resource (a custom Key Management Solution, HashiCorp Vault,
> Amazon KMS, etc.). While the JSSE SecurityProvider architecture gives us the
> flexibility to build custom mechanisms to validate and process security
> artifacts, often all we need is to build on the extensibility that Java's
> Trust/Key Manager interfaces already provide to load keystores from various
> resources, absent any customized requirements on key/certificate formats.
> My proposal here is to make SSLContext creation pluggable/extensible and
> have the current SSLFactory.java implement an extensible interface.
> I contributed a similar change that is live now in Apache Kafka (2.6.0) -
> https://issues.apache.org/jira/browse/KAFKA-8890
> I can spare some time writing the pluggable interface and run it by the
> required reviewers.
>
> Created [CEP-9: Make SSLContext creation pluggable|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-9%3A+Make+SSLContext+creation+pluggable]
>
> cc: [~dcapwell] [~djoshi]
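The pluggable creation the ticket proposes can be sketched as a small factory interface with a default file-style implementation. The interface and method names below are hypothetical, not the final CEP-9 API:

```java
// Sketch of a pluggable SSLContext factory, in the spirit of CASSANDRA-16666.
// SslContextFactory / createSslContext are illustrative names, not the real API.
import javax.net.ssl.SSLContext;
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;

public class PluggableSsl
{
    /** Implementations could load keys/certs from files, HashiCorp Vault, a KMS, etc. */
    public interface SslContextFactory
    {
        SSLContext createSslContext() throws NoSuchAlgorithmException, KeyManagementException;
    }

    /** Default implementation mirroring today's file-based behavior (greatly simplified). */
    public static class DefaultSslContextFactory implements SslContextFactory
    {
        @Override
        public SSLContext createSslContext() throws NoSuchAlgorithmException, KeyManagementException
        {
            SSLContext ctx = SSLContext.getInstance("TLS");
            // Real code would init with KeyManagers/TrustManagers built from keystores;
            // nulls here fall back to the JRE defaults.
            ctx.init(null, null, null);
            return ctx;
        }
    }

    public static void main(String[] args) throws Exception
    {
        SslContextFactory factory = new DefaultSslContextFactory();
        System.out.println(factory.createSslContext().getProtocol()); // TLS
    }
}
```

A Vault- or KMS-backed deployment would supply its own `SslContextFactory` implementation, leaving the rest of the server unchanged.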
[jira] [Commented] (CASSANDRA-16666) Make SSLContext creation pluggable/extensible
[ https://issues.apache.org/jira/browse/CASSANDRA-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404783#comment-17404783 ]

Maulin Vasavada commented on CASSANDRA-16666:
---------------------------------------------

Hi [~jmeredithco] and [~stefan.miklosovic], I've addressed the comments on the example. [~mck], for the documentation I feel I can start with doc/source/operating/security.rst and get the review done before adapting it to the new documentation format. That would work better for me given that I am new to asciidoc etc. Please let me know your thoughts.
[jira] [Assigned] (CASSANDRA-16175) Avoid removing batch when it's not created during view replication
[ https://issues.apache.org/jira/browse/CASSANDRA-16175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova reassigned CASSANDRA-16175:
-----------------------------------------------

    Assignee: Ekaterina Dimitrova

> Avoid removing batch when it's not created during view replication
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-16175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16175
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Materialized Views
>            Reporter: Zhao Yang
>            Assignee: Ekaterina Dimitrova
>            Priority: Normal
>             Fix For: 4.x
>
> When the base replica is also a view replica we don't write a local batchlog,
> but the batch is unnecessarily removed when the view write succeeds, which
> creates (and persists) a tombstone in the system.batches table.
[jira] [Updated] (CASSANDRA-16850) Add client warnings and abort to tombstone and coordinator reads which go past a low/high watermark
[ https://issues.apache.org/jira/browse/CASSANDRA-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Blake Eggleston updated CASSANDRA-16850:
----------------------------------------

    Status: Ready to Commit  (was: Review In Progress)

Looks good. The only remaining thing is that {{DefaultTrackWarnings}} doesn't need the {{enabled}} flag, since it's implicitly enabled, but that can be addressed on commit. +1
[jira] [Updated] (CASSANDRA-16850) Add client warnings and abort to tombstone and coordinator reads which go past a low/high watermark
[ https://issues.apache.org/jira/browse/CASSANDRA-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Blake Eggleston updated CASSANDRA-16850:
----------------------------------------

    Status: Review In Progress  (was: Patch Available)
[jira] [Commented] (CASSANDRA-14557) Consider adding default and required keyspace replication options
[ https://issues.apache.org/jira/browse/CASSANDRA-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404742#comment-17404742 ]

Aleksei Zotov commented on CASSANDRA-14557:
-------------------------------------------

Thanks [~sumanth.pasupuleti]! I replied to your comments on the same commit: [https://github.com/sumanth-pasupuleti/cassandra/commit/139a01531f1b51f6b3b7dc005a7df929ec9409a5]. Please take a look and let me know your thoughts.

> Consider adding default and required keyspace replication options
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-14557
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14557
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Low
>              Labels: 4.0-feature-freeze-review-requested
>             Fix For: 4.x
>
>         Attachments: 14557-4.0.txt, 14557-trunk.patch
>
> Ending up with a keyspace of RF=1 is unfortunately pretty easy in C* right
> now - the system_auth keyspace, for example, is created with RF=1 (to
> accommodate single-node setups, afaict from CASSANDRA-5112), and a user can
> further create a keyspace with RF=1, posing availability and streaming risks
> (e.g. rebuild).
> I propose we add two configuration options in cassandra.yaml:
> # {{default_keyspace_rf}} (default: 1) - If a replication factor is not
> specified, use this number.
> # {{required_minimum_keyspace_rf}} (default: unset) - Prevent users from
> creating a keyspace with an RF less than what is configured.
> These settings could further be re-used to:
> * Provide defaults for new keyspaces created with SimpleStrategy or
> NetworkTopologyStrategy (CASSANDRA-14303)
> * Make the automatic token [allocation
> algorithm|https://issues.apache.org/jira/browse/CASSANDRA-13701?focusedCommentId=16095662&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16095662]
> interface more intuitive, allowing easy use of the new token allocation
> algorithm.
> At the end of the day, if someone really wants to allow RF=1, they simply
> don't set {{required_minimum_keyspace_rf}}. For backwards compatibility the
> default remains 1, so C* would continue to create keyspaces with RF=1 and
> allow any RF, matching current behavior.
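The two proposed settings compose into a small validation rule: fall back to the default when no RF is given, and reject RFs below the required minimum when one is configured. The {{default_keyspace_rf}} / {{required_minimum_keyspace_rf}} names come from the ticket; the validation class itself is illustrative:

```java
// Sketch of the default/required RF checks proposed in CASSANDRA-14557.
// KeyspaceRfPolicy is a hypothetical name; only the two setting names are from the ticket.
public class KeyspaceRfPolicy
{
    private final int defaultKeyspaceRf;             // used when no RF is specified (ticket default: 1)
    private final Integer requiredMinimumKeyspaceRf; // null = unset, any RF allowed

    public KeyspaceRfPolicy(int defaultKeyspaceRf, Integer requiredMinimumKeyspaceRf)
    {
        this.defaultKeyspaceRf = defaultKeyspaceRf;
        this.requiredMinimumKeyspaceRf = requiredMinimumKeyspaceRf;
    }

    /** Returns the RF to use, or throws if the requested RF is below the required minimum. */
    public int resolve(Integer requestedRf)
    {
        int rf = requestedRf == null ? defaultKeyspaceRf : requestedRf;
        if (requiredMinimumKeyspaceRf != null && rf < requiredMinimumKeyspaceRf)
            throw new IllegalArgumentException("replication_factor " + rf +
                " is below required_minimum_keyspace_rf " + requiredMinimumKeyspaceRf);
        return rf;
    }
}
```

With both settings unset/default (`new KeyspaceRfPolicy(1, null)`) every request passes, which matches the backwards-compatible behavior the ticket describes.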
[jira] [Updated] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-16879:
----------------------------------------

    Status: Review In Progress  (was: Patch Available)

> Verify correct ownership of attached locations on disk at C* startup
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-16879
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16879
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Startup and Shutdown
>            Reporter: Josh McKenzie
>            Assignee: Josh McKenzie
>            Priority: Normal
>             Fix For: 4.x
>
> There are two primary things related to startup and disk ownership we should
> mitigate.
> First, an instance can come up with an incorrectly mounted volume attached as
> its configured data directory. This causes the wrong system tables to be
> read. If the instance that was previously using the volume is also down, its
> token could be taken over by the instance coming up.
> Secondly, in a JBOD setup, the non-system keyspaces may reside on a separate
> volume from the system tables. In this scenario, we need to ensure that all
> directories belong to the same instance, and that as the instance starts up
> it can access all the directories it expects to (including the data, commit
> log, hints, and saved cache directories).
[jira] [Commented] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404737#comment-17404737 ]

Caleb Rackliffe commented on CASSANDRA-16879:
---------------------------------------------

Very minor note... I think we have typically included all authors WRT the {{Co-authored-by}} tag. In this case, I think that would mean both you and Sam are tagged. (I can see how you could go that way with it, of course, if the main author tag already pointed to Sam. Not a huge deal either way I guess, unless we had some reporting/analytics that assumed "primary" authors are also technically co-authors...)
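One way to implement the ownership verification described in this ticket is to stamp every configured directory (data, commit log, hints, saved caches) with the node's identity on first start and fail fast on later starts if any stamp is missing or mismatched. The file name and layout below are illustrative, not what CASSANDRA-16879 actually ships:

```java
// Sketch of a startup disk-ownership check in the spirit of CASSANDRA-16879.
// TOKEN_FILE and the stamp/verify methods are hypothetical, not the real patch.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DiskOwnershipCheck
{
    static final String TOKEN_FILE = "node_ownership_token"; // hypothetical file name

    /** On first start, stamp each attached directory with this node's id. */
    public static void stamp(List<Path> dirs, String nodeId) throws IOException
    {
        for (Path dir : dirs)
            Files.writeString(dir.resolve(TOKEN_FILE), nodeId);
    }

    /** On later starts, fail fast if any dir is missing or stamped with another node's id. */
    public static void verify(List<Path> dirs, String nodeId) throws IOException
    {
        for (Path dir : dirs)
        {
            Path token = dir.resolve(TOKEN_FILE);
            if (!Files.exists(token))
                throw new IllegalStateException("No ownership token in " + dir + "; wrong volume mounted?");
            String owner = Files.readString(token).trim();
            if (!owner.equals(nodeId))
                throw new IllegalStateException(dir + " belongs to node " + owner + ", not " + nodeId);
        }
    }
}
```

Failing before any system table is read prevents both scenarios from the ticket: booting off a foreign volume, and a JBOD setup where only some of the directories belong to this instance.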
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16884:
-------------------------------------

    Status: Review In Progress  (was: Patch Available)

> Bump zstd-jni version to 1.5.0-4
> --------------------------------
>
>                 Key: CASSANDRA-16884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16884
>             Project: Cassandra
>          Issue Type: Task
>          Components: Build
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: High
>
> The current zstd-jni version (1.3.8-5) was released on 04/12/2019. There has
> been a lot of development on the zstd library and the JNI binding during the
> intervening 2.5 years, including a fuzzer that detected a handful of
> corruption bugs, as well as performance improvements.
> The zstd-jni version tracks that of the native library. The current native
> lib version is 1.5.0, hence 1.5.0-4 for zstd-jni.
> I am proposing bumping the zstd-jni version to the current release.
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16884:
-------------------------------------

    Status: Patch Available  (was: In Progress)
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16884:
-------------------------------------

    Status: In Progress  (was: Patch Available)
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16884:
-------------------------------------

    Status: Ready to Commit  (was: Review In Progress)

+1
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16884:
-------------------------------------

    Status: Patch Available  (was: In Progress)
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16884:
-------------------------------------

    Status: In Progress  (was: Patch Available)
[jira] [Updated] (CASSANDRA-16842) Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
[ https://issues.apache.org/jira/browse/CASSANDRA-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-16842:
----------------------------------------

          Fix Version/s: (was: 4.x)
                         4.1
    Source Control Link: https://github.com/apache/cassandra/commit/f9b7c1e6984f5b81aae1e3a2191d4e9599db15ae
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed to trunk as https://github.com/apache/cassandra/commit/f9b7c1e6984f5b81aae1e3a2191d4e9599db15ae

> Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-16842
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16842
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Commit Log
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.1
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> CommitLog sync markers are written in two phases. In the first, zeroes are
> written for the position of the next sync marker and the sync marker CRC
> value. In the second, when the next sync marker is written, the actual
> position and CRC values are written. If the process shuts down in a
> disorderly fashion, it is entirely possible for a valid next-marker position
> to be written to our memory-mapped file but not the final CRC value. Later,
> when we attempt to replay the segment, we will fail without recovering any of
> the perfectly valid mutations it contains. (This assumes we're confining
> ourselves to the case where there is no compression or encryption.)
> {noformat}
> ERROR 2020-11-18T10:55:23,888 [main] org.apache.cassandra.utils.JVMStabilityInspector:102 - Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 23091775 of commit log …/CommitLog-6-1605699607608.log, with invalid CRC. The end of segment marker should be zero.
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:731)
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.readSyncMarker(CommitLogReplayer.java:274)
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:436)
> 	at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:189)
> 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:170)
> 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:151)
> 	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:332)
> 	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:656)
> 	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:808)
> {noformat}
> It may be useful to provide an option that overrides the default strict
> behavior here and skips the CRC check when a non-zero end position is
> present, allowing valid mutations to be recovered and startup to proceed.
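The two-phase write described above means a crash can leave a valid next-marker position paired with a never-finalized (zero) CRC. The lenient handling the ticket proposes can be sketched as follows; the names here are illustrative, and the real opt-in flag is the system property added by the committed patch:

```java
// Sketch of the lenient sync-marker decision from CASSANDRA-16842:
// with the operator opt-in, a non-zero next-marker position with a zero
// (never-written) CRC is tolerated instead of aborting replay.
// SyncMarkerCheck / Result are hypothetical names, not the real patch.
public class SyncMarkerCheck
{
    public enum Result { VALID, SKIP_CRC, CORRUPT }

    public static Result check(boolean allowSkipSyncMarkerCrc,
                               int nextMarkerPosition, int storedCrc, int computedCrc)
    {
        if (storedCrc == computedCrc)
            return Result.VALID;
        // A crash between writing the position and the CRC leaves the CRC as zero.
        if (allowSkipSyncMarkerCrc && nextMarkerPosition != 0 && storedCrc == 0)
            return Result.SKIP_CRC; // recover the valid mutations instead of failing startup
        return Result.CORRUPT;
    }
}
```

Keeping the strict path as the default preserves today's corruption detection; the skip only applies when the stored CRC is exactly the never-finalized zero value and the operator has opted in.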
[cassandra] branch trunk updated: Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
This is an automated email from the ASF dual-hosted git repository.

maedhroz pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new f9b7c1e  Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
f9b7c1e is described below

commit f9b7c1e6984f5b81aae1e3a2191d4e9599db15ae
Author: Marcus Eriksson
AuthorDate: Mon Jan 11 10:55:44 2021 +0100

    Allow CommitLogSegmentReader to optionally skip sync marker CRC checks

    patch by Caleb Rackliffe; reviewed by Josh McKenzie for CASSANDRA-16842

    Co-authored-by: Jordan West
    Co-authored-by: Caleb Rackliffe
    Co-authored-by: Marcus Eriksson
---
 CHANGES.txt                                        |   1 +
 .../db/commitlog/CommitLogSegmentReader.java       |  29 +++++
 .../cassandra/db/commitlog/CommitLogTest.java      | 128 ++++++++++++++
 3 files changed, 137 insertions(+), 21 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index a9c8ebd..be3ea40 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.1
+ * Allow CommitLogSegmentReader to optionally skip sync marker CRC checks (CASSANDRA-16842)
  * allow blocking IPs from updating metrics about traffic (CASSANDRA-16859)
  * Request-Based Native Transport Rate-Limiting (CASSANDRA-16663)
  * Implement nodetool getauditlog command (CASSANDRA-16725)

diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java
index e23a915..33e70c1 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java
@@ -26,6 +26,10 @@ import javax.crypto.Cipher;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.AbstractIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.Config;
 import org.apache.cassandra.db.commitlog.EncryptedFileSegmentInputStream.ChunkProvider;
 import org.apache.cassandra.db.commitlog.CommitLogReadHandler.*;
 import org.apache.cassandra.io.FSReadError;
@@ -46,6 +50,11 @@ import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
  */
 public class CommitLogSegmentReader implements Iterable
 {
+    public static final String ALLOW_IGNORE_SYNC_CRC = Config.PROPERTY_PREFIX + "commitlog.allow_ignore_sync_crc";
+    private static volatile boolean allowSkipSyncMarkerCrc = Boolean.getBoolean(ALLOW_IGNORE_SYNC_CRC);
+
+    private static final Logger logger = LoggerFactory.getLogger(CommitLogSegmentReader.class);
+
     private final CommitLogReadHandler handler;
     private final CommitLogDescriptor descriptor;
     private final RandomAccessReader reader;
@@ -75,6 +84,11 @@ public class CommitLogSegmentReader implements Iterable
     public Iterator iterator()
     {
@@ -151,8 +165,23 @@ public class CommitLogSegmentReader implements Iterable
-    public static Collection generateData()
+    public static Collection generateData() throws Exception
+    {
+        return Arrays.asList(new Object[][]
+        {
+            { null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
+            { null, newEncryptionContext() }, // Encryption
+            { new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext() },
+            { new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            { new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            { new ParameterizedClass(ZstdCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}
+        });
+    }
+
+    private static EncryptionContext newEncryptionContext() throws Exception
     {
-        return Arrays.asList(new Object[][]{
-        {null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
-        {null, EncryptionContextGenerator.createContext(true)}, // Encryption
-        {new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
-        {new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
-        {new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
-        {new ParameterizedClass(ZstdCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}});
+        EncryptionContext context = EncryptionContextGenerator.createContext(true);
+
[jira] [Updated] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-16879: Reviewers: Caleb Rackliffe > Verify correct ownership of attached locations on disk at C* startup > > > Key: CASSANDRA-16879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16879 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.x > > > There's two primary things related to startup and disk ownership we should > mitigate. > First, an instance can come up with an incorrectly mounted volume attached as > its configured data directory. This causes the wrong system tables to be > read. If the instance which was previously using the volume is also down, its > token could be taken over by the instance coming up. > Secondly, in a JBOD setup, the non-system keyspaces may reside on a separate > volume to the system tables. In this scenario, we need to ensure that all > directories belong to the same instance, and that as the instance starts up > it can access all the directories it expects to be able to. (including data, > commit log, hints and saved cache dirs) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
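One common way to implement the kind of startup verification this ticket describes is a marker file in each attached location recording the owning node's id. The sketch below is a hypothetical illustration of that approach only; the marker name, helpers, and method names are invented, not the actual patch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;

public class StartupDiskOwnershipCheck
{
    // Hypothetical marker file name; the real patch may use a different scheme entirely
    static final String MARKER = ".node_ownership";

    /** Stamps the directory with this node's id if no marker exists yet. */
    static void stampDirectory(Path dir, UUID nodeId)
    {
        try
        {
            Path marker = dir.resolve(MARKER);
            if (!Files.exists(marker))
                Files.write(marker, nodeId.toString().getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * True only if every attached location (data, commit log, hints, saved caches)
     * carries this node's id - catching both a wrongly mounted data volume and a
     * JBOD disk that belongs to a different instance.
     */
    static boolean ownsAll(List<Path> dirs, UUID nodeId)
    {
        try
        {
            for (Path dir : dirs)
            {
                Path marker = dir.resolve(MARKER);
                if (!Files.exists(marker))
                    return false; // never stamped: refuse to start until an operator intervenes
                String stored = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
                if (!stored.equals(nodeId.toString()))
                    return false; // stamped by another instance: wrong volume attached
            }
            return true;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    /** Small helper so callers (and tests) avoid checked IO exceptions. */
    static Path tempDir(String prefix)
    {
        try { return Files.createTempDirectory(prefix); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

A node failing this check at startup would avoid both failure modes in the description: reading another instance's system tables, and taking over a down instance's token.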
[jira] [Comment Edited] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404621#comment-17404621 ] Josh McKenzie edited comment on CASSANDRA-16879 at 8/25/21, 7:23 PM: - A few failures I'm confident are unrelated to the ticket. * JDK11: testUnloggedPartitionsPerBatch which passes locally. Think this was a circle config + env issue w/timeout. * replaceAliveHost which is generally failing at the moment * JDK8: incompletePropose which OOM'ed - seen on a couple of other branches and unrelated to this ticket. Passing fine locally and on JDK11. ||Item|Link| |JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/59/workflows/ff19f043-dc27-4d83-baf4-0510614a9c0c]| |JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/59/workflows/6580b51c-8254-478b-a1f0-7cd6b6392c31]| |Branch|[Link|https://github.com/apache/cassandra/compare/cassandra-4.0...josh-mckenzie:CASSANDRA-16879?expand=1]| was (Author: jmckenzie): A few failures I'm confident are unrelated to the ticket. * JDK11: testUnloggedPartitionsPerBatch which passes locally. Think this was a circle config + env issue w/timeout. * replaceAliveHost which is failing in generaly atm * JDK8: incompletePropose which OOM'ed - see this on a couple other branches and unrelated to this ticket. Passing fine locally and on JDK11. 
||Item|Link| |JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/59/workflows/ff19f043-dc27-4d83-baf4-0510614a9c0c]| |JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/59/workflows/6580b51c-8254-478b-a1f0-7cd6b6392c31]| |Branch|[Link|https://github.com/apache/cassandra/compare/cassandra-4.0...josh-mckenzie:CASSANDRA-16879?expand=1]| > Verify correct ownership of attached locations on disk at C* startup > > > Key: CASSANDRA-16879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16879 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.x > > > There's two primary things related to startup and disk ownership we should > mitigate. > First, an instance can come up with an incorrectly mounted volume attached as > its configured data directory. This causes the wrong system tables to be > read. If the instance which was previously using the volume is also down, its > token could be taken over by the instance coming up. > Secondly, in a JBOD setup, the non-system keyspaces may reside on a separate > volume to the system tables. In this scenario, we need to ensure that all > directories belong to the same instance, and that as the instance starts up > it can access all the directories it expects to be able to. (including data, > commit log, hints and saved cache dirs) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-16884: -- Change Category: Quality Assurance Complexity: Low Hanging Fruit Reviewers: Dinesh Joshi Priority: High (was: Normal) Status: Open (was: Triage Needed) PR: [https://github.com/apache/cassandra/pull/1170] CI: [https://app.circleci.com/pipelines/github/yifan-c/cassandra?branch=CASSANDRA-16884%2F4.0] This is a simple one-liner patch just to bump the version of zstd-jni. > Bump zstd-jni version to 1.5.0-4 > > > Key: CASSANDRA-16884 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16884 > Project: Cassandra > Issue Type: Task > Components: Build >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: High > > The current zstd-jni version (1.3.8-5) was released in 04/12/2019. There has > been a lot of development on the zstd library and the jni binding during the > 2.5 years, including a fuzzer which detected a handful of corruption bugs and > performance improvements. > The version of zstd-jni maps with the one of the native library. The current > native lib version is 1.5.0, hence 1.5.0-4 for zstd-jni. > I am proposing bumping the zstd-jni version to the current. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
[ https://issues.apache.org/jira/browse/CASSANDRA-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-16884: -- Test and Documentation Plan: ci Status: Patch Available (was: Open) > Bump zstd-jni version to 1.5.0-4 > > > Key: CASSANDRA-16884 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16884 > Project: Cassandra > Issue Type: Task > Components: Build >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: High > > The current zstd-jni version (1.3.8-5) was released in 04/12/2019. There has > been a lot of development on the zstd library and the jni binding during the > 2.5 years, including a fuzzer which detected a handful of corruption bugs and > performance improvements. > The version of zstd-jni maps with the one of the native library. The current > native lib version is 1.5.0, hence 1.5.0-4 for zstd-jni. > I am proposing bumping the zstd-jni version to the current. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16884) Bump zstd-jni version to 1.5.0-4
Yifan Cai created CASSANDRA-16884: - Summary: Bump zstd-jni version to 1.5.0-4 Key: CASSANDRA-16884 URL: https://issues.apache.org/jira/browse/CASSANDRA-16884 Project: Cassandra Issue Type: Task Components: Build Reporter: Yifan Cai Assignee: Yifan Cai The current zstd-jni version (1.3.8-5) was released on 04/12/2019. There has been a lot of development on the zstd library and the jni binding in the 2.5 years since, including a fuzzer that detected a handful of corruption bugs, as well as performance improvements. The zstd-jni version maps to that of the native library. The current native lib version is 1.5.0, hence 1.5.0-4 for zstd-jni. I am proposing bumping the zstd-jni version to the current release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16721) Repaired data tracking on a read coordinator is susceptible to races between local and remote requests
[ https://issues.apache.org/jira/browse/CASSANDRA-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404655#comment-17404655 ] Sam Tunnicliffe commented on CASSANDRA-16721: - Looks good to me, modulo a [few|https://github.com/apache/cassandra/pull/1160/files#r695998655], [trivial|https://github.com/apache/cassandra/pull/1160/files#r696006742], [nits|https://github.com/apache/cassandra/pull/1160/files#r696008838] (sorry, I forgot it was a PR not just a branch). I'm also a fan of Alex's suggestion to replace {{TEST_FORCE_ASYNC_LOCAL_READS}} with some ByteBuddy manipulation in the test. All of the above can be fixed (or not) on commit, so +1 from me too & thanks! > Repaired data tracking on a read coordinator is susceptible to races between > local and remote requests > -- > > Key: CASSANDRA-16721 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16721 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination >Reporter: Sam Tunnicliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 1.5h > Remaining Estimate: 0h > > At read time on a coordinator which is also a replica, the local and remote > reads can race such that the remote responses are received while the local > read is executing. If the remote responses are mismatching, triggering a > {{DigestMismatchException}} and subsequent round of full data reads and read > repair, the local runnable may find the {{isTrackingRepairedStatus}} flag > flipped mid-execution. If this happens after a certain point in execution, > it would mean > that the RepairedDataInfo instance in use is the singleton null object > {{RepairedDataInfo.NULL_REPAIRED_DATA_INFO}}. If this happens, it can lead to > an NPE when calling {{RepairedDataInfo::extend}} when the local results are > iterated. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
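The race described above - the {{isTrackingRepairedStatus}} flag flipped by remote responses while the local read is mid-execution, leaving the null-object {{RepairedDataInfo}} in play - is the classic read-twice hazard: state consulted at two points can disagree. The usual remedy is to capture the flag, and everything derived from it, exactly once. The sketch below uses hypothetical names and simulates the concurrent flip deterministically with an injected supplier; it is not the actual Cassandra code:

```java
import java.util.function.BooleanSupplier;

public class CaptureOnce
{
    // Stand-ins for RepairedDataInfo.NULL_REPAIRED_DATA_INFO and a real tracking instance
    static final String NULL_INFO = "NULL_INFO";
    static final String REAL_INFO = "REAL_INFO";

    /** Racy shape: the flag is consulted twice, so a flip between the two reads
     *  can pair "tracking enabled" with the null-object info (the NPE scenario). */
    static String racy(BooleanSupplier isTracking)
    {
        String info = isTracking.getAsBoolean() ? REAL_INFO : NULL_INFO; // first read
        // ... local read executes; meanwhile a digest mismatch flips the flag ...
        if (isTracking.getAsBoolean())                                   // second read
            return "extend:" + info; // may be extending the null object
        return "skip";
    }

    /** Fixed shape: read the flag once and derive everything from that snapshot. */
    static String safe(BooleanSupplier isTracking)
    {
        boolean tracking = isTracking.getAsBoolean(); // single read
        String info = tracking ? REAL_INFO : NULL_INFO;
        return tracking ? "extend:" + info : "skip";
    }
}
```

Driving both methods with a supplier that returns false on the first call and true afterwards shows the difference: the racy version ends up "extending" the null object, while the snapshot version consistently skips.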
[jira] [Updated] (CASSANDRA-16721) Repaired data tracking on a read coordinator is susceptible to races between local and remote requests
[ https://issues.apache.org/jira/browse/CASSANDRA-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-16721: Reviewers: Alex Petrov, Caleb Rackliffe, Sam Tunnicliffe, Sam Tunnicliffe (was: Alex Petrov, Caleb Rackliffe, Sam Tunnicliffe) Alex Petrov, Caleb Rackliffe, Sam Tunnicliffe, Sam Tunnicliffe (was: Caleb Rackliffe, Sam Tunnicliffe) Status: Review In Progress (was: Patch Available) > Repaired data tracking on a read coordinator is susceptible to races between > local and remote requests > -- > > Key: CASSANDRA-16721 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16721 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination >Reporter: Sam Tunnicliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 1.5h > Remaining Estimate: 0h > > At read time on a coordinator which is also a replica, the local and remote > reads can race such that the remote responses are received while the local > read is executing. If the remote responses are mismatching, triggering a > {{DigestMismatchException}} and subsequent round of full data reads and read > repair, the local runnable may find the {{isTrackingRepairedStatus}} flag > flipped mid-execution. If this happens after a certain point in execution, > it would mean > that the RepairedDataInfo instance in use is the singleton null object > {{RepairedDataInfo.NULL_REPAIRED_DATA_INFO}}. If this happens, it can lead to > an NPE when calling {{RepairedDataInfo::extend}} when the local results are > iterated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16883) Weak visibility guarantees of Accumulator can lead to failure to recognize digest mismatches
Caleb Rackliffe created CASSANDRA-16883: --- Summary: Weak visibility guarantees of Accumulator can lead to failure to recognize digest mismatches Key: CASSANDRA-16883 URL: https://issues.apache.org/jira/browse/CASSANDRA-16883 Project: Cassandra Issue Type: Bug Components: Consistency/Coordination Reporter: Caleb Rackliffe Assignee: Caleb Rackliffe The context for this problem is largely the same as CASSANDRA-16807. The difference is that for 4.0+, CASSANDRA-16097 added an assertion to {{DigestResolver#responseMatch()}} that ensures the responses snapshot has at least one visible element (although of course a single element trivially cannot generate a mismatch and short-circuits immediately). In 3.0 and 3.11, this assertion does not exist, and when the underlying problem occurs (i.e. zero responses are visible on {{Accumulator}} when there should be 2), we can silently avoid the digest matching entirely. This seems like it would both make it impossible to do a potentially necessary full data read to resolve the correct response and prevent repair. The fix here should be similar to the one in CASSANDRA-16807, although there might be some test infrastructure that needs porting in order to make that work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
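The visibility problem described here - responses added by one thread not yet visible to the thread snapshotting the {{Accumulator}} - is avoided when elements are published through volatile writes, e.g. via {{AtomicReferenceArray}}. The following is a deliberately simplified, hypothetical sketch of that idea, not Cassandra's actual Accumulator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Simplified accumulator: each add claims a slot, then publishes the element
 *  with a volatile write, so a snapshot taken on another thread is guaranteed
 *  to see every fully completed add (illustrative names only). */
public class VisibleAccumulator<T>
{
    private final AtomicReferenceArray<T> values; // volatile-semantics element access
    private final AtomicInteger nextIndex = new AtomicInteger();

    public VisibleAccumulator(int capacity)
    {
        values = new AtomicReferenceArray<>(capacity);
    }

    public void add(T value)
    {
        int i = nextIndex.getAndIncrement(); // claim a slot
        values.set(i, value);                // volatile write publishes the element
    }

    /** Collects every published element; a slot still null was simply not yet
     *  fully added, never a stale view of a completed add. */
    public List<T> snapshot()
    {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < values.length(); i++)
        {
            T v = values.get(i); // volatile read
            if (v != null)
                out.add(v);
        }
        return out;
    }
}
```

With a plain {{T[]}} and a plain {{int}} counter, the Java memory model permits the snapshotting thread to observe the counter update without the element writes - exactly the "zero visible responses when there should be 2" symptom in the description.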
[jira] [Updated] (CASSANDRA-16883) Weak visibility guarantees of Accumulator can lead to failure to recognize digest mismatches
[ https://issues.apache.org/jira/browse/CASSANDRA-16883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-16883: Bug Category: Parent values: Correctness(12982)Level 1 values: Consistency(12989) Complexity: Normal Discovered By: Fuzz Test Fix Version/s: 3.11.x 3.0.x Severity: Critical Status: Open (was: Triage Needed) > Weak visibility guarantees of Accumulator can lead to failure to recognize > digest mismatches > > > Key: CASSANDRA-16883 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16883 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > The context for this problem is largely the same as CASSANDRA-16807. The > difference is that for 4.0+, CASSANDRA-16097 added an assertion to > {{DigestResolver#responseMatch()}} that ensures the responses snapshot has at > least one visible element (although of course only one element trivially > cannot generate a mismatch and short-circuits immediately). In 3.0 and 3.11, > this assertion does not exist, and when the underlying problem occurs (i.e. > zero responses are visible on {{Accumulator}} when there should be 2), we can > silently avoid the digest matching entirely. This seems like it would make it > both impossible to do a potentially necessary full data read to resolve the > correct response and prevent repair. > The fix here should be similar to the one in CASSANDRA-16807, although there > might be some test infrastructure that needs porting in order to make that > work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404638#comment-17404638 ] Brandon Williams commented on CASSANDRA-16873: -- Alright - I think we've said enough on this ticket - let's follow up on the ML. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception being thrown and gossip > state never transitioning. > Rather than implicitly requiring operators to bounce the node by throwing an > exception, we should instead suppress the exception when checking if a node > is replacing the same host address and ID if we get an UnknownHostException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404637#comment-17404637 ] Benedict Elliott Smith commented on CASSANDRA-16873: bq. And so it is for our sanity to focus development on trunk as much as possible. That is, the stability of the code remains dependent on reviews, where our ability to review is limited, and reviewing patches for multiple branches does costs more (in both attention and in time). I'm not sure this works out as simply as you suppose. The review burden for each patch increases the further the branches drift from each other. This was the very reason I wanted to backport the simulator stability work, so as to reduce the review burden for work that _does_ need to be backported (of which there will be a lot). Limiting ourselves to the least-possible backports potentially makes each backport costlier, reducing our review bandwidth for trunk. In reality this cost starts being accounted for as contributors shy away from back porting necessary work because of the additional burden. bq. I can also see that by encouraging the discussions to establish the waivers we can more organically grow the guideline documentation around it. Again, as far as governance goes there is no need to seek a waiver for anything that isn't a feature - however that is defined. We need to vote on new project governance documents if we want to impose any stronger restrictions that require a waiver. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. 
This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception being thrown and gossip > state never transitioning. > Rather than implicitly requiring operators to bounce the node by throwing an > exception, we should instead suppress the exception when checking if a node > is replacing the same host address and ID if we get an UnknownHostException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
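The change proposed in this ticket - suppressing the {{UnknownHostException}} in the replacing-same-host-address check rather than letting it abort the gossip state transition - can be sketched as below. The resolver indirection and all names are hypothetical, introduced only so the DNS step is substitutable; this is not the actual patch:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ReplacementCheck
{
    /** Resolver indirection so the DNS lookup can be substituted in tests. */
    interface Resolver
    {
        InetAddress resolve(String hostname) throws UnknownHostException;
    }

    /**
     * Returns true only when the hostname resolves to the given address.
     * A missing DNS entry (a transient state during host replacement) is
     * treated as "not the same host" instead of propagating and leaving
     * the node's gossip state stuck in JOINING on some peers.
     */
    static boolean replacingSameHostAddress(Resolver resolver, String hostname, InetAddress expected)
    {
        try
        {
            return resolver.resolve(hostname).equals(expected);
        }
        catch (UnknownHostException e)
        {
            // Previously an exception here implicitly required an operator bounce;
            // suppressing it lets the replacement complete on its own.
            return false;
        }
    }
}
```

The key behavioral point is the catch block: the lookup failure is downgraded from a fatal error to a negative answer for this one check.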
[jira] [Comment Edited] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404624#comment-17404624 ] Michael Semb Wever edited comment on CASSANDRA-16873 at 8/25/21, 5:30 PM: -- My two cents, this patch falls into that "small improvement that doesn't do much else but fix something" category, and should be included in 4.0.x bq. Either way, this exchange highlights an issue to address which is that an absolutist policy leads to gamification of terminology. Clearly we shouldn't be calling something a bug fix so it can be included in a release. I agree that it is really important that we avoid any encouragement of gaming what a bug is. We do have some precedence for distinguishing "small improvements ok to go into patch versions" versus normal improvements, and we have been seeing more of it on the ML recently. But we have no guidelines in place how to make the distinction (it's being worked on). IMHO this is going to bite us now that we commit to annual releases (and are still ironing out what "stable trunk" means for us and how to achieve it) with folk encouraged to see the severity or need of an improvement as a reason to re-classify it. Including an improvement into a Patch version should be a decision based on what the patch touches and does. Until we clear up what that actually means, I am against improvements going into a Patch version by default, without first some discussion and consensus (on the ticket) to apply the waiver. My reasoning for voting on taking a more limited approach is… if we are to grow the community, and build momentum, it is going to be much harder for us to ensure quality (stable branches) through reviews. And so it is for our sanity to focus development on trunk as much as possible. That is, the stability of the code remains dependent on reviews, where our ability to review is limited, and reviewing patches for multiple branches does costs more (in both attention and in time). 
I can also see that by encouraging the discussions to establish the waivers we can more organically grow the guideline documentation around it. was (Author: michaelsembwever): My two cents, this patch falls into that "small improvement that doesn't do much else but fix something" category, and should be included in 4.0.x bq. Either way, this exchange highlights an issue to address which is that an absolutist policy leads to gamification of terminology. Clearly we shouldn't be calling something a bug fix so it can be included in a release. I agree that it is really important that we avoid any encouragement of gaming what a bug is. We do have some precedence for distinguishing "small improvements ok to go into patch versions" versus normal improvements, and we have been seeing more of it on the ML recently. But we have no guidelines in place how to make the distinction (it's being worked on). IMHO this is going to bite us now that we commit to annual releases (and are still ironing out what "stable trunk" means for us and how to achieve it) with folk encouraged to see the severity or need of an improvement as a reason to re-classify it. Including an improvement into a Patch version should be a decision based on what the patch touches and does. Until we clear up what that actually means, I am against improvements going into a Patch version by default, without first some discussion and consensus (on the ticket) to apply the waiver. My reasoning for voting on taking a more limited approach is… if we are to grow the community, and build momentum, it is going to be much harder for us to ensure quality (stable branches) through reviews only. And so it is for our sanity to focus development on trunk as much as possible. That is, the stability of the code remains dependent on reviews, where our ability to review is limited, and reviewing patches for multiple branches does costs more (in both attention and in time). 
I can also see that by encouraging the discussions to establish the waivers we can more organically grow the guideline documentation around it. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception
[jira] [Comment Edited] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404624#comment-17404624 ] Michael Semb Wever edited comment on CASSANDRA-16873 at 8/25/21, 5:26 PM: -- My two cents, this patch falls into that "small improvement that doesn't do much else but fix something" category, and should be included in 4.0.x bq. Either way, this exchange highlights an issue to address which is that an absolutist policy leads to gamification of terminology. Clearly we shouldn't be calling something a bug fix so it can be included in a release. I agree that it is really important that we avoid any encouragement of gaming what a bug is. We do have some precedence for distinguishing "small improvements ok to go into patch versions" versus normal improvements, and we have been seeing more of it on the ML recently. But we have no guidelines in place how to make the distinction (it's being worked on). IMHO this is going to bite us now that we commit to annual releases (and are still ironing out what "stable trunk" means for us and how to achieve it) with folk encouraged to see the severity or need of an improvement as a reason to re-classify it. Including an improvement into a Patch version should be a decision based on what the patch touches and does. Until we clear up what that actually means, I am against improvements going into a Patch version by default, without first some discussion and consensus (on the ticket) to apply the waiver. My reasoning for voting on taking a more limited approach is… if we are to grow the community, and build momentum, it is going to be much harder for us to ensure quality (stable branches) through reviews only. And so it is for our sanity to focus development on trunk as much as possible. That is, the stability of the code remains dependent on reviews, where our ability to review is limited, and reviewing patches for multiple branches does costs more (in both attention and in time). 
I can also see that by encouraging the discussions to establish the waivers we can more organically grow the guideline documentation around it. was (Author: michaelsembwever): My two cents, this patch falls into that "small improvement that doesn't do much else but fix something" category. bq. Either way, this exchange highlights an issue to address which is that an absolutist policy leads to gamification of terminology. Clearly we shouldn't be calling something a bug fix so it can be included in a release. I agree that it is really important that we avoid any encouragement of gaming what a bug is. We do have some precedence for distinguishing "small improvements ok to go into patch versions" versus normal improvements, and we have been seeing more of it on the ML recently. But we have no guidelines in place how to make the distinction (it's being worked on). IMHO this is going to bite us now that we commit to annual releases (and are still ironing out what "stable trunk" means for us and how to achieve it) with folk encouraged to see the severity or need of an improvement as a reason to re-classify it. Including an improvement into a Patch version should be a decision based on what the patch touches and does. Until we clear up what that actually means, I am against improvements going into a Patch version by default, without first some discussion and consensus (on the ticket) to apply the waiver. My reasoning for voting on taking a more limited approach is… if we are to grow the community, and build momentum, it is going to be much harder for us to ensure quality (stable branches) through reviews only. And so it is for our sanity to focus development on trunk as much as possible. That is, the stability of the code remains dependent on reviews, where our ability to review is limited, and reviewing patches for multiple branches does costs more (in both attention and in time). 
I can also see that by encouraging the discussions to establish the waivers we can more organically grow the guideline documentation around it. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception being thrown and gossip >
[jira] [Commented] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404624#comment-17404624 ] Michael Semb Wever commented on CASSANDRA-16873: My two cents, this patch falls into that "small improvement that doesn't do much else but fix something" category. bq. Either way, this exchange highlights an issue to address which is that an absolutist policy leads to gamification of terminology. Clearly we shouldn't be calling something a bug fix so it can be included in a release. I agree that it is really important that we avoid any encouragement of gaming what a bug is. We do have some precedence for distinguishing "small improvements ok to go into patch versions" versus normal improvements, and we have been seeing more of it on the ML recently. But we have no guidelines in place how to make the distinction (it's being worked on). IMHO this is going to bite us now that we commit to annual releases (and are still ironing out what "stable trunk" means for us and how to achieve it) with folk encouraged to see the severity or need of an improvement as a reason to re-classify it. Including an improvement into a Patch version should be a decision based on what the patch touches and does. Until we clear up what that actually means, I am against improvements going into a Patch version by default, without first some discussion and consensus (on the ticket) to apply the waiver. My reasoning for voting on taking a more limited approach is… if we are to grow the community, and build momentum, it is going to be much harder for us to ensure quality (stable branches) through reviews only. And so it is for our sanity to focus development on trunk as much as possible. That is, the stability of the code remains dependent on reviews, where our ability to review is limited, and reviewing patches for multiple branches does costs more (in both attention and in time). 
I can also see that by encouraging the discussions to establish the waivers we can more organically grow the guideline documentation around it. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception being thrown and gossip > state never transitioning. > Rather than implicitly requiring operators to bounce the node by throwing an > exception, we should instead suppress the exception when checking if a node > is replacing the same host address and ID if we get an UnknownHostException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-16879: -- Test and Documentation Plan: New testing. Will need to document this on the official docs page based on javadocs in classes. Status: Patch Available (was: Open) > Verify correct ownership of attached locations on disk at C* startup > > > Key: CASSANDRA-16879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16879 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > > There are two primary things related to startup and disk ownership we should > mitigate. > First, an instance can come up with an incorrectly mounted volume attached as > its configured data directory. This causes the wrong system tables to be > read. If the instance which was previously using the volume is also down, its > token could be taken over by the instance coming up. > Secondly, in a JBOD setup, the non-system keyspaces may reside on a separate > volume to the system tables. In this scenario, we need to ensure that all > directories belong to the same instance, and that as the instance starts up > it can access all the directories it expects to be able to. (including data, > commit log, hints and saved cache dirs) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-16879: -- Fix Version/s: 4.x > Verify correct ownership of attached locations on disk at C* startup > > > Key: CASSANDRA-16879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16879 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.x > > > There are two primary things related to startup and disk ownership we should > mitigate. > First, an instance can come up with an incorrectly mounted volume attached as > its configured data directory. This causes the wrong system tables to be > read. If the instance which was previously using the volume is also down, its > token could be taken over by the instance coming up. > Secondly, in a JBOD setup, the non-system keyspaces may reside on a separate > volume to the system tables. In this scenario, we need to ensure that all > directories belong to the same instance, and that as the instance starts up > it can access all the directories it expects to be able to. (including data, > commit log, hints and saved cache dirs) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16879) Verify correct ownership of attached locations on disk at C* startup
[ https://issues.apache.org/jira/browse/CASSANDRA-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404621#comment-17404621 ] Josh McKenzie commented on CASSANDRA-16879: --- A few failures I'm confident are unrelated to the ticket. * JDK11: testUnloggedPartitionsPerBatch which passes locally. Think this was a circle config + env issue w/timeout. * replaceAliveHost which is failing in general atm * JDK8: incompletePropose which OOM'ed - see this on a couple other branches and unrelated to this ticket. Passing fine locally and on JDK11. ||Item|Link| |JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/59/workflows/ff19f043-dc27-4d83-baf4-0510614a9c0c]| |JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/59/workflows/6580b51c-8254-478b-a1f0-7cd6b6392c31]| |Branch|[Link|https://github.com/apache/cassandra/compare/cassandra-4.0...josh-mckenzie:CASSANDRA-16879?expand=1]| > Verify correct ownership of attached locations on disk at C* startup > > > Key: CASSANDRA-16879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16879 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > > There are two primary things related to startup and disk ownership we should > mitigate. > First, an instance can come up with an incorrectly mounted volume attached as > its configured data directory. This causes the wrong system tables to be > read. If the instance which was previously using the volume is also down, its > token could be taken over by the instance coming up. > Secondly, in a JBOD setup, the non-system keyspaces may reside on a separate > volume to the system tables. In this scenario, we need to ensure that all > directories belong to the same instance, and that as the instance starts up > it can access all the directories it expects to be able to. 
(including data, > commit log, hints and saved cache dirs) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
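The two-part ownership check described in the ticket (stamp each attached location with the owning instance, then verify every configured directory at startup) can be sketched roughly as follows. This is an illustrative sketch only — the marker file name, layout, and class are invented for this note and are not the actual patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: each attached location carries a small marker file
// recording which instance owns it; startup fails fast if any configured
// directory (data, commit log, hints, saved caches) is inaccessible or
// stamped by a different instance.
public class DiskOwnershipCheck
{
    static final String MARKER = "ownership.token"; // invented file name

    // Stamp a directory on first use so later startups can verify it.
    static void stamp(Path dir, String instanceId) throws IOException
    {
        Path marker = dir.resolve(MARKER);
        if (!Files.exists(marker))
            Files.writeString(marker, instanceId);
    }

    // Returns true iff every directory is readable and stamped with our id.
    static boolean verify(List<Path> dirs, String instanceId) throws IOException
    {
        for (Path dir : dirs)
        {
            Path marker = dir.resolve(MARKER);
            if (!Files.isReadable(dir) || !Files.exists(marker))
                return false; // inaccessible or never stamped
            if (!Files.readString(marker).equals(instanceId))
                return false; // volume belongs to another instance
        }
        return true;
    }
}
```

A startup sequence would stamp all locations on first boot and refuse to proceed on any later boot where `verify` returns false, which covers both the wrongly-mounted-volume case and the mixed-ownership JBOD case.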
[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-16877: Fix Version/s: (was: 4.0.x) 4.0.1 Since Version: 4.0-alpha1 Source Control Link: https://github.com/apache/cassandra/commit/b8242730918c2e8edec83aeafeeae8255378125d Resolution: Fixed Status: Resolved (was: Ready to Commit) Thanks. Agreed about the tests, so committed (with one nit addressed and one swerved) to 4.0 in {{b8242730918c2e8edec83aeafeeae8255378125d}} and merged up to trunk. > High priority internode messages which exceed the large message threshold are > dropped > - > > Key: CASSANDRA-16877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16877 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 4.0.1 > > > Currently, there is an assumption that internode messages whose verb has > priority P0 will always fit within a single messaging frame. While this is > usually the case, on occasion it is possible that this assumption does not > hold. One example is gossip messages during the startup shadow round, where > in very large clusters the digest ack can contain all states for every peer. > In this scenario, the respondent fails to send the ack which may lead to the > shadow round and, ultimately, the startup failing. > > We could tweak the shadow round acks to minimise the message size, but a more > robust solution would be to permit high priority messages to be sent on the > large messages connection when necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16871) Add resource flags to CircleCi config generation script
[ https://issues.apache.org/jira/browse/CASSANDRA-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andres de la Peña updated CASSANDRA-16871: -- Fix Version/s: 4.x 4.0.x 3.11.x 3.0.x > Add resource flags to CircleCi config generation script > --- > > Key: CASSANDRA-16871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16871 > Project: Cassandra > Issue Type: Task > Components: CI >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > Currently we have three versions of the CircleCI config file using different > resources. Changing the resources configuration is as easy as copying the > desired template file, for example: > {code} > cp .circleci/config.yml.MIDRES .circleci/config.yml > {code} > If we want to make changes to the file, for example to set a specific dtest > repo or running the test multiplexer, we can run the provided generation > script, copy the template file and probably exclude the additional changes: > {code} > # edit config-2_1.yml > .circleci/generate.sh > cp .circleci/config.yml.MIDRES .circleci/config.yml > # undo the changes in config.yml.LOWRES, config.yml.MIDRES and > config.yml.HIGHRES > {code} > A very common alternative to this is just editing the environment variables > in the automatically generated {{config.yml}} file, which are repeated some > 19 times across the file: > {code} > cp .circleci/config.yml.MIDRES .circleci/config.yml > # edit config.yml, where env vars are repeated > {code} > I think we could do this slightly easier by adding a set of flags to the > generation script to apply the resources patch directly to {{config.yml}}, > without changing the templates: > {code} > # edit config-2_1.yml > .circleci/generate.sh -m > {code} > This has the advantage of not requiring manually editing the automatically > generated file and also providing some validation. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16871) Add resource flags to CircleCi config generation script
[ https://issues.apache.org/jira/browse/CASSANDRA-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andres de la Peña updated CASSANDRA-16871: -- Test and Documentation Plan: Testing should be done manually by running the modified script. The patch includes changes in the documentation for CircleCI. Status: Patch Available (was: In Progress) > Add resource flags to CircleCi config generation script > --- > > Key: CASSANDRA-16871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16871 > Project: Cassandra > Issue Type: Task > Components: CI >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Low > > Currently we have three versions of the CircleCI config file using different > resources. Changing the resources configuration is as easy as copying the > desired template file, for example: > {code} > cp .circleci/config.yml.MIDRES .circleci/config.yml > {code} > If we want to make changes to the file, for example to set a specific dtest > repo or running the test multiplexer, we can run the provided generation > script, copy the template file and probably exclude the additional changes: > {code} > # edit config-2_1.yml > .circleci/generate.sh > cp .circleci/config.yml.MIDRES .circleci/config.yml > # undo the changes in config.yml.LOWRES, config.yml.MIDRES and > config.yml.HIGHRES > {code} > A very common alternative to this is just editing the environment variables > in the automatically generated {{config.yml}} file, which are repeated some > 19 times across the file: > {code} > cp .circleci/config.yml.MIDRES .circleci/config.yml > # edit config.yml, where env vars are repeated > {code} > I think we could do this slightly easier by adding a set of flags to the > generation script to apply the resources patch directly to {{config.yml}}, > without changing the templates: > {code} > # edit config-2_1.yml > .circleci/generate.sh -m > {code} > This has the advantage of not requiring manually editing the automatically > 
generated file and also providing some validation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-16877: Status: Ready to Commit (was: Review In Progress) > High priority internode messages which exceed the large message threshold are > dropped > - > > Key: CASSANDRA-16877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16877 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 4.0.x > > > Currently, there is an assumption that internode messages whose verb has > priority P0 will always fit within a single messaging frame. While this is > usually the case, on occasion it is possible that this assumption does not > hold. One example is gossip messages during the startup shadow round, where > in very large clusters the digest ack can contain all states for every peer. > In this scenario, the respondent fails to send the ack which may lead to the > shadow round and, ultimately, the startup failing. > > We could tweak the shadow round acks to minimise the message size, but a more > robust solution would be to permit high priority messages to be sent on the > large messages connection when necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch trunk updated (5dd472e -> 39efc83)
This is an automated email from the ASF dual-hosted git repository. samt pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git. from 5dd472e Merge branch 'cassandra-4.0' into trunk new b824273 Remove assumption that all urgent messages are small new 39efc83 Merge branch 'cassandra-4.0' into trunk The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + .../apache/cassandra/net/OutboundConnections.java | 25 +++-- .../cassandra/net/OutboundConnectionsTest.java | 60 +++--- 3 files changed, 62 insertions(+), 24 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] 01/01: Merge branch 'cassandra-4.0' into trunk
This is an automated email from the ASF dual-hosted git repository.

samt pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 39efc8307acb62f8f5e9459269d193dc6f319037
Merge: 5dd472e b824273
Author: Sam Tunnicliffe
AuthorDate: Wed Aug 25 17:42:50 2021 +0100

    Merge branch 'cassandra-4.0' into trunk

 CHANGES.txt                                        |  1 +
 .../apache/cassandra/net/OutboundConnections.java  | 25 +++--
 .../cassandra/net/OutboundConnectionsTest.java     | 60 +++---
 3 files changed, 62 insertions(+), 24 deletions(-)

diff --cc CHANGES.txt
index a60b4e0,9ed3cec..a9c8ebd
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,21 -1,5 +1,22 @@@
-4.0.1
+4.1
+ * allow blocking IPs from updating metrics about traffic (CASSANDRA-16859)
+ * Request-Based Native Transport Rate-Limiting (CASSANDRA-16663)
+ * Implement nodetool getauditlog command (CASSANDRA-16725)
+ * Clean up repair code (CASSANDRA-13720)
+ * Background schedule to clean up orphaned hints files (CASSANDRA-16815)
+ * Modify SecondaryIndexManager#indexPartition() to retrieve only columns for which indexes are actually being built (CASSANDRA-16776)
+ * Batch the token metadata update to improve the speed (CASSANDRA-15291)
+ * Reduce the log level on "expected" repair exceptions (CASSANDRA-16775)
+ * Make JMXTimer expose attributes using consistent time unit (CASSANDRA-16760)
+ * Remove check on gossip status from DynamicEndpointSnitch::updateScores (CASSANDRA-11671)
+ * Fix AbstractReadQuery::toCQLString not returning valid CQL (CASSANDRA-16510)
+ * Log when compacting many tombstones (CASSANDRA-16780)
+ * Display bytes per level in tablestats for LCS tables (CASSANDRA-16799)
+ * Add isolated flush timer to CommitLogMetrics and ensure writes correspond to single WaitingOnCommit data points (CASSANDRA-16701)
+ * Add a system property to set hostId if not yet initialized (CASSANDRA-14582)
+ * GossiperTest.testHasVersion3Nodes didn't take into account trunk version changes, fixed to rely on latest version (CASSANDRA-16651)
+Merged from 4.0:
+ * Remove assumption that all urgent messages are small (CASSANDRA-16877)
 * ArrayClustering.unsharedHeapSize does not include the data so undercounts the heap size (CASSANDRA-16845)
 * Improve help, doc and error messages about sstabledump -k and -x arguments (CASSANDRA-16818)
 * Add repaired/unrepaired bytes back to nodetool (CASSANDRA-15282)

- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch cassandra-4.0 updated: Remove assumption that all urgent messages are small
This is an automated email from the ASF dual-hosted git repository.

samt pushed a commit to branch cassandra-4.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/cassandra-4.0 by this push:
     new b824273  Remove assumption that all urgent messages are small
b824273 is described below

commit b8242730918c2e8edec83aeafeeae8255378125d
Author: Sam Tunnicliffe
AuthorDate: Thu Aug 12 10:47:54 2021 +0100

    Remove assumption that all urgent messages are small

    Patch by Sam Tunnicliffe; reviewed by Caleb Rackliffe for CASSANDRA-16877
---
 CHANGES.txt                                        |  1 +
 .../apache/cassandra/net/OutboundConnections.java  | 25 +++--
 .../cassandra/net/OutboundConnectionsTest.java     | 60 +++---
 3 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index ecd4409..9ed3cec 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0.1
+ * Remove assumption that all urgent messages are small (CASSANDRA-16877)
 * ArrayClustering.unsharedHeapSize does not include the data so undercounts the heap size (CASSANDRA-16845)
 * Improve help, doc and error messages about sstabledump -k and -x arguments (CASSANDRA-16818)
 * Add repaired/unrepaired bytes back to nodetool (CASSANDRA-15282)
diff --git a/src/java/org/apache/cassandra/net/OutboundConnections.java b/src/java/org/apache/cassandra/net/OutboundConnections.java
index f1e1276..3f607d1 100644
--- a/src/java/org/apache/cassandra/net/OutboundConnections.java
+++ b/src/java/org/apache/cassandra/net/OutboundConnections.java
@@ -36,6 +36,7 @@ import org.apache.cassandra.config.Config;
 import org.apache.cassandra.gms.Gossiper;
 import org.apache.cassandra.locator.InetAddressAndPort;
 import org.apache.cassandra.metrics.InternodeOutboundMetrics;
+import org.apache.cassandra.utils.NoSpamLogger;
 import org.apache.cassandra.utils.concurrent.SimpleCondition;

 import static org.apache.cassandra.net.MessagingService.current_version;
@@ -199,12 +200,26 @@ public class OutboundConnections
         if (specifyConnection != null)
             return specifyConnection;

-        if (msg.verb().priority == Verb.Priority.P0)
-            return URGENT_MESSAGES;
+        if (msg.serializedSize(current_version) > LARGE_MESSAGE_THRESHOLD)
+        {
+            if (msg.verb().priority == Verb.Priority.P0)
+            {
+                NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES,
+                                 "Enqueued URGENT message which exceeds large message threshold");
+
+                if (logger.isTraceEnabled())
+                    logger.trace("{} message with size {} exceeded large message threshold {}",
+                                 msg.verb(),
+                                 msg.serializedSize(current_version),
+                                 LARGE_MESSAGE_THRESHOLD);
+            }
+
+            return LARGE_MESSAGES;
+        }

-        return msg.serializedSize(current_version) <= LARGE_MESSAGE_THRESHOLD
-             ? SMALL_MESSAGES
-             : LARGE_MESSAGES;
+        return msg.verb().priority == Verb.Priority.P0
+             ? URGENT_MESSAGES
+             : SMALL_MESSAGES;
     }

     @VisibleForTesting
diff --git a/test/unit/org/apache/cassandra/net/OutboundConnectionsTest.java b/test/unit/org/apache/cassandra/net/OutboundConnectionsTest.java
index 32faea3..538636a 100644
--- a/test/unit/org/apache/cassandra/net/OutboundConnectionsTest.java
+++ b/test/unit/org/apache/cassandra/net/OutboundConnectionsTest.java
@@ -35,11 +35,15 @@ import org.junit.Test;

 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.gms.GossipDigestSyn;
+import org.apache.cassandra.io.IVersionedAsymmetricSerializer;
 import org.apache.cassandra.io.IVersionedSerializer;
 import org.apache.cassandra.io.util.DataInputPlus;
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.locator.InetAddressAndPort;

+import static org.apache.cassandra.net.MessagingService.current_version;
+import static org.apache.cassandra.net.OutboundConnections.LARGE_MESSAGE_THRESHOLD;
+
 public class OutboundConnectionsTest
 {
     static final InetAddressAndPort LOCAL_ADDR = InetAddressAndPort.getByAddressOverrideDefaults(InetAddresses.forString("127.0.0.1"), 9476);
@@ -48,6 +52,24 @@ public class OutboundConnectionsTest
     private static final List INTERNODE_MESSAGING_CONN_TYPES = ImmutableList.of(ConnectionType.URGENT_MESSAGES, ConnectionType.LARGE_MESSAGES, ConnectionType.SMALL_MESSAGES);

     private OutboundConnections connections;

+    // for testing messages larger than the size threshold, we just need a serializer to report a size, as fake as it may be
+    public static final IVersionedSerializer SERIALIZER = new IVersionedSerializer()
+    {
+        public
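The essence of the patch above is that message size now takes precedence over verb priority when picking the outbound connection. A condensed, self-contained sketch of that selection rule (with names and the threshold value simplified from the real `OutboundConnections` code) might look like:

```java
// Sketch of the CASSANDRA-16877 routing rule: an urgent (P0) message that
// exceeds the large-message threshold goes to the large-messages connection
// instead of being enqueued (and dropped) on the urgent one.
public class ConnectionSelection
{
    enum Type { URGENT_MESSAGES, SMALL_MESSAGES, LARGE_MESSAGES }

    // Illustrative value only; the real threshold is defined elsewhere.
    static final long LARGE_MESSAGE_THRESHOLD = 1L << 16;

    static Type connectionFor(long serializedSize, boolean urgent)
    {
        // Size wins over priority: an oversized message cannot fit in a
        // single frame on the urgent connection, so it must go large.
        if (serializedSize > LARGE_MESSAGE_THRESHOLD)
            return Type.LARGE_MESSAGES;

        return urgent ? Type.URGENT_MESSAGES : Type.SMALL_MESSAGES;
    }
}
```

Before the fix, the priority check ran first, so a huge digest ack (the shadow-round case from the ticket) was routed urgent and dropped; checking size first is the whole behavioral change.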
[jira] [Commented] (CASSANDRA-16871) Add resource flags to CircleCi config generation script
[ https://issues.apache.org/jira/browse/CASSANDRA-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404578#comment-17404578 ] Andres de la Peña commented on CASSANDRA-16871: --- Thanks for looking into this :) bq. I would make a point that config.yml is actually the lowres as I saw people being confused about that. Good idea. I have changed the script messages and the readme trying to make it clear that the default {{config.yml}} uses low resources and that it is indeed a copy of {{config.yml.LOWRES}}. I have included a very brief [introductory section|https://github.com/adelapena/cassandra/blob/16871-3.0/.circleci/readme.md#circleci-config-files] in the readme in an attempt to give some context. bq. One thing, if you are using higher resources and later in time you decide also to change any of the environment variables, we should make it clear that running only .circleci/generate.sh will return people to default low resources with the new variables. If they want to keep the new resources they should use a flag again. Someone new might get confused. Makes sense, I have added an explicit warning about this [here|https://github.com/apache/cassandra/commit/4852b080a773493cb2e6d0bcb53931a703a2ebfa]. > Add resource flags to CircleCi config generation script > --- > > Key: CASSANDRA-16871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16871 > Project: Cassandra > Issue Type: Task > Components: CI >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Low > > Currently we have three versions of the CircleCI config file using different > resources. 
Changing the resources configuration is as easy as copying the > desired template file, for example: > {code} > cp .circleci/config.yml.MIDRES .circleci/config.yml > {code} > If we want to make changes to the file, for example to set a specific dtest > repo or running the test multiplexer, we can run the provided generation > script, copy the template file and probably exclude the additional changes: > {code} > # edit config-2_1.yml > .circleci/generate.sh > cp .circleci/config.yml.MIDRES .circleci/config.yml > # undo the changes in config.yml.LOWRES, config.yml.MIDRES and > config.yml.HIGHRES > {code} > A very common alternative to this is just editing the environment variables > in the automatically generated {{config.yml}} file, which are repeated some > 19 times across the file: > {code} > cp .circleci/config.yml.MIDRES .circleci/config.yml > # edit config.yml, where env vars are repeated > {code} > I think we could do this slightly easier by adding a set of flags to the > generation script to apply the resources patch directly to {{config.yml}}, > without changing the templates: > {code} > # edit config-2_1.yml > .circleci/generate.sh -m > {code} > This has the advantage of not requiring manually editing the automatically > generated file and also providing some validation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404538#comment-17404538 ] Brandon Williams commented on CASSANDRA-16718: -- Can you describe the network configuration here that requires prefer_local? > Changing listen_address with prefer_local may lead to issues > > > Key: CASSANDRA-16718 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16718 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Jan Karlsson >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x > > > Many container-based solutions function by assigning new listen_addresses when > nodes are stopped. Changing the listen_address is usually as simple as > turning off the node and changing the yaml file. > However, if prefer_local is enabled, I observed that nodes were unable to > join the cluster and failed with 'Unable to gossip with any seeds'. > Trace shows that the changing node will try to communicate with the existing > node but the response is never received. I assume it is because the existing > node attempts to communicate with the local address during the shadow round. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
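For context on the setting being discussed: {{prefer_local}} is normally enabled in {{cassandra-rackdc.properties}} when using GossipingPropertyFileSnitch, so that nodes in the same datacenter gossip over their private/listen addresses. A minimal sketch of such a file (dc/rack values are placeholders):

```properties
# cassandra-rackdc.properties (GossipingPropertyFileSnitch)
dc=dc1
rack=rack1
# Route intra-datacenter traffic over the node's private (listen) address
# rather than the broadcast address -- the setting implicated in this bug.
prefer_local=true
```

With this enabled, a peer that caches the old private address of a restarted container keeps trying it during the shadow round, which matches the reporter's observation.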
[jira] [Comment Edited] (CASSANDRA-16882) Save CircleCI resources with optional test jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404433#comment-17404433 ] Andres de la Peña edited comment on CASSANDRA-16882 at 8/25/21, 2:46 PM: - I'm adding a fourth option that combines approaches 2 and 3, so the mandatory tests can be started either individually or all together with a single start button: ||Option||Branch||CI|| |1|[16882-option-1-trunk|https://github.com/adelapena/cassandra/tree/16882-option-1-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/9cb8ca7b-ab57-431e-a22b-643d61c92c29] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/3e26fd7e-5c5a-4ec3-8af9-4c247d96556a]| |2|[16882-option-2-trunk|https://github.com/adelapena/cassandra/tree/16882-option-2-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a859cfbc-fdf8-4468-beb9-b2ee17dc1ae3] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a4a86879-e283-4aa9-8121-c51fa79095e6]| |3|[16882-option-3-trunk|https://github.com/adelapena/cassandra/tree/16882-option-3-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/802/workflows/0372f5d6-d1f0-4f0e-91a3-aa75a2712bae] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/802/workflows/3a53f1d3-e43a-4aaa-b163-601b57ca28ac]| |4|[16882-option-4-trunk|https://github.com/adelapena/cassandra/tree/16882-option-4-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/803/workflows/08ae07d5-6a1e-4e5b-bc0c-32bdc9b9f190] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/803/workflows/51f1b801-afdd-45da-93e7-4f8e24067640]| This gives us the flexibility of the second approach with the click savings of the third approach. However, the downside is that it is done by duplicating the jobs, because CircleCI doesn't allow disjunctions in job dependencies. 
That leaves us with a more complex graph, and I'm afraid that could be more confusing than just writing in the doc what tests are mandatory. was (Author: adelapena): I'm adding a fourth option that combines approaches 2 and 3, so the mandatory tests can be started either individually or all together with a single start button: ||Option||Branch||CI|| |1|[16882-option-1-trunk|https://github.com/adelapena/cassandra/tree/16882-option-1-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/9cb8ca7b-ab57-431e-a22b-643d61c92c29] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/3e26fd7e-5c5a-4ec3-8af9-4c247d96556a]| |2|[16882-option-2-trunk|https://github.com/adelapena/cassandra/tree/16882-option-2-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a859cfbc-fdf8-4468-beb9-b2ee17dc1ae3] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a4a86879-e283-4aa9-8121-c51fa79095e6]| |3|[16882-option-3-trunk|https://github.com/adelapena/cassandra/tree/16882-option-3-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/799/workflows/91f90e3a-e032-4d57-ba60-45d925c07c99] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/799/workflows/265a64f2-70b6-4a88-8045-89bdf50e5d8d]| |4|[16882-option-4-trunk|https://github.com/adelapena/cassandra/tree/16882-option-4-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/801/workflows/3b044fbb-0fda-4b30-9544-cdc259f8f09b] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/801/workflows/4c205d19-22ea-4ae8-8618-09c9ec7dcbe9]| This gives us the flexibility of the second approach with the click savings of the third approach. However, the downside is that is done by duplicating the jobs, because CircleCI doesn't allow disjunctions in job dependencies. 
That leaves us with a more complex graph, and I'm afraid that could be more confusing than just writing in the doc what tests are mandatory. > Save CircleCI resources with optional test jobs > --- > > Key: CASSANDRA-16882 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16882 > Project: Cassandra > Issue Type: Task > Components: CI >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > > This ticket implements the addition of approval steps in the CircleCI > workflows as it was proposed in [this > email|https://lists.apache.org/thread.html/r57bab800d037c087af01b3779fd266d83b538cdd29c120f74a5dbe63%40%3Cdev.cassandra.apache.org%3E] > sent to the dev list: > The current CircleCI configuration automatically runs the unit tests, JVM > dtests and cqhshlib tests. This is done by default for every commit or, with > some
[jira] [Commented] (CASSANDRA-16789) Add TTL support to nodetool snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404482#comment-17404482 ] Stefan Miklosovic commented on CASSANDRA-16789: --- +1 > Add TTL support to nodetool snapshots > - > > Key: CASSANDRA-16789 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16789 > Project: Cassandra > Issue Type: Sub-task > Components: Tool/nodetool >Reporter: Paulo Motta >Assignee: Abuli Palagashvili >Priority: Normal > Fix For: 4.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Add new parameter {{--ttl}} to {{nodetool snapshot}} command. This parameter > can be specified in human readable duration (ie. 30mins, 1h, 300d) and should > not be lower than 1 minute. > The expiration date should be added to the snapshot manifest in ISO format. > A periodic thread should efficiently scan snapshots and automatically clear > those past expiration date. The periodicity of the scan thread should be 1 > minute by default but be overridable via a system property. > The command {{nodetool listsnapshots}} should display the expiration date > when the snapshot contains a TTL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
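The ticket imposes two constraints on the new {{--ttl}} parameter: a human-readable duration form (e.g. 30m, 1h, 300d) and a minimum of one minute. A rough sketch of such parsing and validation, with the accepted suffix set an assumption of this note rather than the exact set the patch implements:

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of snapshot TTL validation: parse a human-readable
// duration and reject anything under the one-minute floor from the ticket.
public class SnapshotTtl
{
    // Assumed suffixes: m/mins (minutes), h (hours), d (days).
    private static final Pattern FORMAT = Pattern.compile("(\\d+)(mins|m|h|d)");

    static Duration parse(String ttl)
    {
        Matcher m = FORMAT.matcher(ttl.trim().toLowerCase());
        if (!m.matches())
            throw new IllegalArgumentException("Unparseable TTL: " + ttl);

        long n = Long.parseLong(m.group(1));
        Duration d = switch (m.group(2))
        {
            case "m", "mins" -> Duration.ofMinutes(n);
            case "h"         -> Duration.ofHours(n);
            default          -> Duration.ofDays(n);
        };

        if (d.compareTo(Duration.ofMinutes(1)) < 0)
            throw new IllegalArgumentException("TTL must be at least 1 minute");
        return d;
    }
}
```

The parsed expiration would then be written into the snapshot manifest (the ticket asks for ISO format) so the periodic cleanup thread can compare it against the current time.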
[jira] [Updated] (CASSANDRA-16842) Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
[ https://issues.apache.org/jira/browse/CASSANDRA-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-16842: -- Status: Ready to Commit (was: Review In Progress) > Allow CommitLogSegmentReader to optionally skip sync marker CRC checks > -- > > Key: CASSANDRA-16842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16842 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.x > > Time Spent: 20m > Remaining Estimate: 0h > > CommitLog sync markers are written in two phases. In the first, zeroes are > written for the position of the next sync marker and the sync marker CRC > value. In the second, when the next sync marker is written, the actual > position and CRC values are written. If the process shuts down in a > disorderly fashion, it is entirely possible for a valid next marker position > to be written to our memory mapped file but not the final CRC value. Later, > when we attempt to replay the segment, we will fail without recovering any of > the perfectly valid mutations it contains. (This assumes we’re confining > ourselves to the case where there is no compression or encryption.) > {noformat} > ERROR 2020-11-18T10:55:23,888 [main] > org.apache.cassandra.utils.JVMStabilityInspector:102 - Exiting due to error > while processing commit log during initialization. > org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: > Encountered bad header at position 23091775 of commit log > …/CommitLog-6-1605699607608.log, with invalid CRC. The end of segment marker > should be zero. 
> at > org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:731) > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.readSyncMarker(CommitLogReplayer.java:274) > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:436) > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:189) > at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:170 > at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:151) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:332) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:656) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:808) > {noformat} > It may be useful to provide an option that would allow us to override the > default/strict behavior here and skip the CRC check if a non-zero end > position is present, allowing valid mutations to be recovered and startup to > proceed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
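The torn-write scenario above can be sketched as a marker check with an optional lenient mode: if the stored CRC does not match but the next-marker position is non-zero, the override would accept the marker and let replay continue. This is a simplified illustration, not Cassandra's actual CommitLogSegmentReader; names and the CRC coverage are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative sketch of the proposed option: a sync marker stores the file
// position of the next marker plus a CRC. With strict checking, a non-zero
// position whose CRC is still zeroed (torn write at shutdown) aborts replay;
// a "tolerate" flag lets replay of the valid mutations continue.
public final class SyncMarkerCheck {
    // In this sketch the CRC covers the segment id and the marker position.
    static int computeCrc(long segmentId, int nextPosition) {
        CRC32 crc = new CRC32();
        ByteBuffer scratch = ByteBuffer.allocate(12);
        scratch.putLong(segmentId).putInt(nextPosition);
        scratch.flip();
        crc.update(scratch);
        return (int) crc.getValue();
    }

    /** Returns the next-marker position, or -1 if the marker must be rejected. */
    static int readSyncMarker(ByteBuffer buf, int markerOffset, long segmentId,
                              boolean toleratePossiblyTornCrc) {
        int nextPosition = buf.getInt(markerOffset);
        int storedCrc = buf.getInt(markerOffset + 4);
        if (storedCrc == computeCrc(segmentId, nextPosition))
            return nextPosition;                 // clean marker
        if (nextPosition != 0 && toleratePossiblyTornCrc)
            return nextPosition;                 // torn CRC, but position looks valid
        return -1;                               // strict mode: reject
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(0, 23091775);                               // next position written...
        buf.putInt(4, computeCrc(42L, 23091775) + 1);          // ...but CRC is wrong
        System.out.println(readSyncMarker(buf, 0, 42L, false)); // -1: strict rejects
        System.out.println(readSyncMarker(buf, 0, 42L, true));  // 23091775: tolerated
    }
}
```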
[jira] [Commented] (CASSANDRA-16842) Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
[ https://issues.apache.org/jira/browse/CASSANDRA-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404471#comment-17404471 ] Josh McKenzie commented on CASSANDRA-16842: --- A few formatting nits but otherwise +1
[jira] [Updated] (CASSANDRA-16842) Allow CommitLogSegmentReader to optionally skip sync marker CRC checks
[ https://issues.apache.org/jira/browse/CASSANDRA-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-16842: -- Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-16850) Add client warnings and abort to tombstone and coordinator reads which go past a low/high watermark
[ https://issues.apache.org/jira/browse/CASSANDRA-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-16850: Reviewers: Blake Eggleston, Marcus Eriksson (was: Blake Eggleston) > Add client warnings and abort to tombstone and coordinator reads which go > past a low/high watermark > --- > > Key: CASSANDRA-16850 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16850 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Logging >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 40m > Remaining Estimate: 0h > > We currently will abort queries if we hit too many tombstones, but its common > that we would want to also warn clients (client warnings) about this before > we get that point; its also common that different logic would like to be able > to warn/abort about client options (such as reading a large partition). To > allow this we should add a concept of low/high watermarks (warn/abort) to > tombstones and coordinator reads. > Another issue is that current aborts look the same as a random failure, so > from an SLA point of view it would be good to differentiate between user > behavior being rejected and unexplained issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
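The low/high watermark idea above can be sketched as a per-query tracker: crossing the low watermark emits a client warning once, crossing the high watermark aborts with an exception type distinct from a generic failure, so SLA dashboards can tell rejected user behavior apart from unexplained errors. All names here are hypothetical, not Cassandra's.

```java
import java.util.Optional;

// Illustrative sketch of warn/abort watermarks for tombstone reads.
public final class TombstoneGuard {
    /** Distinct from a generic failure so rejections are distinguishable. */
    static final class QueryRejectedException extends RuntimeException {
        QueryRejectedException(String msg) { super(msg); }
    }

    private final long warnThreshold;
    private final long failThreshold;
    private long tombstones;
    private boolean warned;

    TombstoneGuard(long warnThreshold, long failThreshold) {
        this.warnThreshold = warnThreshold;
        this.failThreshold = failThreshold;
    }

    /** Count one tombstone; returns a warning the first time the low watermark is crossed. */
    Optional<String> onTombstone() {
        tombstones++;
        if (tombstones > failThreshold)
            throw new QueryRejectedException("Query aborted: read " + tombstones + " tombstones");
        if (tombstones > warnThreshold && !warned) {
            warned = true;
            return Optional.of("Read " + tombstones + " tombstones (warn threshold " + warnThreshold + ")");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TombstoneGuard guard = new TombstoneGuard(2, 4);
        for (int i = 0; i < 4; i++)
            guard.onTombstone().ifPresent(w -> System.out.println("client warning: " + w));
        try {
            guard.onTombstone(); // 5th tombstone exceeds the fail threshold of 4
        } catch (QueryRejectedException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

The same shape would apply to coordinator-side read sizes (e.g. large partitions), with only the counted quantity changing.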
[jira] [Updated] (CASSANDRA-14309) Make hint window persistent across restarts
[ https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-14309: -- Fix Version/s: (was: 4.x) 4.1 > Make hint window persistent across restarts > --- > > Key: CASSANDRA-14309 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14309 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints >Reporter: Kurt Greaves >Assignee: Stefan Miklosovic >Priority: Low > Fix For: 4.1 > > Time Spent: 10m > Remaining Estimate: 0h > > The current hint system stores a window of hints as defined by > {{max_hint_window_in_ms}}, however this window is not persistent across > restarts. > Examples (cluster with RF=3 and 3 nodes, A, B, and C): > # A goes down > # X ms of hints are stored for A on B and C > # A is restarted > # A goes down again without hints replaying from B and C > # B and C will store up to another {{max_hint_window_in_ms}} of hints for A > > # A goes down > # X ms of hints are stored for A on B and C > # B is restarted > # B will store up to another {{max_hint_window_in_ms}} of hints for A > > Note that in both these scenarios they can continue forever. If A or B keeps > getting restarted hints will continue to pile up. > > Idea of this ticket is to stop this behaviour from happening and only ever > store up to {{max_hint_window_in_ms}} of hints for a particular node. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
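The fix described above amounts to measuring the hint window from the oldest undelivered hint rather than from the most recent time the node went down, so restarts cannot reset the window. A minimal sketch, with illustrative names that are not Cassandra's hint service API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: cap hint accumulation per endpoint at max_hint_window measured
// from the earliest hint still stored, so bounces of A or B never let more
// than one window's worth of hints pile up.
public final class HintWindowTracker {
    private final long maxHintWindowMillis;
    // endpoint -> timestamp of the earliest hint still stored for it
    private final Map<String, Long> earliestHint = new HashMap<>();

    HintWindowTracker(long maxHintWindowMillis) {
        this.maxHintWindowMillis = maxHintWindowMillis;
    }

    /** True if a new hint for this endpoint may be stored at time 'now'. */
    boolean canHint(String endpoint, long now) {
        Long earliest = earliestHint.get(endpoint);
        if (earliest == null) {
            earliestHint.put(endpoint, now); // first hint opens the window
            return true;
        }
        return now - earliest <= maxHintWindowMillis;
    }

    /** Called once all hints for the endpoint have been delivered. */
    void hintsDelivered(String endpoint) {
        earliestHint.remove(endpoint);
    }

    public static void main(String[] args) {
        HintWindowTracker tracker = new HintWindowTracker(3 * 3_600_000L); // 3h window
        System.out.println(tracker.canHint("A", 0L));            // true: window opens
        // A restarts at +2h without hint replay; the window does NOT reset:
        System.out.println(tracker.canHint("A", 2 * 3_600_000L)); // true: still inside
        System.out.println(tracker.canHint("A", 4 * 3_600_000L)); // false: window elapsed
        tracker.hintsDelivered("A");
        System.out.println(tracker.canHint("A", 4 * 3_600_000L)); // true: fresh window
    }
}
```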
[jira] [Updated] (CASSANDRA-14309) Make hint window persistent across restarts
[ https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-14309: -- Status: Ready to Commit (was: Review In Progress)
[jira] [Comment Edited] (CASSANDRA-14309) Make hint window persistent across restarts
[ https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404438#comment-17404438 ] Stefan Miklosovic edited comment on CASSANDRA-14309 at 8/25/21, 1:18 PM: - I made this feature enabled by default and I updated NEWS. Branches are the same. I am running the build here https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1062/ I plan to merge on Friday 27th in the afternoon CEST if anybody wants to take a look before. was (Author: stefan.miklosovic): I made this feature enabled by default and I updated NEWS. Branches are same. I am running the build here https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1062/ If plan to merge on Friday 27th in the afternoon CEST if anybody wants to take a look before.
[jira] [Commented] (CASSANDRA-14309) Make hint window persistent across restarts
[ https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404438#comment-17404438 ] Stefan Miklosovic commented on CASSANDRA-14309: --- I made this feature enabled by default and I updated NEWS. Branches are the same. I am running the build here https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1062/ I plan to merge on Friday 27th in the afternoon CEST if anybody wants to take a look before.
[jira] [Commented] (CASSANDRA-16882) Save CircleCI resources with optional test jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404433#comment-17404433 ] Andres de la Peña commented on CASSANDRA-16882: --- I'm adding a fourth option that combines approaches 2 and 3, so the mandatory tests can be started either individually or all together with a single start button: ||Option||Branch||CI|| |1|[16882-option-1-trunk|https://github.com/adelapena/cassandra/tree/16882-option-1-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/9cb8ca7b-ab57-431e-a22b-643d61c92c29] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/3e26fd7e-5c5a-4ec3-8af9-4c247d96556a]| |2|[16882-option-2-trunk|https://github.com/adelapena/cassandra/tree/16882-option-2-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a859cfbc-fdf8-4468-beb9-b2ee17dc1ae3] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a4a86879-e283-4aa9-8121-c51fa79095e6]| |3|[16882-option-3-trunk|https://github.com/adelapena/cassandra/tree/16882-option-3-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/799/workflows/91f90e3a-e032-4d57-ba60-45d925c07c99] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/799/workflows/265a64f2-70b6-4a88-8045-89bdf50e5d8d]| |4|[16882-option-4-trunk|https://github.com/adelapena/cassandra/tree/16882-option-4-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/801/workflows/3b044fbb-0fda-4b30-9544-cdc259f8f09b] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/801/workflows/4c205d19-22ea-4ae8-8618-09c9ec7dcbe9]| This gives us the flexibility of the second approach with the click savings of the third approach. However, the downside is that it is done by duplicating the jobs, because CircleCI doesn't allow disjunctions in job dependencies. 
That leaves us with a more complex graph, and I'm afraid that could be more confusing than just writing in the doc what tests are mandatory.
[jira] [Commented] (CASSANDRA-16882) Save CircleCI resources with optional test jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404404#comment-17404404 ] Andres de la Peña commented on CASSANDRA-16882: --- CC [~edimitrova]
[jira] [Updated] (CASSANDRA-16882) Save CircleCI resources with optional test jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andres de la Peña updated CASSANDRA-16882: -- Change Category: Quality Assurance Complexity: Low Hanging Fruit Status: Open (was: Triage Needed)
[jira] [Commented] (CASSANDRA-16882) Save CircleCI resources with optional test jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404399#comment-17404399 ] Andres de la Peña commented on CASSANDRA-16882: --- Here are drafts of what each approach would look like for trunk: ||Option||Branch||CI|| |1|[16882-option-1-trunk|https://github.com/adelapena/cassandra/tree/16882-option-1-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/9cb8ca7b-ab57-431e-a22b-643d61c92c29] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/800/workflows/3e26fd7e-5c5a-4ec3-8af9-4c247d96556a]| |2|[16882-option-2-trunk|https://github.com/adelapena/cassandra/tree/16882-option-2-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a859cfbc-fdf8-4468-beb9-b2ee17dc1ae3] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/798/workflows/a4a86879-e283-4aa9-8121-c51fa79095e6]| |3|[16882-option-3-trunk|https://github.com/adelapena/cassandra/tree/16882-option-3-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/799/workflows/91f90e3a-e032-4d57-ba60-45d925c07c99] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/799/workflows/265a64f2-70b6-4a88-8045-89bdf50e5d8d]|
[jira] [Created] (CASSANDRA-16882) Save CircleCI resources with optional test jobs
Andres de la Peña created CASSANDRA-16882: - Summary: Save CircleCI resources with optional test jobs Key: CASSANDRA-16882 URL: https://issues.apache.org/jira/browse/CASSANDRA-16882 Project: Cassandra Issue Type: Task Components: CI Reporter: Andres de la Peña Assignee: Andres de la Peña This ticket implements the addition of approval steps in the CircleCI workflows as it was proposed in [this email|https://lists.apache.org/thread.html/r57bab800d037c087af01b3779fd266d83b538cdd29c120f74a5dbe63%40%3Cdev.cassandra.apache.org%3E] sent to the dev list: The current CircleCI configuration automatically runs the unit tests, JVM dtests and cqlshlib tests. This is done by default for every commit or, with some configuration, for every push. Along the lifecycle of a ticket it is quite frequent to have multiple commits and pushes, all running these test jobs. I'd say that frequently it is not necessary to run the tests for some of those intermediate commits and pushes. For example, one can show proofs of concept, or have multiple rounds of review before actually running the tests. Running the tests for every change can produce an unnecessary expense of CircleCI resources. I think we could make running those tests optional, as well as clearly specifying in the documentation which test runs are mandatory before actually committing. We could do this in different ways: # Make the entire CircleCI workflow optional, so the build job requires manual approval. Once the build is approved the mandatory test jobs would be run without any further approval, exactly as it's currently done. # Make all the test jobs optional, so every test job requires manual approval, and the documentation specifies which tests are mandatory in the final steps of a ticket. # Make all the mandatory test jobs depend on a single optional job, so we have a single button to optionally run all the mandatory tests. 
I think any of these changes, or a combination of them, would significantly reduce the usage of resources without making things less tested. The only downside I can think of is that we would need some additional clicks on the CircleCI GUI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404384#comment-17404384 ] Benedict Elliott Smith commented on CASSANDRA-16873: Either way, this exchange highlights an issue to address which is that an absolutist policy leads to gamification of terminology. Clearly we shouldn't be calling something a bug fix _so it can be included in a release_. I agree it would be a good idea to discuss this more on list. Eventually we'll zero in on a policy we can all agree to adopt as well as agree what it means. It's probably sensible to preferentially refactor the _existing_ wiki docs we have on this topic though, and to vote again to modify them. Otherwise we're just going to have a classic "now you have two problems" situation. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception being thrown and gossip > state never transitioning. > Rather than implicitly requiring operators to bounce the node by throwing an > exception, we should instead suppress the exception when checking if a node > is replacing the same host address and ID if we get an UnknownHostException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16873) Tolerate missing DNS entry when completing a host replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404378#comment-17404378 ] Josh McKenzie commented on CASSANDRA-16873: --- bq. if we're calling this an improvement it needs to go into 4.1 as 4.0 is bugfix only. [~mck] is working on a wiki doc about this very issue, and there's some confusion on the topic. Probably worth taking to the ML, but my understanding is roughly as follows: # 4.0->5.0 == protocol/deprecation/API break # 4.0->4.1 == new features and disruptive changes # 4.0.0-4.0.1 == bug fixes and small improvements that are optional, don't change default behavior, are additive, and should not impact existing clusters fwiw, the above matches Mick's draft wiki article but _doesn't_ match Caleb's understanding (he and I were discussing this offline yesterday). TL;DR: Probably should hit the ML. I'm happy for this to go wherever, but we should all get aligned and document this. > Tolerate missing DNS entry when completing a host replacement > - > > Key: CASSANDRA-16873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16873 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership >Reporter: Josh McKenzie >Assignee: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > In one of our deployments, after a host replacement a subset of nodes still > saw the nodes as JOINING despite the rest of the cluster seeing it as NORMAL > with a failure to gossip. This was traced to a DNS lookup failure on the > nodes during an interim state leading to an exception being thrown and gossip > state never transitioning. > Rather than implicitly requiring operators to bounce the node by throwing an > exception, we should instead suppress the exception when checking if a node > is replacing the same host address and ID if we get an UnknownHostException. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
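The change proposed in CASSANDRA-16873 (suppress the UnknownHostException raised by a DNS lookup when checking whether a joining node is replacing the same host, instead of letting it abort the gossip state transition) could be sketched as follows. This is a hypothetical, self-contained illustration; `ReplacementCheck` and `isSameHost` are invented names, not the actual Cassandra patch.

```java
// Hypothetical sketch: a DNS lookup failure during a host-replacement check
// is treated as "not the same host" rather than thrown, so gossip state can
// still transition and operators are not forced to bounce the node.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ReplacementCheck
{
    // Returns true only when the candidate hostname resolves to the address
    // being replaced; a missing DNS entry (UnknownHostException) is
    // suppressed and reported as a non-match.
    public static boolean isSameHost(String candidateHostname, InetAddress replacing)
    {
        try
        {
            return InetAddress.getByName(candidateHostname).equals(replacing);
        }
        catch (UnknownHostException e)
        {
            return false; // tolerate the missing DNS entry instead of failing
        }
    }

    public static void main(String[] args) throws UnknownHostException
    {
        // A numeric literal resolves without DNS and matches itself.
        System.out.println(isSameHost("127.0.0.1", InetAddress.getByName("127.0.0.1"))); // prints "true"
    }
}
```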
[jira] [Commented] (CASSANDRA-14564) Adding regular column to COMPACT tables without clustering columns should trigger an InvalidRequestException
[ https://issues.apache.org/jira/browse/CASSANDRA-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404335#comment-17404335 ] Stefan Miklosovic commented on CASSANDRA-14564: --- I am not completely sure about the test results; I'll try to debug it further. I am not sure whether it is just flaky or whether I introduced a regression. > Adding regular column to COMPACT tables without clustering columns should > trigger an InvalidRequestException > - > > Key: CASSANDRA-14564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14564 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core, Legacy/CQL >Reporter: Laxmikant Upadhyay >Assignee: Stefan Miklosovic >Priority: Normal > Labels: lhf > Fix For: 3.0.x, 3.11.x, 4.0.x > > > I have upgraded my system from cassandra 2.1.16 to 3.11.2. We had some tables > with COMPACT STORAGE enabled. We see some weird behaviour of cassandra > while adding a column into it. > Cassandra does not give any error while altering, however the added column is > invisible. > Same behaviour when we create a new table with compact storage and try to > alter it. 
Below is the commands ran in sequence: > > {code:java} > x@cqlsh:xuser> CREATE TABLE xuser.employee(emp_id int PRIMARY KEY,emp_name > text, emp_city text, emp_sal varint, emp_phone varint ) WITH COMPACT STORAGE; > x@cqlsh:xuser> desc table xuser.employee ; > CREATE TABLE xuser.employee ( > emp_id int PRIMARY KEY, > emp_city text, > emp_name text, > emp_phone varint, > emp_sal varint > ) WITH COMPACT STORAGE > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE';{code} > Now altering the table by adding a new column: > > {code:java} > x@cqlsh:xuser> alter table employee add profile text; > x@cqlsh:xuser> desc table xuser.employee ; > CREATE TABLE xuser.employee ( > emp_id int PRIMARY KEY, > emp_city text, > emp_name text, > emp_phone varint, > emp_sal varint > ) WITH COMPACT STORAGE > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND 
min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > {code} > notice that above desc table result does not have newly added column profile. > However when i try to add it again it gives column already exist; > {code:java} > x@cqlsh:xuser> alter table employee add profile text; > InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid > column name profile because it conflicts with an existing column" > x@cqlsh:xuser> select emp_name,profile from employee; > emp_name | profile > --+- > (0 rows) > x@cqlsh:xuser> > {code} > Inserting also behaves strange: > {code:java} > x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone > , emp_sal ,profile) VALUES ( 1, 'ggn', 'john', 123456, 5, 'SE'); > InvalidRequest: Error from server: code=2200 [Invalid query] message="Some > clustering keys are missing: column1" > x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone > , emp_sal ,profile,column1) VALUES ( 1, 'ggn', 'john', 123456, 5, > 'SE',null); > x@cqlsh:xuser> select * from employee; > emp_id | emp_city | emp_name | emp_phone | emp_sal > +--+--+---+- > (0 rows) > {code} > *How to solve that ticket* > ([~blerer])-- > > Adding regular columns to non-dense compact tables should be forbidden as it > is the
[jira] [Updated] (CASSANDRA-14564) Adding regular column to COMPACT tables without clustering columns should trigger an InvalidRequestException
[ https://issues.apache.org/jira/browse/CASSANDRA-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-14564: -- Status: Requires Testing (was: Review In Progress) > Adding regular column to COMPACT tables without clustering columns should > trigger an InvalidRequestException > - > > Key: CASSANDRA-14564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14564 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core, Legacy/CQL >Reporter: Laxmikant Upadhyay >Assignee: Stefan Miklosovic >Priority: Normal > Labels: lhf > Fix For: 3.0.x, 3.11.x, 4.0.x > > > I have upgraded my system from cassandra 2.1.16 to 3.11.2. We had some tables > with COMPACT STORAGE enabled. We see some weird behaviour of cassandra > while adding a column into it. > Cassandra does not give any error while altering however the added column is > invisible. > Same behaviour when we create a new table with compact storage and try to > alter it. Below is the commands ran in sequence: > > {code:java} > x@cqlsh:xuser> CREATE TABLE xuser.employee(emp_id int PRIMARY KEY,emp_name > text, emp_city text, emp_sal varint, emp_phone varint ) WITH COMPACT STORAGE; > x@cqlsh:xuser> desc table xuser.employee ; > CREATE TABLE xuser.employee ( > emp_id int PRIMARY KEY, > emp_city text, > emp_name text, > emp_phone varint, > emp_sal varint > ) WITH COMPACT STORAGE > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND 
min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE';{code} > Now altering the table by adding a new column: > > {code:java} > x@cqlsh:xuser> alter table employee add profile text; > x@cqlsh:xuser> desc table xuser.employee ; > CREATE TABLE xuser.employee ( > emp_id int PRIMARY KEY, > emp_city text, > emp_name text, > emp_phone varint, > emp_sal varint > ) WITH COMPACT STORAGE > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > {code} > notice that above desc table result does not have newly added column profile. 
> However when i try to add it again it gives column already exist; > {code:java} > x@cqlsh:xuser> alter table employee add profile text; > InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid > column name profile because it conflicts with an existing column" > x@cqlsh:xuser> select emp_name,profile from employee; > emp_name | profile > --+- > (0 rows) > x@cqlsh:xuser> > {code} > Inserting also behaves strange: > {code:java} > x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone > , emp_sal ,profile) VALUES ( 1, 'ggn', 'john', 123456, 5, 'SE'); > InvalidRequest: Error from server: code=2200 [Invalid query] message="Some > clustering keys are missing: column1" > x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone > , emp_sal ,profile,column1) VALUES ( 1, 'ggn', 'john', 123456, 5, > 'SE',null); > x@cqlsh:xuser> select * from employee; > emp_id | emp_city | emp_name | emp_phone | emp_sal > +--+--+---+- > (0 rows) > {code} > *How to solve that ticket* > ([~blerer])-- > > Adding regular columns to non-dense compact tables should be forbidden as it > is the case for other column types. To do that {{AlterTableStatement}} should > be modified to fire an {{InvalidRequestException}} when
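The proposed fix (have the ALTER TABLE path reject regular-column additions on non-dense COMPACT STORAGE tables) could be sketched as a small guard. This is a self-contained toy; the class and exception here are invented for illustration, while the real change would live in Cassandra's {{AlterTableStatement}}.

```java
// Hypothetical sketch of the proposed validation: adding a regular column to
// a non-dense COMPACT STORAGE table is rejected up front instead of silently
// producing an invisible column. Names are illustrative, not Cassandra's API.
public class AlterTableGuard
{
    static class InvalidRequestException extends RuntimeException
    {
        InvalidRequestException(String msg) { super(msg); }
    }

    // Throw before the schema change is applied, mirroring how other invalid
    // alterations are handled.
    public static void validateAddColumn(boolean isCompactStorage, boolean isDense)
    {
        if (isCompactStorage && !isDense)
            throw new InvalidRequestException(
                "Cannot add a regular column to a non-dense COMPACT STORAGE table");
    }

    public static void main(String[] args)
    {
        validateAddColumn(false, false); // regular table: allowed, no exception
        try
        {
            validateAddColumn(true, false); // compact non-dense: rejected
        }
        catch (InvalidRequestException e)
        {
            System.out.println("rejected"); // prints "rejected"
        }
    }
}
```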
[jira] [Comment Edited] (CASSANDRA-14557) Consider adding default and required keyspace replication options
[ https://issues.apache.org/jira/browse/CASSANDRA-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404307#comment-17404307 ] Sumanth Pasupuleti edited comment on CASSANDRA-14557 at 8/25/21, 9:25 AM: -- [~azotcsit] latest branch https://github.com/sumanth-pasupuleti/cassandra/tree/14557-trunk reflects the changes from your latest round of review, and you can find responses to your comments at https://github.com/sumanth-pasupuleti/cassandra/commit/139a01531f1b51f6b3b7dc005a7df929ec9409a5. [~stefan.miklosovic] Good call out on CASSANDRA-16203. This does fit philosophically and logically, however, [fromMapWithDefaults|https://github.com/sumanth-pasupuleti/cassandra/blob/14557-trunk/src/java/org/apache/cassandra/schema/ReplicationParams.java#L92] will have to accommodate to this new class. Will add a comment to CASSANDRA-16203 if this gets committed before CASSANDRA-16203. Also, added nodetool commands to override default rf and minimum rf. This is still cumbersome to do it for each node, but yes it serves the purpose of avoiding node restart. We may ideally want to put an endpoint in the [sidecar|https://github.com/apache/cassandra-sidecar], that could potentially change these configurations across the cluster. Latest code at https://github.com/sumanth-pasupuleti/cassandra/tree/14557-trunk was (Author: sumanth.pasupuleti): [~azotcsit] latest branch https://github.com/sumanth-pasupuleti/cassandra/tree/14557-trunk reflects the changes from your latest round of review, and you can find responses to your comments at https://github.com/sumanth-pasupuleti/cassandra/commit/139a01531f1b51f6b3b7dc005a7df929ec9409a5. [~stefan.miklosovic] Good call out on CASSANDRA-16203. This does fit philosophically and logically, however, [fromMapWithDefaults|https://github.com/sumanth-pasupuleti/cassandra/blob/14557-trunk/src/java/org/apache/cassandra/schema/ReplicationParams.java#L92] will have to accommodate to this new class. 
Will add a comment to CASSANDRA-16203 if this gets committed before CASSANDRA-16203. Also, added nodetool commands to override default rf and minimum rf. This is still cumbersome to do it for each node, but yes it serves the purpose of avoiding node restart. We may ideally want to put an endpoint in the [sidecar|https://github.com/apache/cassandra-sidecar], that could potentially change these configurations across the cluster. > Consider adding default and required keyspace replication options > - > > Key: CASSANDRA-14557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14557 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Low > Labels: 4.0-feature-freeze-review-requested > Fix For: 4.x > > Attachments: 14557-4.0.txt, 14557-trunk.patch > > > Ending up with a keyspace of RF=1 is unfortunately pretty easy in C* right > now - the system_auth table for example is created with RF=1 (to take into > account single node setups afaict from CASSANDRA-5112), and a user can > further create a keyspace with RF=1 posing availability and streaming risks > (e.g. rebuild). > I propose we add two configuration options in cassandra.yaml: > # {{default_keyspace_rf}} (default: 1) - If replication factors are not > specified, use this number. > # {{required_minimum_keyspace_rf}} (default: unset) - Prevent users from > creating a keyspace with an RF less than what is configured > These settings could further be re-used to: > * Provide defaults for new keyspaces created with SimpleStrategy or > NetworkTopologyStrategy (CASSANDRA-14303) > * Make the automatic token [allocation > algorithm|https://issues.apache.org/jira/browse/CASSANDRA-13701?focusedCommentId=16095662=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16095662] > interface more intuitive allowing easy use of the new token allocation > algorithm. 
> At the end of the day, if someone really wants to allow RF=1, they simply > don’t set the setting. For backwards compatibility the default remains 1 and > C* would create with RF=1, and would default to current behavior of allowing > any RF on keyspaces. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14557) Consider adding default and required keyspace replication options
[ https://issues.apache.org/jira/browse/CASSANDRA-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404307#comment-17404307 ] Sumanth Pasupuleti edited comment on CASSANDRA-14557 at 8/25/21, 9:23 AM: -- [~azotcsit] latest branch https://github.com/sumanth-pasupuleti/cassandra/tree/14557-trunk reflects the changes from your latest round of review, and you can find responses to your comments at https://github.com/sumanth-pasupuleti/cassandra/commit/139a01531f1b51f6b3b7dc005a7df929ec9409a5. [~stefan.miklosovic] Good call out on CASSANDRA-16203. This does fit philosophically and logically, however, [fromMapWithDefaults|https://github.com/sumanth-pasupuleti/cassandra/blob/14557-trunk/src/java/org/apache/cassandra/schema/ReplicationParams.java#L92] will have to accommodate to this new class. Will add a comment to CASSANDRA-16203 if this gets committed before CASSANDRA-16203. Also, added nodetool commands to override default rf and minimum rf. This is still cumbersome to do it for each node, but yes it serves the purpose of avoiding node restart. We may ideally want to put an endpoint in the [sidecar|https://github.com/apache/cassandra-sidecar], that could potentially change these configurations across the cluster. was (Author: sumanth.pasupuleti): [~azotcsit] latest branch https://github.com/sumanth-pasupuleti/cassandra/tree/14557-trunk reflects the changes from your latest round of review, and you can find responses to your comments at https://github.com/sumanth-pasupuleti/cassandra/commit/139a01531f1b51f6b3b7dc005a7df929ec9409a5. [~stefan.miklosovic] Good call out on CASSANDRA-16203. This does fit philosophically and logically, however, fromMapWithDefaults will have to accommodate to this new class. Will add a comment to CASSANDRA-16203 if this gets committed before CASSANDRA-16203. Also, added nodetool commands to override default rf and minimum rf. 
This is still cumbersome to do it for each node, but yes it serves the purpose of avoiding node restart. We may ideally want to put an endpoint in the [sidecar|https://github.com/apache/cassandra-sidecar], that could potentially change these configurations across the cluster. > Consider adding default and required keyspace replication options > - > > Key: CASSANDRA-14557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14557 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Low > Labels: 4.0-feature-freeze-review-requested > Fix For: 4.x > > Attachments: 14557-4.0.txt, 14557-trunk.patch > > > Ending up with a keyspace of RF=1 is unfortunately pretty easy in C* right > now - the system_auth table for example is created with RF=1 (to take into > account single node setups afaict from CASSANDRA-5112), and a user can > further create a keyspace with RF=1 posing availability and streaming risks > (e.g. rebuild). > I propose we add two configuration options in cassandra.yaml: > # {{default_keyspace_rf}} (default: 1) - If replication factors are not > specified, use this number. > # {{required_minimum_keyspace_rf}} (default: unset) - Prevent users from > creating a keyspace with an RF less than what is configured > These settings could further be re-used to: > * Provide defaults for new keyspaces created with SimpleStrategy or > NetworkTopologyStrategy (CASSANDRA-14303) > * Make the automatic token [allocation > algorithm|https://issues.apache.org/jira/browse/CASSANDRA-13701?focusedCommentId=16095662=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16095662] > interface more intuitive allowing easy use of the new token allocation > algorithm. > At the end of the day, if someone really wants to allow RF=1, they simply > don’t set the setting. 
For backwards compatibility the default remains 1 and > C* would create with RF=1, and would default to current behavior of allowing > any RF on keyspaces. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14557) Consider adding default and required keyspace replication options
[ https://issues.apache.org/jira/browse/CASSANDRA-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404307#comment-17404307 ] Sumanth Pasupuleti commented on CASSANDRA-14557: [~azotcsit] latest branch https://github.com/sumanth-pasupuleti/cassandra/tree/14557-trunk reflects the changes from your latest round of review, and you can find responses to your comments at https://github.com/sumanth-pasupuleti/cassandra/commit/139a01531f1b51f6b3b7dc005a7df929ec9409a5. [~stefan.miklosovic] Good call out on CASSANDRA-16203. This does fit philosophically and logically; however, fromMapWithDefaults will have to accommodate this new class. Will add a comment to CASSANDRA-16203 if this gets committed before CASSANDRA-16203. Also, added nodetool commands to override default rf and minimum rf. This is still cumbersome to do for each node, but it serves the purpose of avoiding a node restart. We may ideally want to put an endpoint in the [sidecar|https://github.com/apache/cassandra-sidecar], that could potentially change these configurations across the cluster. > Consider adding default and required keyspace replication options > - > > Key: CASSANDRA-14557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14557 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Low > Labels: 4.0-feature-freeze-review-requested > Fix For: 4.x > > Attachments: 14557-4.0.txt, 14557-trunk.patch > > > Ending up with a keyspace of RF=1 is unfortunately pretty easy in C* right > now - the system_auth table for example is created with RF=1 (to take into > account single node setups afaict from CASSANDRA-5112), and a user can > further create a keyspace with RF=1 posing availability and streaming risks > (e.g. rebuild). 
> I propose we add two configuration options in cassandra.yaml: > # {{default_keyspace_rf}} (default: 1) - If replication factors are not > specified, use this number. > # {{required_minimum_keyspace_rf}} (default: unset) - Prevent users from > creating a keyspace with an RF less than what is configured > These settings could further be re-used to: > * Provide defaults for new keyspaces created with SimpleStrategy or > NetworkTopologyStrategy (CASSANDRA-14303) > * Make the automatic token [allocation > algorithm|https://issues.apache.org/jira/browse/CASSANDRA-13701?focusedCommentId=16095662=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16095662] > interface more intuitive allowing easy use of the new token allocation > algorithm. > At the end of the day, if someone really wants to allow RF=1, they simply > don’t set the setting. For backwards compatibility the default remains 1 and > C* would create with RF=1, and would default to current behavior of allowing > any RF on keyspaces. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
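For illustration, the two options proposed above would sit in cassandra.yaml along these lines. The option names come straight from the ticket; the values shown are arbitrary examples, not the defaults it proposes (which are 1 and unset respectively).

```yaml
# Illustrative cassandra.yaml fragment using the ticket's proposed options.
default_keyspace_rf: 3            # used when CREATE KEYSPACE omits a replication factor
required_minimum_keyspace_rf: 2   # reject keyspaces created with an RF below this
```

Under such a configuration, a plain CREATE KEYSPACE without replication options would get RF=3, while an explicit RF=1 would be refused.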
[jira] [Commented] (CASSANDRA-12734) Materialized View schema file for snapshots created as tables
[ https://issues.apache.org/jira/browse/CASSANDRA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404277#comment-17404277 ] Benjamin Lerer commented on CASSANDRA-12734: [~e.dimitrova] I ran some tests on 4.0 and hit another issue related to DESCRIBE. I need to dig a bit deeper. > Materialized View schema file for snapshots created as tables > - > > Key: CASSANDRA-12734 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12734 > Project: Cassandra > Issue Type: Bug > Components: Feature/Materialized Views, Legacy/Tools >Reporter: Hau Phan >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x > > > The materialized view schema file that gets created and stored with the > sstables is created as a table instead of a materialized view. > Can the materialized view be created and added to the corresponding table's > schema file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org