[jira] [Commented] (CASSANDRA-16016) sstablemetadata unit test, docs and params parsing hardening

2020-10-05 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208485#comment-17208485
 ] 

Berenguer Blasi commented on CASSANDRA-16016:
-

This is up for review, but I can't move the status forward.

> sstablemetadata unit test, docs and params parsing hardening
> 
>
> Key: CASSANDRA-16016
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16016
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/sstable
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During CASSANDRA-15883 / CASSANDRA-15991 it was detected that unit test coverage 
> for this tool is minimal. There is a unit test to enhance upon under 
> {{test/unit/org/apache/cassandra/tools}}. Also, the docs are missing some options 
> and args parsing is brittle.
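
To give a concrete flavour of the "params parsing hardening" part, a minimal, hypothetical 
JUnit sketch is below; the helper and the option names in it are illustrative assumptions, 
not the actual patch (the real test lives under {{test/unit/org/apache/cassandra/tools}}):

{code}
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class MetadataArgsHardeningTest
{
    // Hypothetical stand-in for the tool's option parsing. A hardened parser should
    // turn bad input into a usage error instead of an uncaught exception.
    private static boolean accepts(String... args)
    {
        if (args.length == 0)
            return false;                          // at least one sstable path is required
        for (String arg : args)
            if (arg.startsWith("-") && !"--gc_grace_seconds".equals(arg) && !"-u".equals(arg))
                return false;                      // unknown flag -> usage error, not a stack trace
        return true;
    }

    @Test
    public void rejectsMissingAndUnknownArguments()
    {
        assertFalse(accepts());
        assertFalse(accepts("--no-such-flag"));
        assertTrue(accepts("path/to/na-1-big-Data.db"));
    }
}
{code}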






[jira] [Comment Edited] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208470#comment-17208470
 ] 

David Capwell edited comment on CASSANDRA-16120 at 10/6/20, 4:27 AM:
-

Committed: 
https://github.com/apache/cassandra/commit/63b172e137e0306aefd84f373963d8014c5a5efa

CI Results:

Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16120-trunk-238FCFFB-707F-49E9-AB7A-05A0FC6A4C0F
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/68/


was (Author: dcapwell):
Starting commit (pending):

CI Results:

Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16120-trunk-238FCFFB-707F-49E9-AB7A-05A0FC6A4C0F
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/68/

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is that python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating that log messages were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>if (m.find())
>return s;
>}
>reader.close();
>return endOfData();
>}
>catch (IOException e)
>{
>Closeables.closeQuietly(reader);
>throw new 

[cassandra] branch trunk updated: Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 63b172e  Add ability for jvm-dtest to grep instance logs
63b172e is described below

commit 63b172e137e0306aefd84f373963d8014c5a5efa
Author: David Capwell 
AuthorDate: Mon Oct 5 20:21:09 2020 -0700

Add ability for jvm-dtest to grep instance logs

patch by David Capwell; reviewed by Alex Petrov, Yifan Cai for 
CASSANDRA-16120
---
 test/conf/logback-dtest.xml|  27 +
 .../distributed/impl/AbstractCluster.java  |  25 +++--
 ...nstanceIDDefiner.java => ClusterIDDefiner.java} |  22 ++--
 .../cassandra/distributed/impl/FileLogAction.java  | 115 +
 .../cassandra/distributed/impl/Instance.java   |  25 -
 .../distributed/impl/InstanceIDDefiner.java|  12 ++-
 .../cassandra/distributed/test/JVMDTestTest.java   |  32 ++
 7 files changed, 217 insertions(+), 41 deletions(-)

diff --git a/test/conf/logback-dtest.xml b/test/conf/logback-dtest.xml
index 370e1e5..52eaf33 100644
--- a/test/conf/logback-dtest.xml
+++ b/test/conf/logback-dtest.xml
@@ -18,35 +18,18 @@
 -->
 
 
+  
   
 
   
   
 
-  
-
-./build/test/logs/${cassandra.testtag}/TEST-${suitename}.log
-
-  
./build/test/logs/${cassandra.testtag}/TEST-${suitename}.log.%i.gz
-  1
-  20
-
-
-
-  20MB
-
-
+  
+
./build/test/logs/${cassandra.testtag}/${suitename}/${cluster_id}/${instance_id}/system.log
 
   %-5level [%thread] ${instance_id} %date{ISO8601} 
%msg%n
 
-false
-  
-
-  
-0
-0
-1024
-
+true
   
 
   
@@ -70,7 +53,7 @@
   
 
   
-
+ 
 
 
   
diff --git 
a/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java 
b/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
index b6f359a..bd3f338 100644
--- 
a/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
+++ 
b/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
@@ -22,12 +22,12 @@ import java.io.File;
 import java.net.InetSocketAddress;
 import java.util.ArrayList;
 import java.util.Arrays;
-import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.UUID;
 import java.util.concurrent.CopyOnWriteArrayList;
 import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
@@ -35,7 +35,6 @@ import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.BiConsumer;
 import java.util.function.BiPredicate;
 import java.util.function.Consumer;
-import java.util.function.Predicate;
 import java.util.stream.Collectors;
 import java.util.stream.IntStream;
 import java.util.stream.Stream;
@@ -46,11 +45,11 @@ import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.dht.IPartitioner;
+import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.distributed.api.ConsistencyLevel;
 import org.apache.cassandra.distributed.api.Feature;
 import org.apache.cassandra.distributed.api.ICluster;
-import org.apache.cassandra.dht.IPartitioner;
-import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.distributed.api.ICoordinator;
 import org.apache.cassandra.distributed.api.IInstance;
 import org.apache.cassandra.distributed.api.IInstanceConfig;
@@ -60,6 +59,7 @@ import org.apache.cassandra.distributed.api.IListen;
 import org.apache.cassandra.distributed.api.IMessage;
 import org.apache.cassandra.distributed.api.IMessageFilters;
 import org.apache.cassandra.distributed.api.IUpgradeableInstance;
+import org.apache.cassandra.distributed.api.LogAction;
 import org.apache.cassandra.distributed.api.NodeToolResult;
 import org.apache.cassandra.distributed.api.TokenSupplier;
 import org.apache.cassandra.distributed.shared.InstanceClassLoader;
@@ -108,6 +108,7 @@ public abstract class AbstractCluster 
implements ICluster 
implements ICluster)Instance::new, classLoader)
@@ -267,6 +268,18 @@ public abstract class AbstractCluster 
implements ICluster 
implements ICluster fn)
+{
+RandomAccessFile reader;
+try
+{
+reader = new RandomAccessFile(file, "r");
+}
+catch (FileNotFoundException e)
+{
+// if file isn't present, don't return an empty stream as it looks 
the same as no log lines matched
+throw new UncheckedIOException(e);
+}
+if (startPosition > 0) // -1 used to disable, so ignore any negative 
values or 0 (default offset)
+{
+try
+{
+reader.seek(startPosition);
+}
+catch (IOException e)
+{
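
The hunk above is cut off in this digest; as a standalone illustration of the pattern it 
follows (open the log, surface a missing file as an error rather than an empty match set, 
and honour a saved mark by seeking before matching), a rough sketch is below; the names 
are illustrative and this is not the committed FileLogAction:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public final class LogGrepSketch
{
    public static List<String> grep(String file, long startPosition, Pattern regex)
    {
        RandomAccessFile reader;
        try
        {
            reader = new RandomAccessFile(file, "r");
        }
        catch (FileNotFoundException e)
        {
            // a missing file must not look the same as "no log lines matched"
            throw new UncheckedIOException(e);
        }
        try
        {
            if (startPosition > 0) // negative or zero means "from the start of the file"
                reader.seek(startPosition);
            List<String> matches = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null)
                if (regex.matcher(line).find())
                    matches.add(line);
            return matches;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        finally
        {
            try { reader.close(); } catch (IOException ignore) {}
        }
    }
}
{code}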

[jira] [Commented] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208472#comment-17208472
 ] 

Yifan Cai commented on CASSANDRA-16120:
---

UUID suffix works. Thanks!

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is that python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating that log messages were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>if (m.find())
>return s;
>}
>reader.close();
>return endOfData();
>}
>catch (IOException e)
>{
>Closeables.closeQuietly(reader);
>throw new UncheckedIOException(e);
>}
>}
>};
>return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, 
> Spliterator.ORDERED), false);
>}
> }
> {code}
> And
> {code}
> @Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>//TODO missing way to get node id
> //cluster.get(1);
>String log = 
> 

[jira] [Comment Edited] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208470#comment-17208470
 ] 

David Capwell edited comment on CASSANDRA-16120 at 10/6/20, 4:01 AM:
-

Starting commit (pending):

CI Results:

Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16120-trunk-238FCFFB-707F-49E9-AB7A-05A0FC6A4C0F
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/68/


was (Author: dcapwell):
Starting commit (pending):

CI Results:

Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/617/workflows/cdc809fc-f5eb-444e-b155-3ad98e926b9e
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/68/

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is that python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating that log messages were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>if (m.find())
>return s;
>}
>reader.close();
>return endOfData();
>}
>catch (IOException e)
>{
>Closeables.closeQuietly(reader);
>throw new UncheckedIOException(e);
>}
>}
>};
>return 

[jira] [Commented] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208470#comment-17208470
 ] 

David Capwell commented on CASSANDRA-16120:
---

Starting commit (pending):

CI Results:

Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/617/workflows/cdc809fc-f5eb-444e-b155-3ad98e926b9e
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/68/

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is that python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating that log messages were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>if (m.find())
>return s;
>}
>reader.close();
>return endOfData();
>}
>catch (IOException e)
>{
>Closeables.closeQuietly(reader);
>throw new UncheckedIOException(e);
>}
>}
>};
>return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, 
> Spliterator.ORDERED), false);
>}
> }
> {code}
> And
> {code}
> @Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>  

[jira] [Commented] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208459#comment-17208459
 ] 

David Capwell commented on CASSANDRA-16120:
---

[~yifanc] I believe I addressed your issue by moving away from the generation 
(count of instances created in the JVM) to a random UUID for the cluster; here 
is a sample of what I see in CircleCI when it uploads the logs:

{code}
Uploading 
/tmp/cassandra/build/test/logs/org.apache.cassandra.distributed.test.RepairTest///system.log
 (584 kB): DONE
Uploading 
/tmp/cassandra/build/test/logs/org.apache.cassandra.distributed.test.RepairTest/cluster-6b478068-65ae-4621-b759-11208d67d0ea/node1/system.log
 (60 MB): DONE
Uploading 
/tmp/cassandra/build/test/logs/org.apache.cassandra.distributed.test.RepairTest/cluster-6b478068-65ae-4621-b759-11208d67d0ea/node2/system.log
 (1.4 MB): DONE
Uploading 
/tmp/cassandra/build/test/logs/org.apache.cassandra.distributed.test.RepairTest/cluster-6b478068-65ae-4621-b759-11208d67d0ea/node3/system.log
 (1.1 MB): DONE
{code}

Here we see that the logs are scoped to the test, then cluster-<uuid>, then 
node; we also have the test logs in the main directory (non jvm-dtest 
class loader).
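
To tie that layout back to the POC in the issue description below, a test that knows its 
cluster and node ids could point the same kind of grep helper at the per-instance file. 
This is a sketch only; how a test obtains the ids is an assumption here, not part of the patch:

{code}
// Sketch: composes the per-instance log path seen in the uploads above
// (build/test/logs/<suitename>/cluster-<uuid>/node<N>/system.log).
// How the cluster id and node number are obtained is an assumption for illustration.
public final class PerInstanceLogPath
{
    public static String systemLog(String suite, String clusterId, int node)
    {
        return String.format("build/test/logs/%s/%s/node%d/system.log", suite, clusterId, node);
    }
}
{code}

Feeding that path to a helper like the POC's {{grep(log, "pattern")}} then gives per-node matching.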

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is that python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating that log messages were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>   

[jira] [Commented] (CASSANDRA-16101) Make sure we don't throw any uncaught exceptions during in-jvm dtests

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208455#comment-17208455
 ] 

David Capwell commented on CASSANDRA-16101:
---

Committed, see 
https://issues.apache.org/jira/browse/CASSANDRA-16109?focusedCommentId=17208415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17208415
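
For context, the mechanism visible in the commit diffs further down in this digest is 
roughly: record uncaught throwables per instance while the cluster is running, then drain 
the list when the cluster closes and fail the run if anything was collected. A simplified 
sketch (the names only approximate the patch; this is not the committed code):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public final class UncaughtExceptionsCheck
{
    // collected from each instance's Thread.UncaughtExceptionHandler
    private final List<Throwable> uncaught = new CopyOnWriteArrayList<>();

    public void onUncaught(Throwable error)
    {
        uncaught.add(error);
    }

    // called on cluster close: drain the list and fail if anything was recorded
    public void checkAndReset()
    {
        List<Throwable> drain = new ArrayList<>(uncaught);
        uncaught.clear();
        if (!drain.isEmpty())
            throw new AssertionError("uncaught exceptions during in-jvm dtest: " + drain);
    }
}
{code}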

> Make sure we don't throw any uncaught exceptions during in-jvm dtests
> -
>
> Key: CASSANDRA-16101
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16101
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
>  Labels: pull-request-available
>
> We should assert that we don't throw any uncaught exceptions when running 
> in-jvm dtests






[jira] [Comment Edited] (CASSANDRA-16109) Don't adjust nodeCount when setting node id topology in in-jvm dtests

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208415#comment-17208415
 ] 

David Capwell edited comment on CASSANDRA-16109 at 10/6/20, 2:08 AM:
-

Committed (upgrade tests use these branches since mixed-version was failing); 
includes CASSANDRA-16109 & CASSANDRA-16101: 
https://github.com/apache/cassandra/commit/b3013a4ac5ee816cafe7492775126d1fa72ced75
CI results:

2.2
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-2.2-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/64/

3.0
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-3.0-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/65/

3.11
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-3.11-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/66/

trunk
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-trunk-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/67/

CI results are yellow, with expected failures and flaky tests.


was (Author: dcapwell):
starting commit (upgrade tests use these branches since mixed version was 
failing); includes CASSANDRA-16109 & CASSANDRA-16101
CI results (pending)

2.2
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-2.2-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/64/

3.0
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-3.0-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/65/

3.11
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-3.11-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/66/

trunk
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-trunk-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/67/

> Don't adjust nodeCount when setting node id topology in in-jvm dtests
> -
>
> Key: CASSANDRA-16109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16109
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
>  Labels: pull-request-available
>
> We update the node count when setting the node id topology in in-jvm dtests; 
> this should only happen if the node count is smaller than the node id topology, 
> otherwise bootstrap tests error out.
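
In code terms, the guard described above amounts to the following (illustrative only, not 
the committed change):

{code}
final class NodeCountGuard
{
    // Only grow nodeCount to cover the supplied node id topology; never shrink it,
    // so bootstrap tests that deliberately configure fewer live nodes keep their count.
    static int adjustedNodeCount(int configuredNodeCount, int nodeIdTopologySize)
    {
        return Math.max(configuredNodeCount, nodeIdTopologySize);
    }
}
{code}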






[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 21a363becc8e2a2c1b3042b9544f1c47606f5bab
Merge: 7a63cc2 b1efb8e
Author: David Capwell 
AuthorDate: Mon Oct 5 19:06:01 2020 -0700

Merge branch 'cassandra-3.11' into trunk

 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 37 --
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  3 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 14 ++--
 .../ShutdownException.java}| 18 ---
 .../distributed/test/FailingRepairTest.java|  6 
 .../test/FullRepairCoordinatorFastTest.java|  1 +
 .../test/IncrementalRepairCoordinatorFastTest.java |  1 +
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 .../test/PreviewRepairCoordinatorFastTest.java |  2 ++
 .../cassandra/distributed/test/StreamingTest.java  |  5 ++-
 .../cassandra/distributed/upgrade/UpgradeTest.java |  6 +++-
 13 files changed, 86 insertions(+), 25 deletions(-)

diff --cc build.xml
index 40729a6,82c35e9..6961cf9
--- a/build.xml
+++ b/build.xml
@@@ -582,14 -412,19 +582,14 @@@



 -  
 +  
  
 -  
 -
 -
 -  
 -  
 -  
 -   
 -  
 -  
 +  
 +  

 +  
 +  
-   
+   

   

diff --cc 
test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
index a28c935,a6a6336..b6f359a
--- 
a/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
+++ 
b/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
@@@ -31,9 -33,10 +33,11 @@@ import java.util.concurrent.Future
  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.atomic.AtomicInteger;
  import java.util.function.BiConsumer;
+ import java.util.function.BiPredicate;
  import java.util.function.Consumer;
+ import java.util.function.Predicate;
  import java.util.stream.Collectors;
 +import java.util.stream.IntStream;
  import java.util.stream.Stream;
  
  import com.google.common.collect.Sets;
@@@ -61,9 -64,11 +65,10 @@@ import org.apache.cassandra.distributed
  import org.apache.cassandra.distributed.shared.InstanceClassLoader;
  import org.apache.cassandra.distributed.shared.MessageFilters;
  import org.apache.cassandra.distributed.shared.NetworkTopology;
+ import org.apache.cassandra.distributed.shared.ShutdownException;
  import org.apache.cassandra.distributed.shared.Versions;
  import org.apache.cassandra.io.util.FileUtils;
 -import org.apache.cassandra.net.MessagingService;
 +import org.apache.cassandra.net.Verb;
  import org.apache.cassandra.utils.FBUtilities;
  import org.apache.cassandra.utils.concurrent.SimpleCondition;
  
@@@ -119,30 -122,12 +124,33 @@@ public abstract class AbstractCluster<
  
  // mutated by user-facing API
  private final MessageFilters filters;
 +private final INodeProvisionStrategy.Strategy nodeProvisionStrategy;
  private final BiConsumer instanceInitializer;
+ private final int datadirCount;
 +private volatile Thread.UncaughtExceptionHandler previousHandler = null;
+ private volatile BiPredicate ignoreUncaughtThrowable 
= null;
+ private final List uncaughtExceptions = new 
CopyOnWriteArrayList<>();
  
 -private volatile Thread.UncaughtExceptionHandler previousHandler = null;
 +/**
 + * Common builder, add methods that are applicable to both Cluster and 
Upgradable cluster here.
 + */
 +public static abstract class AbstractBuilder>
 +extends org.apache.cassandra.distributed.shared.AbstractBuilder
 +{
 +private INodeProvisionStrategy.Strategy nodeProvisionStrategy = 
INodeProvisionStrategy.Strategy.MultipleNetworkInterfaces;
 +
 +public AbstractBuilder(Factory factory)
 +{
 +super(factory);
 +}
 +
 +public B withNodeProvisionStrategy(INodeProvisionStrategy.Strategy 
nodeProvisionStrategy)
 +{
 +this.nodeProvisionStrategy = nodeProvisionStrategy;
 +return (B) this;
 +}
 +}
 +
  
  protected class Wrapper extends DelegatingInvokableInstance implements 
IUpgradeableInstance
  {
@@@ -321,10 -299,14 +330,10 @@@
  
  private InstanceConfig createInstanceConfig(int nodeNum)
  {
 -String ipPrefix = "127.0." + subnet + ".";
 -String seedIp = ipPrefix + "1";
 -String ipAddress = ipPrefix + nodeNum;
 +INodeProvisionStrategy provisionStrategy = 
nodeProvisionStrategy.create(subnet);
  long token = tokenSupplier.token(nodeNum);
 -
 -NetworkTopology topology = 

[cassandra] 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit b1efb8e4188eba5661ad7be7d915270fc191c2d5
Merge: 9bf1ab1 5a2b5a4
Author: David Capwell 
AuthorDate: Mon Oct 5 19:04:59 2020 -0700

Merge branch 'cassandra-3.0' into cassandra-3.11

 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 36 +-
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  3 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 13 ++--
 .../distributed/shared/ShutdownException.java  | 30 ++
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 7 files changed, 88 insertions(+), 12 deletions(-)

diff --cc test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index b210b1d,cf66d8f..81c501c
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@@ -267,8 -269,8 +267,7 @@@ public class Instance extends IsolatedE
  int fromNum = from.config().num();
  int toNum = config().num();
  
--
- IMessage msg = serializeMessage(message, id, 
from.broadcastAddress(), broadcastAddress());
+ IMessage msg = serializeMessage(message, id, 
from.config().broadcastAddress(), broadcastAddress());
  
  return cluster.filters().permitInbound(fromNum, toNum, msg);
  }
diff --cc 
test/distributed/org/apache/cassandra/distributed/impl/InstanceConfig.java
index 212fcc4,ea8d0f8..1f39f76
--- a/test/distributed/org/apache/cassandra/distributed/impl/InstanceConfig.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/InstanceConfig.java
@@@ -271,10 -271,10 +271,10 @@@ public class InstanceConfig implements 
ipAddress,
seedIp,
String.format("%s/node%d/saved_caches", 
root, nodeNum),
-   new String[] { 
String.format("%s/node%d/data", root, nodeNum) },
+   datadirs(datadirCount, root, nodeNum),
String.format("%s/node%d/commitlog", root, 
nodeNum),
String.format("%s/node%d/hints", root, 
nodeNum),
 -//  String.format("%s/node%d/cdc", root, 
nodeNum),
 +  String.format("%s/node%d/cdc", root, 
nodeNum),
token);
  }
  





[cassandra] branch cassandra-2.2 updated: Don't adjust nodeCount when setting node id topology in in-jvm dtests. Make sure we don't throw any uncaught exceptions during in-jvm dtests.

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch cassandra-2.2
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-2.2 by this push:
 new 4d173e0  Don't adjust nodeCount when setting node id topology in 
in-jvm dtests. Make sure we don't throw any uncaught exceptions during in-jvm 
dtests.
4d173e0 is described below

commit 4d173e0a3f97b68b2ce0fb72befe2912efd31102
Author: Marcus Eriksson 
AuthorDate: Mon Oct 5 16:25:56 2020 -0700

Don't adjust nodeCount when setting node id topology in in-jvm dtests.
Make sure we don't throw any uncaught exceptions during in-jvm dtests.

patch by Marcus Eriksson; reviewed by Alex Petrov, David Capwell for 
CASSANDRA-16109,CASSANDRA-16101
---
 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 36 +-
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  3 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 13 ++--
 .../distributed/shared/ShutdownException.java  | 30 ++
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 7 files changed, 89 insertions(+), 11 deletions(-)

diff --git a/build.xml b/build.xml
index 693cc8f..d003edf 100644
--- a/build.xml
+++ b/build.xml
@@ -396,7 +396,7 @@
   
   
   
-  
+  
   
  
   
diff --git 
a/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java 
b/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
index 0085f1c..9793add 100644
--- 
a/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
+++ 
b/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
@@ -22,16 +22,20 @@ import java.io.File;
 import java.net.InetSocketAddress;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.CopyOnWriteArrayList;
 import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.BiConsumer;
+import java.util.function.BiPredicate;
 import java.util.function.Consumer;
+import java.util.function.Predicate;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
@@ -61,6 +65,7 @@ import 
org.apache.cassandra.distributed.shared.AbstractBuilder;
 import org.apache.cassandra.distributed.shared.InstanceClassLoader;
 import org.apache.cassandra.distributed.shared.MessageFilters;
 import org.apache.cassandra.distributed.shared.NetworkTopology;
+import org.apache.cassandra.distributed.shared.ShutdownException;
 import org.apache.cassandra.distributed.shared.Versions;
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.net.MessagingService;
@@ -118,6 +123,9 @@ public abstract class AbstractCluster 
implements ICluster instanceInitializer;
+private final int datadirCount;
+private volatile BiPredicate ignoreUncaughtThrowable = 
null;
+private final List uncaughtExceptions = new 
CopyOnWriteArrayList<>();
 
 private volatile Thread.UncaughtExceptionHandler previousHandler = null;
 
@@ -267,6 +275,7 @@ public abstract class AbstractCluster 
implements ICluster 
implements ICluster 
implements ICluster ignore = ignoreUncaughtThrowable;
+I instance = get(cl.getInstanceId());
+if ((ignore == null || !ignore.test(cl.getInstanceId(), error)) && 
instance != null && !instance.isShutdown())
+uncaughtExceptions.add(error);
+}
+
+@Override
+public void setUncaughtExceptionsFilter(BiPredicate 
ignoreUncaughtThrowable)
+{
+this.ignoreUncaughtThrowable = ignoreUncaughtThrowable;
 }
 
 @Override
@@ -630,10 +651,23 @@ public abstract class AbstractCluster implements ICluster drain = new ArrayList<>(uncaughtExceptions.size());
+uncaughtExceptions.removeIf(e -> {
+drain.add(e);
+return true;
+});
+if (!drain.isEmpty())
+throw new ShutdownException(drain);
+}
+
 // We do not want this check to run every time until we fix problems with 
tread stops
 private void withThreadLeakCheck(List> futures)
 {
diff --git 
a/test/distributed/org/apache/cassandra/distributed/impl/DelegatingInvokableInstance.java
 
b/test/distributed/org/apache/cassandra/distributed/impl/DelegatingInvokableInstance.java
index 690e503..262da7a 100644
--- 
a/test/distributed/org/apache/cassandra/distributed/impl/DelegatingInvokableInstance.java
+++ 
b/test/distributed/org/apache/cassandra/distributed/impl/DelegatingInvokableInstance.java
@@ -20,6 +20,7 @@ package 

[cassandra] branch cassandra-3.11 updated (9bf1ab1 -> b1efb8e)

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 9bf1ab1  Merge branch 'cassandra-3.0' into cassandra-3.11
 new 4d173e0  Don't adjust nodeCount when setting node id topology in 
in-jvm dtests. Make sure we don't throw any uncaught exceptions during in-jvm 
dtests.
 new 5a2b5a4  Merge branch 'cassandra-2.2' into cassandra-3.0
 new b1efb8e  Merge branch 'cassandra-3.0' into cassandra-3.11

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 36 +-
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  3 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 13 ++--
 .../distributed/shared/ShutdownException.java  | 19 +---
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 7 files changed, 65 insertions(+), 24 deletions(-)
 copy src/java/org/apache/cassandra/repair/RepairResult.java => 
test/distributed/org/apache/cassandra/distributed/shared/ShutdownException.java 
(74%)





[cassandra] branch trunk updated (7a63cc2 -> 21a363b)

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 7a63cc2  Merge branch 'cassandra-3.11' into trunk
 new 4d173e0  Don't adjust nodeCount when setting node id topology in 
in-jvm dtests. Make sure we don't throw any uncaught exceptions during in-jvm 
dtests.
 new 5a2b5a4  Merge branch 'cassandra-2.2' into cassandra-3.0
 new b1efb8e  Merge branch 'cassandra-3.0' into cassandra-3.11
 new 21a363b  Merge branch 'cassandra-3.11' into trunk

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 37 --
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  3 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 14 ++--
 .../{RepairResult.java => ShutdownException.java}  | 15 -
 .../distributed/test/FailingRepairTest.java|  6 
 .../test/FullRepairCoordinatorFastTest.java|  1 +
 .../test/IncrementalRepairCoordinatorFastTest.java |  1 +
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 .../test/PreviewRepairCoordinatorFastTest.java |  2 ++
 .../cassandra/distributed/test/StreamingTest.java  |  5 ++-
 .../cassandra/distributed/upgrade/UpgradeTest.java |  6 +++-
 13 files changed, 86 insertions(+), 22 deletions(-)
 copy 
test/distributed/org/apache/cassandra/distributed/shared/{RepairResult.java => 
ShutdownException.java} (76%)





[cassandra] 01/01: Merge branch 'cassandra-2.2' into cassandra-3.0

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 5a2b5a4e5eb768b459e3795b6f814baefad9c0f1
Merge: 31b9078 4d173e0
Author: David Capwell 
AuthorDate: Mon Oct 5 19:03:32 2020 -0700

Merge branch 'cassandra-2.2' into cassandra-3.0

 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 36 +-
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  2 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 13 ++--
 .../distributed/shared/ShutdownException.java  | 30 ++
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 7 files changed, 88 insertions(+), 11 deletions(-)

diff --cc 
test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
index 3cb8dac,9793add..4b880e4
--- 
a/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
+++ 
b/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java
@@@ -610,10 -619,22 +619,22 @@@ public abstract class AbstractCluster<
  handler.uncaughtException(thread, error);
  return;
  }
+ 
  InstanceClassLoader cl = (InstanceClassLoader) 
thread.getContextClassLoader();
  get(cl.getInstanceId()).uncaughtException(thread, error);
+ 
+ BiPredicate ignore = ignoreUncaughtThrowable;
+ I instance = get(cl.getInstanceId());
+ if ((ignore == null || !ignore.test(cl.getInstanceId(), error)) && 
instance != null && !instance.isShutdown())
+ uncaughtExceptions.add(error);
+ }
+ 
+ @Override
+ public void setUncaughtExceptionsFilter(BiPredicate 
ignoreUncaughtThrowable)
+ {
+ this.ignoreUncaughtThrowable = ignoreUncaughtThrowable;
  }
 -
 +
  @Override
  public void close()
  {
diff --cc 
test/distributed/org/apache/cassandra/distributed/impl/InstanceConfig.java
index cfdcc80,4e8a782..ea8d0f8
--- a/test/distributed/org/apache/cassandra/distributed/impl/InstanceConfig.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/InstanceConfig.java
@@@ -271,9 -271,9 +271,9 @@@ public class InstanceConfig implements 
ipAddress,
seedIp,
String.format("%s/node%d/saved_caches", 
root, nodeNum),
-   new String[] { 
String.format("%s/node%d/data", root, nodeNum) },
+   datadirs(datadirCount, root, nodeNum),
String.format("%s/node%d/commitlog", root, 
nodeNum),
 -//  String.format("%s/node%d/hints", root, 
nodeNum),
 +  String.format("%s/node%d/hints", root, 
nodeNum),
  //  String.format("%s/node%d/cdc", root, 
nodeNum),
token);
  }





[cassandra] branch cassandra-3.0 updated (31b9078 -> 5a2b5a4)

2020-10-05 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 31b9078  Handle unexpected columns due to schema races
 new 4d173e0  Don't adjust nodeCount when setting node id topology in 
in-jvm dtests. Make sure we don't throw any uncaught exceptions during in-jvm 
dtests.
 new 5a2b5a4  Merge branch 'cassandra-2.2' into cassandra-3.0

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 build.xml  |  2 +-
 .../distributed/impl/AbstractCluster.java  | 36 +-
 .../impl/DelegatingInvokableInstance.java  |  1 +
 .../cassandra/distributed/impl/Instance.java   |  2 +-
 .../cassandra/distributed/impl/InstanceConfig.java | 13 ++--
 .../distributed/shared/ShutdownException.java  | 19 +---
 .../distributed/test/NetworkTopologyTest.java  | 15 +
 7 files changed, 65 insertions(+), 23 deletions(-)
 copy src/java/org/apache/cassandra/repair/RepairResult.java => 
test/distributed/org/apache/cassandra/distributed/shared/ShutdownException.java 
(74%)





[jira] [Commented] (CASSANDRA-16109) Don't adjust nodeCount when setting node id topology in in-jvm dtests

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208415#comment-17208415
 ] 

David Capwell commented on CASSANDRA-16109:
---

starting commit (upgrade tests use these branches since mixed version was 
failing); includes CASSANDRA-16109 & CASSANDRA-16101
CI results (pending)

2.2
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-2.2-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/64/

3.0
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-3.0-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/65/

3.11
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-cassandra-3.11-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/66/

trunk
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16109,CASSANDRA-16101-trunk-6D5320BB-90DB-484A-85D6-353EB834A723
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/67/

> Don't adjust nodeCount when setting node id topology in in-jvm dtests
> -
>
> Key: CASSANDRA-16109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16109
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
>  Labels: pull-request-available
>
> We update the node count when setting the node id topology in in-jvm dtests; 
> this should only happen if the node count is smaller than the node id topology, 
> otherwise bootstrap tests error out.






[jira] [Commented] (CASSANDRA-16159) Reduce the Severity of Errors Reported in FailureDetector#isAlive()

2020-10-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208407#comment-17208407
 ] 

Caleb Rackliffe commented on CASSANDRA-16159:
-

Note: If we're going to bypass the {{isAlive()}} check for quarantined nodes 
(since they're trivially not alive in that case), we should probably do it in 
`StorageService#handleStateBootreplacing()` to avoid the extra overhead for 
other callers.

> Reduce the Severity of Errors Reported in FailureDetector#isAlive()
> ---
>
> Key: CASSANDRA-16159
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16159
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Caleb Rackliffe
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Noticed the following error in the failure detector during a host replacement:
> {noformat}
> java.lang.IllegalArgumentException: Unknown endpoint: 10.38.178.98:7000
>   at 
> org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:281)
>   at 
> org.apache.cassandra.service.StorageService.handleStateBootreplacing(StorageService.java:2502)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2182)
>   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:3145)
>   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1242)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1368)
>   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>   at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
>   at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:884)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> {noformat}
> This particular error looks benign, given that even if it occurs, the node 
> continues to handle the {{BOOT_REPLACE}} state. There are two things we might 
> be able to do to improve {{FailureDetector#isAlive()}} though:
> 1.) We don’t short circuit in the case that the endpoint in question is in 
> quarantine after being removed. It may be useful to check for this so we can 
> avoid logging an ERROR when the endpoint is clearly doomed/dead. (Quarantine 
> works great when the gossip message is _from_ a quarantined endpoint, but in 
> this case, that would be the new/replacing and not the old/replaced one.)
> 2.) We can reduce the severity of the logging from ERROR to WARN and provide 
> better context around how to determine whether or not there’s actually a 
> problem. (ex. “If this occurs while trying to determine liveness for a node 
> that is currently being replaced, it can be safely ignored.”)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16159) Reduce the Severity of Errors Reported in FailureDetector#isAlive()

2020-10-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208407#comment-17208407
 ] 

Caleb Rackliffe edited comment on CASSANDRA-16159 at 10/5/20, 11:13 PM:


Note: If we're going to bypass the {{isAlive()}} check for quarantined nodes 
(since they're trivially not alive in that case), we should probably do it in 
{{StorageService#handleStateBootreplacing()}} to avoid the extra overhead for 
other callers.


was (Author: maedhroz):
Note: If we're going to bypass the {{isAlive()}} check for quarantined nodes 
(since they're trivially not alive in that case), we should probably do it in 
`StorageService#handleStateBootreplacing()` to avoid the extra overhead for 
other callers.

> Reduce the Severity of Errors Reported in FailureDetector#isAlive()
> ---
>
> Key: CASSANDRA-16159
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16159
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Caleb Rackliffe
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Noticed the following error in the failure detector during a host replacement:
> {noformat}
> java.lang.IllegalArgumentException: Unknown endpoint: 10.38.178.98:7000
>   at 
> org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:281)
>   at 
> org.apache.cassandra.service.StorageService.handleStateBootreplacing(StorageService.java:2502)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2182)
>   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:3145)
>   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1242)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1368)
>   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>   at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
>   at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:884)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> {noformat}
> This particular error looks benign, given that even if it occurs, the node 
> continues to handle the {{BOOT_REPLACE}} state. There are two things we might 
> be able to do to improve {{FailureDetector#isAlive()}} though:
> 1.) We don’t short circuit in the case that the endpoint in question is in 
> quarantine after being removed. It may be useful to check for this so we can 
> avoid logging an ERROR when the endpoint is clearly doomed/dead. (Quarantine 
> works great when the gossip message is _from_ a quarantined endpoint, but in 
> this case, that would be the new/replacing and not the old/replaced one.)
> 2.) We can reduce the severity of the logging from ERROR to WARN and provide 
> better context around how to determine whether or not there’s actually a 
> problem. (ex. “If this occurs while trying to determine liveness for a node 
> that is currently being replaced, it can be safely ignored.”)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208394#comment-17208394
 ] 

Benedict Elliott Smith commented on CASSANDRA-16182:


I agree with Brandon that the situation occurring at all is bad, i.e. the 
operator should not attempt to replace C until it has isolated it from the 
cluster. As ownership works today, this split brain introduces a risk of 
consistency violations.

That said, I'm unconvinced there is much to be gained from refusing to process 
the replacement by A and B. C' has already unilaterally announced its status as 
the new owner, and this will eventually come to pass in some places in the 
cluster. Even if C was never intended to be replaced, at this point the cluster 
will enter a split brain until C is taken down or C' is assassinated, since 
other nodes were presumably also unaware that C was alive, or else C' would not 
have witnessed it as down. If instead C's self-nomination wins on other nodes, the 
situation resolves eventually without operator input. Either approach 
introduces potential consistency violations, but the sooner the inconsistency 
resolves the smaller the window for problems.

However, even if this isn't preferred, at the very least nodes should apply the 
C' state once C is taken offline, after confirming it is still valid.

 

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated by cloud provider" due to health check failure and a 
> replacement node C' got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time the replacement node (C') finished bootstrapping and announced its 
> state as Normal, A and B were still able to communicate with the node being 
> replaced, C (while C' was not able to), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to the rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13701) Lower default num_tokens

2020-10-05 Thread Jeremy Hanna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-13701:
-
Fix Version/s: (was: 4.0-triage)

> Lower default num_tokens
> 
>
> Key: CASSANDRA-13701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Chris Lohfink
>Assignee: Alexander Dejanovski
>Priority: Low
> Fix For: 4.0-alpha
>
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not 
> necessary. It is very expensive for operational processes and scanning. It's 
> come up a lot, and it's now pretty standard and well known within the community 
> to always reduce num_tokens. We should just lower the defaults.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208375#comment-17208375
 ] 

Brandon Williams commented on CASSANDRA-16182:
--

bq.  I kind of agree that this seems hacky to increment generation # for this 
purpose,

Definitely.

bq. Given that this node C' makes itself available for reads worries me of the 
consequences

Only if clients explicitly connect to it (they won't be notified about it) and 
read at [LOCAL_]ONE.  But that's the contract accepted at ONE anyway.

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated by cloud provider" due to health check failure and a 
> replacement node C' got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time the replacement node (C') finished bootstrapping and announced its 
> state as Normal, A and B were still able to communicate with the node being 
> replaced, C (while C' was not able to), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to the rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208370#comment-17208370
 ] 

Sumanth Pasupuleti edited comment on CASSANDRA-16182 at 10/5/20, 10:17 PM:
---

Thanks for the clarification [~brandon.williams]. I should have been clearer; I 
meant increasing the generation # to "force update" the already communicated 
state. I kind of agree that this seems hacky to increment generation # for this 
purpose, but I was also thinking this is a potentially rare/isolated scenario, and 
that it can be detected deterministically (by hearing through gossip about a node 
that owns the same token as yourself and that you cannot reach). 

The fact that this node C' makes itself available for reads worries me about the 
consequences, and makes me think we should attempt to make the cluster self-heal 
from this situation, so long as there are no dire consequences to incrementing the 
generation # (other than forcing peers to update their in-memory state)




was (Author: sumanth.pasupuleti):
Thanks for the clarification [~brandon.williams]. I should have been clearer; I 
meant increasing the generation # to "force update" the already communicated 
state. I kind of agree that this seems hacky to increment generation # for this 
purpose, but I was also thinking this is a potentially rare/isolated scenario, and 
that it can be detected deterministically (by hearing through gossip about a node 
that owns the same token as yourself and that you cannot reach). 
The fact that this node C' makes itself available for reads worries me about the 
consequences, and makes me think we should attempt to make the cluster self-heal 
from this situation.



> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated by cloud provider" due to health check failure and a 
> replacement node C' got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time the replacement node (C') finished bootstrapping and announced its 
> state as Normal, A and B were still able to communicate with the node being 
> replaced, C (while C' was not able to), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to the rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208370#comment-17208370
 ] 

Sumanth Pasupuleti commented on CASSANDRA-16182:


Thanks for the clarification [~brandon.williams]. I should have been clearer; I 
meant increasing the generation # to "force update" the already communicated 
state. I kind of agree that this seems hacky to increment generation # for this 
purpose, but I was also thinking this is a potentially rare/isolated scenario, and 
that it can be detected deterministically (by hearing through gossip about a node 
that owns the same token as yourself and that you cannot reach). 
The fact that this node C' makes itself available for reads worries me about the 
consequences, and makes me think we should attempt to make the cluster self-heal 
from this situation.
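
Purely as an illustration of the deterministic check described above (not a patch), a sketch of the condition C' could evaluate when gossip tells it about another endpoint; the method name and wiring are assumptions:

{code:java}
// Sketch only. If this returns true, C' would re-announce its NORMAL state (much as the
// manual restart did via setGossipTokens()) so peers that rejected the replacement converge.
static boolean shouldReannounceNormal(boolean finishedBootstrap,
                                      Set<Token> localTokens,
                                      Collection<Token> gossipedPeerTokens,
                                      boolean peerReachable)
{
    // Deterministic signal: we completed bootstrap, another endpoint claims one of our
    // tokens, and we cannot reach that endpoint.
    return finishedBootstrap
           && !peerReachable
           && gossipedPeerTokens.stream().anyMatch(localTokens::contains);
}
{code}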



> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated by cloud provider" due to health check failure and a 
> replacement node C' got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time the replacement node (C') finished bootstrapping and announced its 
> state as Normal, A and B were still able to communicate with the node being 
> replaced, C (while C' was not able to), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to the rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208365#comment-17208365
 ] 

David Capwell commented on CASSANDRA-16120:
---

Good feedback, I can add some identifiers to make each cluster unique even after 
a JVM restart.
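
One possible shape of such an identifier, purely illustrative (the actual naming scheme in the patch may differ):

{code:java}
// Requires java.util.concurrent.atomic.AtomicInteger. Mix a wall-clock component into the
// per-cluster id so log locations stay unique even when the JVM (and any in-process counter)
// is restarted between runs.
private static final AtomicInteger CLUSTER_COUNTER = new AtomicInteger();

private static String uniqueClusterId()
{
    return Long.toHexString(System.currentTimeMillis()) + "-" + CLUSTER_COUNTER.incrementAndGet();
}

private static String clusterLogDir(String tag, String suite)
{
    // e.g. build/test/logs/<tag>/<suite>/cluster-17a2b3c4d5e-1
    return String.format("build/test/logs/%s/%s/cluster-%s", tag, suite, uniqueClusterId());
}
{code}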

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating logs were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>if (m.find())
>return s;
>}
>reader.close();
>return endOfData();
>}
>catch (IOException e)
>{
>Closeables.closeQuietly(reader);
>throw new UncheckedIOException(e);
>}
>}
>};
>return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, 
> Spliterator.ORDERED), false);
>}
> }
> {code}
> And
> {code}
> @Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>//TODO missing way to get node id
> //   

[jira] [Commented] (CASSANDRA-16152) In-JVM dtest - modify schema with stopped nodes and use yaml fragments for config

2020-10-05 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208362#comment-17208362
 ] 

Yifan Cai commented on CASSANDRA-16152:
---

Overall LGTM. 

2 nits
* the yaml deserializing code can be reused in the {{YamlConfigurationLoader}}. 
I added this 
[commit|https://github.com/yifan-c/cassandra/commit/2b97ddfe5afd363aede1e48f75ee7d614a3d289c].
 Please see if you like it. 
* add a simple test to show that ignoring stopped instances works, just like 
what you did for {{useYamlFragmentInConfigTest}}

The dtest failures in the CI do not look related to the patch. The patch mainly 
enhances the JVM Dtest. 
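
For readers unfamiliar with the yaml-fragment idea, a minimal, self-contained illustration (not the patch itself) using SnakeYAML, which Cassandra already ships:

{code:java}
import java.util.Map;
import org.yaml.snakeyaml.Yaml;

public class YamlFragmentExample
{
    public static void main(String[] args)
    {
        // A fragment keeps nested structure, so a test can override a nested option
        // without spelling out flat keys.
        String fragment = "client_encryption_options:\n"
                        + "  enabled: true\n"
                        + "  optional: false\n";

        Map<String, Object> parsed = new Yaml().load(fragment);
        // parsed.get("client_encryption_options") is a nested Map that a test harness
        // could merge into an instance's config.
        System.out.println(parsed);
    }
}
{code}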

> In-JVM dtest - modify schema with stopped nodes and use yaml fragments for 
> config
> -
>
> Key: CASSANDRA-16152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16152
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> Some convenience improvements to in-JVM dtest that are useful across versions 
> that I needed while working on CASSANDRA-16144
> * Add support for changing schema with stopped nodes.
> * Make it simpler to modify nested configuration items by specifying Yaml 
> fragments 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15899) Dropping a column can break queries until the schema is fully propagated

2020-10-05 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208339#comment-17208339
 ] 

Blake Eggleston commented on CASSANDRA-15899:
-

Committed to 3.0 and merged up. I can't mark the ticket as committed until 
INFRA-20942 is resolved, but in the meantime, here's the GitHub link: 
https://github.com/apache/cassandra/commit/31b9078a691a6f93b104cc6b3f72fe2fbf6557f6

> Dropping a column can break queries until the schema is fully propagated
> 
>
> Key: CASSANDRA-15899
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15899
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Marcus Eriksson
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.x
>
>
> With a table like:
> {code}
> CREATE TABLE ks.tbl (id int primary key, v1 int, v2 int, v3 int)
> {code}
> and we drop {{v2}}, we get this exception on the replicas which haven't seen 
> the schema change:
> {code}
> ERROR [SharedPool-Worker-1] node2 2020-06-24 09:49:08,107 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,node2]
> java.lang.IllegalStateException: [ColumnDefinition{name=v1, 
> type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, position=-1}, 
> ColumnDefinition{name=v2, type=org.apache.cassandra.db.marshal.Int32Type, 
> kind=REGULAR, position=-1}, ColumnDefinition{name=v3, 
> type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, position=-1}] 
> is not a subset of [v1 v3]
>   at 
> org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:546) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:478) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:184)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:114)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:102)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>  ~[main/:na]
> ...
> {code}
> Note that it doesn't matter if we {{SELECT *}} or {{SELECT id, v1}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2020-10-05 Thread bdeggleston
This is an automated email from the ASF dual-hosted git repository.

bdeggleston pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 7a63cc2c3319994dd1c6197a61553ab339c238d0
Merge: ba63fa3 9bf1ab1
Author: Blake Eggleston 
AuthorDate: Mon Oct 5 14:21:08 2020 -0700

Merge branch 'cassandra-3.11' into trunk

 CHANGES.txt|   1 +
 src/java/org/apache/cassandra/db/Columns.java  |  19 +++-
 .../apache/cassandra/db/SerializationHeader.java   |   4 +-
 .../apache/cassandra/db/filter/ColumnFilter.java   |   8 +-
 .../cassandra/db/partitions/PartitionUpdate.java   |   9 ++
 .../cassandra/db/rows/SerializationHelper.java |  12 +++
 .../cassandra/db/rows/UnfilteredSerializer.java|  23 ++--
 .../apache/cassandra/schema/ColumnMetadata.java|  25 -
 .../utils/btree/LeafBTreeSearchIterator.java   |   2 +-
 .../cassandra/distributed/test/SchemaTest.java | 117 +
 .../distributed/test/SimpleReadWriteTest.java  |  91 +---
 test/unit/org/apache/cassandra/db/ColumnsTest.java |   2 +-
 12 files changed, 279 insertions(+), 34 deletions(-)

diff --cc CHANGES.txt
index f576dbf,99369fa..a990fb0
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,58 -1,18 +1,59 @@@
 -3.11.9
 +4.0-beta3
 + * Use unsigned short in ValueAccessor.sliceWithShortLength (CASSANDRA-16147)
 + * Abort repairs when getting a truncation request (CASSANDRA-15854)
 + * Remove bad assert when getting active compactions for an sstable 
(CASSANDRA-15457)
 + * Avoid failing compactions with very large partitions (CASSANDRA-15164)
 + * Prevent NPE in StreamMessage in type lookup (CASSANDRA-16131)
 + * Avoid invalid state transition exception during incremental repair 
(CASSANDRA-16067)
 + * Allow zero padding in timestamp serialization (CASSANDRA-16105)
 + * Add byte array backed cells (CASSANDRA-15393)
 + * Correctly handle pending ranges with adjacent range movements 
(CASSANDRA-14801)
 + * Avoid adding locahost when streaming trivial ranges (CASSANDRA-16099)
 + * Add nodetool getfullquerylog (CASSANDRA-15988)
 + * Fix yaml format and alignment in tpstats (CASSANDRA-11402)
 + * Avoid trying to keep track of RTs for endpoints we won't write to during 
read repair (CASSANDRA-16084)
 + * When compaction gets interrupted, the exception should include the 
compactionId (CASSANDRA-15954)
 + * Make Table/Keyspace Metric Names Consistent With Each Other 
(CASSANDRA-15909)
 + * Mutating sstable component may race with entire-sstable-streaming(ZCS) 
causing checksum validation failure (CASSANDRA-15861)
 + * NPE thrown while updating speculative execution time if keyspace is 
removed during task execution (CASSANDRA-15949)
 + * Show the progress of data streaming and index build (CASSANDRA-15406)
 + * Add flag to disable chunk cache and disable by default (CASSANDRA-16036)
 +Merged from 3.11:
   * Fix memory leak in CompressedChunkReader (CASSANDRA-15880)
   * Don't attempt value skipping with mixed version cluster (CASSANDRA-15833)
 - * Avoid failing compactions with very large partitions (CASSANDRA-15164)
 + * Use IF NOT EXISTS for index and UDT create statements in snapshot schema 
files (CASSANDRA-13935)
   * Make sure LCS handles duplicate sstable added/removed notifications 
correctly (CASSANDRA-14103)
  Merged from 3.0:
+  * Handle unexpected columns due to schema races (CASSANDRA-15899)
   * Add flag to ignore unreplicated keyspaces during repair (CASSANDRA-15160)
  
 -3.11.8
 +4.0-beta2
 + * Add addition incremental repair visibility to nodetool repair_admin 
(CASSANDRA-14939)
 + * Always access system properties and environment variables via the new 
CassandraRelevantProperties and CassandraRelevantEnv classes (CASSANDRA-15876)
 + * Remove deprecated HintedHandOffManager (CASSANDRA-15939)
 + * Prevent repair from overrunning compaction (CASSANDRA-15817)
 + * fix cqlsh COPY functions in Python 3.8 on Mac (CASSANDRA-16053)
 + * Strip comment blocks from cqlsh input before processing statements 
(CASSANDRA-15802)
 + * Fix unicode chars error input (CASSANDRA-15990)
 + * Improved testability for CacheMetrics and ChunkCacheMetrics 
(CASSANDRA-15788)
 + * Handle errors in StreamSession#prepare (CASSANDRA-15852)
 + * FQL replay should have options to ignore DDL statements (CASSANDRA-16039)
 + * Remove COMPACT STORAGE internals (CASSANDRA-13994)
 + * Make TimestampSerializer accept fractional seconds of varying precision 
(CASSANDRA-15976)
 + * Improve cassandra-stress logging when using a profile file that doesn't 
exist (CASSANDRA-14425)
 + * Improve logging for socket connection/disconnection (CASSANDRA-15980)
 + * Throw FSWriteError upon write failures in order to apply DiskFailurePolicy 
(CASSANDRA-15928)
 + * Forbid altering UDTs used in partition keys (CASSANDRA-15933)
 + * Fix version parsing logic when upgrading from 3.0 (CASSANDRA-15973)
 + * Optimize NoSpamLogger use in hot paths (CASSANDRA-15766)
 + * Verify sstable 

[cassandra] branch trunk updated (ba63fa3 -> 7a63cc2)

2020-10-05 Thread bdeggleston
This is an automated email from the ASF dual-hosted git repository.

bdeggleston pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from ba63fa3  Fix flaky test ConnectionTest.testMessagePurging
 new 31b9078  Handle unexpected columns due to schema races
 new 9bf1ab1  Merge branch 'cassandra-3.0' into cassandra-3.11
 new 7a63cc2  Merge branch 'cassandra-3.11' into trunk

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt|   1 +
 src/java/org/apache/cassandra/db/Columns.java  |  19 +++-
 .../apache/cassandra/db/SerializationHeader.java   |   4 +-
 .../apache/cassandra/db/filter/ColumnFilter.java   |   8 +-
 .../cassandra/db/partitions/PartitionUpdate.java   |   9 ++
 .../cassandra/db/rows/SerializationHelper.java |  12 +++
 .../cassandra/db/rows/UnfilteredSerializer.java|  23 ++--
 .../apache/cassandra/schema/ColumnMetadata.java|  25 -
 .../utils/btree/LeafBTreeSearchIterator.java   |   2 +-
 .../cassandra/distributed/test/SchemaTest.java | 117 +
 .../distributed/test/SimpleReadWriteTest.java  |  91 +---
 test/unit/org/apache/cassandra/db/ColumnsTest.java |   2 +-
 12 files changed, 279 insertions(+), 34 deletions(-)
 create mode 100644 
test/distributed/org/apache/cassandra/distributed/test/SchemaTest.java


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] branch cassandra-3.0 updated: Handle unexpected columns due to schema races

2020-10-05 Thread bdeggleston
This is an automated email from the ASF dual-hosted git repository.

bdeggleston pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.0 by this push:
 new 31b9078  Handle unexpected columns due to schema races
31b9078 is described below

commit 31b9078a691a6f93b104cc6b3f72fe2fbf6557f6
Author: Blake Eggleston 
AuthorDate: Mon Oct 5 14:17:38 2020 -0700

Handle unexpected columns due to schema races

Patch by Blake Eggleston; Reviewed by Sam Tunnicliffe for CASSANDRA-15899
---
 CHANGES.txt|   1 +
 .../apache/cassandra/config/ColumnDefinition.java  |  23 
 src/java/org/apache/cassandra/db/Columns.java  |  19 +++-
 .../apache/cassandra/db/SerializationHeader.java   |  17 ++-
 .../cassandra/db/UnknownColumnException.java   |  12 ++-
 .../apache/cassandra/db/filter/ColumnFilter.java   |   8 +-
 .../cassandra/db/partitions/PartitionUpdate.java   |   7 ++
 .../cassandra/db/rows/UnfilteredSerializer.java|  19 +++-
 .../cassandra/distributed/test/SchemaTest.java | 117 +
 .../distributed/test/SimpleReadWriteTest.java  |  86 ---
 test/unit/org/apache/cassandra/db/ColumnsTest.java |   2 +-
 11 files changed, 278 insertions(+), 33 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 5f326ce..1ea5184 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.23:
+ * Handle unexpected columns due to schema races (CASSANDRA-15899)
  * Avoid failing compactions with very large partitions (CASSANDRA-15164)
  * Use IF NOT EXISTS for index and UDT create statements in snapshot schema 
files (CASSANDRA-13935)
  * Add flag to ignore unreplicated keyspaces during repair (CASSANDRA-15160)
diff --git a/src/java/org/apache/cassandra/config/ColumnDefinition.java 
b/src/java/org/apache/cassandra/config/ColumnDefinition.java
index 6f7f749..93c89b5 100644
--- a/src/java/org/apache/cassandra/config/ColumnDefinition.java
+++ b/src/java/org/apache/cassandra/config/ColumnDefinition.java
@@ -190,6 +190,29 @@ public class ColumnDefinition extends ColumnSpecification 
implements Comparable<
 };
 }
 
+private static class Placeholder extends ColumnDefinition
+{
+Placeholder(CFMetaData table, ByteBuffer name, AbstractType type, 
int position, Kind kind)
+{
+super(table, name, type, position, kind);
+}
+
+public boolean isPlaceholder()
+{
+return true;
+}
+}
+
+public static ColumnDefinition placeholder(CFMetaData table, ByteBuffer 
name, boolean isStatic)
+{
+return new Placeholder(table, name, EmptyType.instance, NO_POSITION, 
isStatic ? Kind.STATIC : Kind.REGULAR);
+}
+
+public boolean isPlaceholder()
+{
+return false;
+}
+
 public ColumnDefinition copy()
 {
 return new ColumnDefinition(ksName, cfName, name, type, position, 
kind);
diff --git a/src/java/org/apache/cassandra/db/Columns.java 
b/src/java/org/apache/cassandra/db/Columns.java
index 18e17d7..ef32fe0 100644
--- a/src/java/org/apache/cassandra/db/Columns.java
+++ b/src/java/org/apache/cassandra/db/Columns.java
@@ -425,7 +425,7 @@ public class Columns extends 
AbstractCollection implements Col
 return size;
 }
 
-public Columns deserialize(DataInputPlus in, CFMetaData metadata) 
throws IOException
+public Columns deserialize(DataInputPlus in, CFMetaData metadata, 
boolean isStatic) throws IOException
 {
 int length = (int)in.readUnsignedVInt();
 BTree.Builder builder = 
BTree.builder(Comparator.naturalOrder());
@@ -441,14 +441,29 @@ public class Columns extends 
AbstractCollection implements Col
 // fail deserialization because of that. So we grab a 
"fake" ColumnDefinition that ensure proper
 // deserialization. The column will be ignore later on 
anyway.
 column = metadata.getDroppedColumnDefinition(name);
+
+// If there's no dropped column, it may be for a column we 
haven't received a schema update for yet
+// so we create a placeholder column. If this is a read, 
the placeholder column will let the response
+// serializer know we're not serializing all requested 
columns when it writes the row flags, but it
+// will cause mutations that try to write values for this 
column to fail.
 if (column == null)
-throw new RuntimeException("Unknown column " + 
UTF8Type.instance.getString(name) + " during deserialization");
+column = ColumnDefinition.placeholder(metadata, name, 
isStatic);
 }
 builder.add(column);
 }
 return new Columns(builder.build());
 }
 
+public 

[cassandra] branch cassandra-3.11 updated (3f73c16 -> 9bf1ab1)

2020-10-05 Thread bdeggleston
This is an automated email from the ASF dual-hosted git repository.

bdeggleston pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 3f73c16  Fix memory leak in CompressedChunkReader
 new 31b9078  Handle unexpected columns due to schema races
 new 9bf1ab1  Merge branch 'cassandra-3.0' into cassandra-3.11

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt|   1 +
 .../apache/cassandra/config/ColumnDefinition.java  |  23 
 src/java/org/apache/cassandra/db/Columns.java  |  19 +++-
 .../apache/cassandra/db/SerializationHeader.java   |  17 ++-
 .../cassandra/db/UnknownColumnException.java   |  12 ++-
 .../apache/cassandra/db/filter/ColumnFilter.java   |   8 +-
 .../cassandra/db/partitions/PartitionUpdate.java   |   7 ++
 .../cassandra/db/rows/UnfilteredSerializer.java|  20 +++-
 .../cassandra/distributed/test/SchemaTest.java | 117 +
 .../distributed/test/SimpleReadWriteTest.java  |  87 ---
 test/unit/org/apache/cassandra/db/ColumnsTest.java |   2 +-
 11 files changed, 280 insertions(+), 33 deletions(-)
 create mode 100644 
test/distributed/org/apache/cassandra/distributed/test/SchemaTest.java


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11

2020-10-05 Thread bdeggleston
This is an automated email from the ASF dual-hosted git repository.

bdeggleston pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 9bf1ab1a6a8393a1e5cc67041f6e85dd9065b9f4
Merge: 3f73c16 31b9078
Author: Blake Eggleston 
AuthorDate: Mon Oct 5 14:20:16 2020 -0700

Merge branch 'cassandra-3.0' into cassandra-3.11

 CHANGES.txt|   1 +
 .../apache/cassandra/config/ColumnDefinition.java  |  23 
 src/java/org/apache/cassandra/db/Columns.java  |  19 +++-
 .../apache/cassandra/db/SerializationHeader.java   |  17 ++-
 .../cassandra/db/UnknownColumnException.java   |  12 ++-
 .../apache/cassandra/db/filter/ColumnFilter.java   |   8 +-
 .../cassandra/db/partitions/PartitionUpdate.java   |   7 ++
 .../cassandra/db/rows/UnfilteredSerializer.java|  20 +++-
 .../cassandra/distributed/test/SchemaTest.java | 117 +
 .../distributed/test/SimpleReadWriteTest.java  |  87 ---
 test/unit/org/apache/cassandra/db/ColumnsTest.java |   2 +-
 11 files changed, 280 insertions(+), 33 deletions(-)

diff --cc CHANGES.txt
index b735ba5,1ea5184..99369fa
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,17 -1,10 +1,18 @@@
 -3.0.23:
 - * Handle unexpected columns due to schema races (CASSANDRA-15899)
 +3.11.9
 + * Fix memory leak in CompressedChunkReader (CASSANDRA-15880)
 + * Don't attempt value skipping with mixed version cluster (CASSANDRA-15833)
   * Avoid failing compactions with very large partitions (CASSANDRA-15164)
 - * Use IF NOT EXISTS for index and UDT create statements in snapshot schema 
files (CASSANDRA-13935)
 + * Make sure LCS handles duplicate sstable added/removed notifications 
correctly (CASSANDRA-14103)
 +Merged from 3.0:
++ * Handle unexpected columns due to schema races (CASSANDRA-15899)
   * Add flag to ignore unreplicated keyspaces during repair (CASSANDRA-15160)
  
 -3.0.22:
 +3.11.8
 + * Correctly interpret SASI's `max_compaction_flush_memory_in_mb` setting in 
megabytes not bytes (CASSANDRA-16071)
 + * Fix short read protection for GROUP BY queries (CASSANDRA-15459)
 + * Frozen RawTuple is not annotated with frozen in the toString method 
(CASSANDRA-15857)
 +Merged from 3.0:
 + * Use IF NOT EXISTS for index and UDT create statements in snapshot schema 
files (CASSANDRA-13935)
   * Fix gossip shutdown order (CASSANDRA-15816)
   * Remove broken 'defrag-on-read' optimization (CASSANDRA-15432)
   * Check for endpoint collision with hibernating nodes (CASSANDRA-14599)
diff --cc src/java/org/apache/cassandra/db/filter/ColumnFilter.java
index 57ff729,858c944..3c79539
--- a/src/java/org/apache/cassandra/db/filter/ColumnFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/ColumnFilter.java
@@@ -481,11 -441,11 +481,11 @@@ public class ColumnFilte
  }
  }
  
 -if (hasSelection)
 +if (hasQueried)
  {
- Columns statics = Columns.serializer.deserialize(in, 
metadata);
- Columns regulars = Columns.serializer.deserialize(in, 
metadata);
+ Columns statics = Columns.serializer.deserializeStatics(in, 
metadata);
+ Columns regulars = Columns.serializer.deserializeRegulars(in, 
metadata);
 -selection = new PartitionColumns(statics, regulars);
 +queried = new PartitionColumns(statics, regulars);
  }
  
  SortedSetMultimap 
subSelections = null;
diff --cc src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
index 926f3ef,9e11f94..0890611
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
@@@ -19,18 -19,13 +19,18 @@@ package org.apache.cassandra.db.rows
  
  import java.io.IOException;
  
- import com.google.common.collect.Collections2;
  
 +import net.nicoulaj.compilecommand.annotations.Inline;
  import org.apache.cassandra.config.ColumnDefinition;
+ import org.apache.cassandra.db.marshal.UTF8Type;
  import org.apache.cassandra.db.*;
 +import org.apache.cassandra.db.rows.Row.Deletion;
  import org.apache.cassandra.io.util.DataInputPlus;
 +import org.apache.cassandra.io.util.DataOutputBuffer;
  import org.apache.cassandra.io.util.DataOutputPlus;
 +import org.apache.cassandra.io.util.FileDataInput;
  import org.apache.cassandra.utils.SearchIterator;
 +import org.apache.cassandra.utils.WrappedException;
  
  /**
   * Serialize/deserialize a single Unfiltered (both on-wire and on-disk).
@@@ -230,37 -184,25 +230,42 @@@ public class UnfilteredSerialize
  Columns.serializer.serializeSubset(row.columns(), headerColumns, 
out);
  
  SearchIterator si = 
headerColumns.iterator();
 -for (ColumnData data : row)
 -{
 -// We can obtain the column for data directly from data.column(). 
However, if the cell/complex data
 -// originates from a sstable, the 

[jira] [Assigned] (CASSANDRA-15703) When CDC is disabled bootstrapping breaks

2020-10-05 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova reassigned CASSANDRA-15703:
---

Assignee: Ekaterina Dimitrova

> When CDC is disabled bootstrapping breaks
> -
>
> Key: CASSANDRA-15703
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15703
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: T Jake Luciani
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> Related to CASSANDRA-12697
> There is an edge case left over.  If a cluster had enabled CDC on a table 
> then subsequently set cdc=false, subsequent bootstraps break. 
>  
> This is because the cdc column is false on the existing nodes but null on the 
> bootstrapping node, causing the schema sha to never match.
>  
> There are a couple possible fixes:
>   1.  Since 12697 was only about upgrades we can serialize the cdc column IFF 
> the cluster nodes are all on the same version.
>   2.  We can force cdc=false on all tables where it's null.
>  
> I think #1 is probably simpler. #2 would probably cause more of the same 
> problem if nodes are not all updated with the fix.
>  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14157) [DTEST] [TRUNK] test_tracing_does_not_interfere_with_digest_calculation - cql_tracing_test.TestCqlTracing failed once : AssertionError: assert 0 == 1

2020-10-05 Thread Adam Holmberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holmberg reassigned CASSANDRA-14157:
-

Assignee: Adam Holmberg

> [DTEST] [TRUNK] test_tracing_does_not_interfere_with_digest_calculation - 
> cql_tracing_test.TestCqlTracing failed once : AssertionError: assert 0 == 1
> -
>
> Key: CASSANDRA-14157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14157
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Michael Kjellman
>Assignee: Adam Holmberg
>Priority: Normal
>  Labels: dtest
> Fix For: 4.0-beta3, 4.0-triage
>
>
> test_tracing_does_not_interfere_with_digest_calculation - 
> cql_tracing_test.TestCqlTracing failed its assertion once today in a 
> CircleCI run. The dtests were running against trunk.
> Although it has failed once so far, a quick read of the comments in the test 
> seems to indicate that the assertion failing this way might mean that 
> CASSANDRA-13964 didn't fully fix the issue.
> {code:python}
> if jmx.has_mbean(rr_count):
> # expect 0 digest mismatches
> >   assert 0 == jmx.read_attribute(rr_count, 'Count')
> E   AssertionError: assert 0 == 1
> E+  where 1 =   0x7f62d4156898>>('org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking',
>  'Count')
> E+where  > = 
> .read_attribute
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16152) In-JVM dtest - modify schema with stopped nodes and use yaml fragments for config

2020-10-05 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208311#comment-17208311
 ] 

Yifan Cai commented on CASSANDRA-16152:
---

Reviewing the patch in {{trunk}} as the first step. 

> In-JVM dtest - modify schema with stopped nodes and use yaml fragments for 
> config
> -
>
> Key: CASSANDRA-16152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16152
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> Some convenience improvements to in-JVM dtest that are useful across versions 
> that I needed while working on CASSANDRA-16144
> * Add support for changing schema with stopped nodes.
> * Make it simpler to modify nested configuration items by specifying Yaml 
> fragments 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16120) Add ability for jvm-dtest to grep instance logs

2020-10-05 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208306#comment-17208306
 ] 

Yifan Cai commented on CASSANDRA-16120:
---

I just wrote a test that uses the cool log grepping utility, and realized that 
it would be good to always truncate the log files on starting up a new cluster, 
especially when running a test several times within the IDE (no "{{ant clean}}" in 
between test runs). The log files from the previous run pollute the next ones. 
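
A minimal sketch of that idea (the helper name and its placement in the framework are assumptions):

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Truncate leftover *.log files from a previous in-IDE run before starting a new cluster,
// so log grepping is not polluted by stale content.
static void truncateOldLogs(Path logDir) throws IOException
{
    if (!Files.isDirectory(logDir))
        return;
    try (Stream<Path> files = Files.walk(logDir))
    {
        files.filter(p -> p.toString().endsWith(".log"))
             .forEach(p -> {
                 try
                 {
                     Files.write(p, new byte[0]);   // truncate in place, keep the file
                 }
                 catch (IOException e)
                 {
                     throw new UncheckedIOException(e);
                 }
             });
    }
}
{code}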

> Add ability for jvm-dtest to grep instance logs
> ---
>
> Key: CASSANDRA-16120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16120
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> One of the main gaps between python dtest and jvm dtest is python dtest 
> supports the ability to grep the logs of an instance; we need this capability 
> as some tests require validating logs were triggered.
> Pydocs for common log methods 
> {code}
> |  grep_log(self, expr, filename='system.log', from_mark=None)
> |  Returns a list of lines matching the regular expression in parameter
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors(self, filename='system.log')
> |  Returns a list of errors with stack traces
> |  in the Cassandra log of this node
> |
> |  grep_log_for_errors_from(self, filename='system.log', seek_start=0)
> {code}
> {code}
> |  watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log')
> |  Watch the log until one or more (regular) expression are found.
> |  This methods when all the expressions have been found or the method
> |  timeouts (a TimeoutError is then raised). On successful completion,
> |  a list of pair (line matched, match object) is returned.
> {code}
> Below is a POC showing a way to do such logic
> {code}
> package org.apache.cassandra.distributed.test;
> import java.io.BufferedReader;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.io.UncheckedIOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Iterator;
> import java.util.Spliterator;
> import java.util.Spliterators;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import java.util.stream.Stream;
> import java.util.stream.StreamSupport;
> import com.google.common.io.Closeables;
> import org.junit.Test;
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.utils.AbstractIterator;
> public class AllTheLogs extends TestBaseImpl
> {
>@Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag = System.getProperty("cassandra.testtag", 
> "cassandra.testtag_IS_UNDEFINED");
>String suite = System.getProperty("suitename", 
> "suitename_IS_UNDEFINED");
>String log = String.format("build/test/logs/%s/TEST-%s.log", tag, 
> suite);
>grep(log, "Enqueuing flush of tables").forEach(l -> 
> System.out.println("I found the thing: " + l));
>}
>}
>private static Stream<String> grep(String file, String regex) throws 
> IOException
>{
>return grep(file, Pattern.compile(regex));
>}
>private static Stream<String> grep(String file, Pattern regex) throws 
> IOException
>{
>BufferedReader reader = new BufferedReader(new InputStreamReader(new 
> FileInputStream(file), StandardCharsets.UTF_8));
>Iterator<String> it = new AbstractIterator<String>()
>{
>protected String computeNext()
>{
>try
>{
>String s;
>while ((s = reader.readLine()) != null)
>{
>Matcher m = regex.matcher(s);
>if (m.find())
>return s;
>}
>reader.close();
>return endOfData();
>}
>catch (IOException e)
>{
>Closeables.closeQuietly(reader);
>throw new UncheckedIOException(e);
>}
>}
>};
>return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, 
> Spliterator.ORDERED), false);
>}
> }
> {code}
> And
> {code}
> @Test
>public void test() throws IOException
>{
>try (final Cluster cluster = init(Cluster.build(1).start()))
>{
>String tag 

[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown

2020-10-05 Thread Benedict Elliott Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-15214:
---
Fix Version/s: (was: 4.0-triage)

> OOMs caught and not rethrown
> 
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
> Attachments: oom-experiments.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest most consistent way to do this would be to have a 
> single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future proof approach, it may be worth paying the cost of a single thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-05 Thread Benedict Elliott Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-15229:
---
Fix Version/s: (was: 4.0-triage)

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache has been implemented on top of the 
> buffer pool. When a local pool is full, one of its chunks is evicted and only 
> put back to the global pool once all buffers in the evicted chunk are 
> released. But because of the chunk cache, buffers can be held for long 
> periods of time, preventing an evicted chunk from being recycled even though 
> most of the space in it is free.
> There are two things that need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, 
> even if it's not fully free. This is doable in 4.0.
> 2. Reduce fragmentation caused by differing buffer sizes. With #1, partially 
> freed chunks become available for allocation, but the "holes" in a partially 
> freed chunk have different sizes. We should consider allocating a fixed 
> buffer size, which is unlikely to fit in 4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14030) disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: Missing: ['127.0.0.5.* now UP']:

2020-10-05 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208280#comment-17208280
 ] 

Adam Holmberg commented on CASSANDRA-14030:
---

Thanks for the input. I was looking into it to make sure it was understood as 
either an actual issue or a resource problem. At this point I have not found 
any actual functional problems, and I'm probably just diagnosing the 
underprovisioned system.

Given the age of the original request and the pristine state of current CI, I 
will close this for now.

If it comes up again, we can reopen. Please let me know if anybody disagrees.

> disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: 
> Missing: ['127.0.0.5.* now UP']:
> ---
>
> Key: CASSANDRA-14030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Kjellman
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta, 4.0-triage
>
>
> disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: 
> Missing: ['127.0.0.5.* now UP']:
> 15 Nov 2017 11:28:03 [node4] Missing: ['127.0.0.5.* now UP']:
> .
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-NZzhNb
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/disk_balance_test.py", line 44, in 
> disk_balance_bootstrap_test
> node5.start(wait_for_binary_proto=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 706, in start
> node.watch_log_for_alive(self, from_mark=mark)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 520, in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 488, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "15 Nov 2017 11:28:03 [node4] Missing: ['127.0.0.5.* now UP']:\n.\nSee 
> system.log for remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-NZzhNb\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208278#comment-17208278
 ] 

Brandon Williams commented on CASSANDRA-16182:
--

bq. Aliveness would have to be determined by C' itself, vs via A or B

Determined by that node itself, yes, but it can learn about newer heartbeats 
from C via A or B; it doesn't need direct communication with C for that.

bq. Curious to know your thoughts on the proposed fix

Re-emitting our state is a hack and, actually, won't matter here since it's 
already been sent and processed; there is no change if we set our state again.  
What happens on restart is that the newer generation from the restart causes 
our state to be processed as new.
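
Purely as an illustration of that last point (simplified, hypothetical types, 
not Cassandra's actual gossip classes), the behaviour can be thought of as a 
(generation, version) comparison: re-sending an unchanged state is a no-op, 
while a restart bumps the generation and is therefore processed as new.

{code:java}
// Illustrative sketch only: a peer's state is applied only when its
// (generation, version) pair is strictly newer than what we already hold.
final class EndpointClock
{
    final int generation; // bumped when the node (re)starts
    final int version;    // bumped on every local state change

    EndpointClock(int generation, int version)
    {
        this.generation = generation;
        this.version = version;
    }

    boolean isNewerThan(EndpointClock other)
    {
        return generation != other.generation ? generation > other.generation
                                              : version > other.version;
    }
}
{code}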

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated by cloud provider" due to health check failure and a 
> replacement node C' got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time replacement node (C') finished bootstrapping and announced it's 
> state to Normal, A and B were still able to communicate with the replacing 
> node C (while C' was not able to with C), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208267#comment-17208267
 ] 

Caleb Rackliffe commented on CASSANDRA-15229:
-

[~jasonstack] The only thing left to resolve seems like the discussion 
[here|https://github.com/apache/cassandra/pull/535#discussion_r497796381]. 
Otherwise, LGTM

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta, 4.0-triage
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache has been implemented on top of the 
> buffer pool. When a local pool is full, one of its chunks is evicted and only 
> put back to the global pool once all buffers in the evicted chunk are 
> released. But because of the chunk cache, buffers can be held for long 
> periods of time, preventing an evicted chunk from being recycled even though 
> most of the space in it is free.
> There are two things that need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, 
> even if it's not fully free. This is doable in 4.0.
> 2. Reduce fragmentation caused by differing buffer sizes. With #1, partially 
> freed chunks become available for allocation, but the "holes" in a partially 
> freed chunk have different sizes. We should consider allocating a fixed 
> buffer size, which is unlikely to fit in 4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15804) system_schema keyspace complain of schema mismatch during upgrade

2020-10-05 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic reassigned CASSANDRA-15804:
-

Assignee: (was: Stefan Miklosovic)

> system_schema keyspace complain of schema mismatch during upgrade
> -
>
> Key: CASSANDRA-15804
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15804
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Pedro Gordo
>Priority: Low
> Fix For: 3.11.x, 4.0-beta, 4.0-triage
>
>
> When upgrading from 3.11.4 to 3.11.6, we got the following error:
> {code:Plain Text}
> ERROR [MessagingService-Incoming-/10.20.11.59] 2020-05-07 13:53:52,627 
> CassandraDaemon.java:228 - Exception in thread 
> Thread[MessagingService-Incoming-/10.20.11.59,5,main]
> java.lang.RuntimeException: Unknown column kind during deserialization
> at 
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:464) 
> ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:419)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:195)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:851)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:839)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:425)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:675)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:658)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) 
> ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> {code}
> I've noticed that system_schema.dropped_columns has a new column called 
> "kind".
> No issues arise from this error message, and the error disappeared after 
> upgrading all nodes. But it still caused concerns due to the ERROR logging 
> level, although "nodetool describecluster" reported only one schema version.
> It makes sense for the system keyspaces to not be included for the 
> "describecluster" schema version check, but it seems to me that these 
> internal schema mismatches should be ignored if the versions are different 
> between the nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15969) jvm dtest execute APIs do not support collections

2020-10-05 Thread Uchenna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uchenna reassigned CASSANDRA-15969:
---

Assignee: Uchenna

> jvm dtest execute APIs do not support collections
> -
>
> Key: CASSANDRA-15969
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15969
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Uchenna
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0-beta, 4.0-triage
>
>
> If you use a collection type they will be transferred to the instance and we 
> call org.apache.cassandra.utils.ByteBufferUtil#objectToBytes to convert to 
> ByteBuffers; this doesn’t support collections.  If you try to work around 
> this by converting before sending, it will fail since that method doesn’t 
> support ByteBuffer as input.
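
For illustration, a minimal in-JVM dtest sketch that runs into this (the 
keyspace and table names are made up, and the test is expected to fail until 
collection support is added to the conversion path):

{code:java}
import java.util.Arrays;
import java.util.List;

import org.junit.Test;

import org.apache.cassandra.distributed.Cluster;
import org.apache.cassandra.distributed.api.ConsistencyLevel;

public class CollectionBindSketchTest
{
    @Test
    public void collectionBindValue() throws Throwable
    {
        try (Cluster cluster = Cluster.build(1).start())
        {
            cluster.schemaChange("CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            cluster.schemaChange("CREATE TABLE ks.t (pk int PRIMARY KEY, v list<int>)");

            List<Integer> value = Arrays.asList(1, 2, 3);
            // Expected to throw today: the bind value is converted on the instance
            // with ByteBufferUtil#objectToBytes, which has no branch for collections.
            cluster.coordinator(1).execute("INSERT INTO ks.t (pk, v) VALUES (?, ?)",
                                           ConsistencyLevel.ALL, 0, value);
        }
    }
}
{code}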



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16159) Reduce the Severity of Errors Reported in FailureDetector#isAlive()

2020-10-05 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16159:

Reviewers: Caleb Rackliffe

> Reduce the Severity of Errors Reported in FailureDetector#isAlive()
> ---
>
> Key: CASSANDRA-16159
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16159
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Caleb Rackliffe
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Noticed the following error in the failure detector during a host replacement:
> {noformat}
> java.lang.IllegalArgumentException: Unknown endpoint: 10.38.178.98:7000
>   at 
> org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:281)
>   at 
> org.apache.cassandra.service.StorageService.handleStateBootreplacing(StorageService.java:2502)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2182)
>   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:3145)
>   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1242)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1368)
>   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>   at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
>   at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:884)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> {noformat}
> This particular error looks benign, given that even if it occurs, the node 
> continues to handle the {{BOOT_REPLACE}} state. There are two things we might 
> be able to do to improve {{FailureDetector#isAlive()}} though:
> 1.) We don’t short circuit in the case that the endpoint in question is in 
> quarantine after being removed. It may be useful to check for this so we can 
> avoid logging an ERROR when the endpoint is clearly doomed/dead. (Quarantine 
> works great when the gossip message is _from_ a quarantined endpoint, but in 
> this case, that would be the new/replacing and not the old/replaced one.)
> 2.) We can reduce the severity of the logging from ERROR to WARN and provide 
> better context around how to determine whether or not there’s actually a 
> problem. (ex. “If this occurs while trying to determine liveness for a node 
> that is currently being replaced, it can be safely ignored.”)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16151) Package tools/bin scripts as executable

2020-10-05 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-16151:
--
Fix Version/s: (was: 4.0-triage)

> Package tools/bin scripts as executable
> ---
>
> Key: CASSANDRA-16151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16151
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Angelo Polo
>Assignee: Angelo Polo
>Priority: Normal
>  Labels: patch
> Fix For: 4.0-beta, 3.11.9
>
> Attachments: 3.11-Package-tools-bin-scripts-as-executable.patch, 
> trunk-Package-tools-bin-scripts-as-executable.patch
>
>
> The tools/bin scripts aren't packaged as executable in the source 
> distributions, though in the repository the scripts have the right bits.
> This causes, on 3.11.8 for example, the tests in 
> org.apache.cassandra.cql3.EmptyValuesTest to fail:
> {{java.io.IOException: Cannot run program "tools/bin/sstabledump": error=13, 
> Permission denied}}
> {{[junit-timeout] junit.framework.AssertionFailedError: java.io.IOException}}
> {{[junit-timeout]         at 
> org.apache.cassandra.cql3.EmptyValuesTest.verify(EmptyValuesTest.java:85)}}
> {{[junit-timeout]         at 
> org.apache.cassandra.cql3.EmptyValuesTest.verifyJsonInsert(EmptyValuesTest.java:112)}}
> See attached patch of build.xml for the trunk and cassandra-3.11 branches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-16159) Reduce the Severity of Errors Reported in FailureDetector#isAlive()

2020-10-05 Thread Uchenna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uchenna reassigned CASSANDRA-16159:
---

Assignee: Uchenna

> Reduce the Severity of Errors Reported in FailureDetector#isAlive()
> ---
>
> Key: CASSANDRA-16159
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16159
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Caleb Rackliffe
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Noticed the following error in the failure detector during a host replacement:
> {noformat}
> java.lang.IllegalArgumentException: Unknown endpoint: 10.38.178.98:7000
>   at 
> org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:281)
>   at 
> org.apache.cassandra.service.StorageService.handleStateBootreplacing(StorageService.java:2502)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2182)
>   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:3145)
>   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1242)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1368)
>   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>   at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
>   at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:884)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> {noformat}
> This particular error looks benign, given that even if it occurs, the node 
> continues to handle the {{BOOT_REPLACE}} state. There are two things we might 
> be able to do to improve {{FailureDetector#isAlive()}} though:
> 1.) We don’t short circuit in the case that the endpoint in question is in 
> quarantine after being removed. It may be useful to check for this so we can 
> avoid logging an ERROR when the endpoint is clearly doomed/dead. (Quarantine 
> works great when the gossip message is _from_ a quarantined endpoint, but in 
> this case, that would be the new/replacing and not the old/replaced one.)
> 2.) We can reduce the severity of the logging from ERROR to WARN and provide 
> better context around how to determine whether or not there’s actually a 
> problem. (ex. “If this occurs while trying to determine liveness for a node 
> that is currently being replaced, it can be safely ignored.”)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Description: 
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated by cloud provider" due to health check failure and a 
replacement node C' got launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: 
By the time the replacement node (C') finished bootstrapping and announced its 
state as Normal, A and B were still able to communicate with the node being 
replaced, C (while C' was not able to), and hence rejected C' replacing C. C' 
does not know this and does not attempt to recommunicate its "Normal" state to 
the rest of the cluster. (Worth noting that A and B marked C as down soon 
after.)
Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

I ended up manually fixing this by restarting Cassandra on C', which forced it 
to announce its "Normal" state via
StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
setTokens() --> setGossipTokens()
Alternately, I could have possibly achieved the same behavior if I disabled and 
enabled gossip via jmx/nodetool.

  was:
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: 
By the time replacement node (C') finished bootstrapping and announced it's 
state to Normal, A and B were still able to communicate with the replacing node 
C (while C' was not able to with C), and hence rejected C' replacing C. C' does 
not know this and does not attempt to recommunicate its "Normal" state to rest 
of the cluster. (Worth noting that A and B marked C as down soon after)
Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

I ended up manually fixing this by restarting Cassandra on C', which forced it 
to announce its "Normal" state via
StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
setTokens() --> setGossipTokens()
Alternately, I could have possibly 

[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208249#comment-17208249
 ] 

Sumanth Pasupuleti commented on CASSANDRA-16182:


{quote}Then it was dead to C', or the replace would've failed on that 
node{quote}

+1

{quote} That's interesting since it should have seen C alive via A or B since 
it could talk to them {quote}
Aliveness would have to be determined by C' itself rather than via A or B, 
wouldn't it? (My understanding is that gossip helps discover cluster members, 
but aliveness is determined by each individual node's FailureDetector.) My 
hypothesis is that C was healthy enough to keep the connections it already had 
(to A and B), but not healthy enough to accept new connections from newer 
nodes like C'.

{quote} I'm not sure what else we can do {quote}
Curious to know your thoughts on the proposed fix

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time replacement node (C') finished bootstrapping and announced it's 
> state to Normal, A and B were still able to communicate with the replacing 
> node C (while C' was not able to with C), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208240#comment-17208240
 ] 

Brandon Williams edited comment on CASSANDRA-16182 at 10/5/20, 6:24 PM:


bq. I believe the same (that C was still alive)

Then it was dead to C', or the replace would've failed on that node.  That's 
interesting since it should have seen C alive via A or B since it could talk to 
them.  So you had a split-brain cluster you were doing a topology change on, 
which is generally ok though not ideal, but an unexpected healing of the 
partition during the operation might produce some weird results. I'm not sure 
what else we can do, but the important thing is that the cluster handled it 
deterministically.


was (Author: brandon.williams):
bq. I believe the same (that C was still alive)

Then it was dead to C', or the replace would've failed on that node.  That's 
interesting since it should have seen C alive via A or B since it could take to 
them.  So you had a split-brain cluster you were doing a topology change on, 
which is generally ok though not ideal, but an unexpected healing of the 
partition during the operation might produce some weird results. I'm not sure 
what else we can do, but the important thing is the cluster handled it 
deterministically

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time replacement node (C') finished bootstrapping and announced it's 
> state to Normal, A and B were still able to communicate with the replacing 
> node C (while C' was not able to with C), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208240#comment-17208240
 ] 

Brandon Williams commented on CASSANDRA-16182:
--

bq. I believe the same (that C was still alive)

Then it was dead to C', or the replace would've failed on that node.  That's 
interesting since it should have seen C alive via A or B since it could take to 
them.  So you had a split-brain cluster you were doing a topology change on, 
which is generally ok though not ideal, but an unexpected healing of the 
partition during the operation might produce some weird results. I'm not sure 
what else we can do, but the important thing is the cluster handled it 
deterministically

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time replacement node (C') finished bootstrapping and announced it's 
> state to Normal, A and B were still able to communicate with the replacing 
> node C (while C' was not able to with C), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15241) Virtual table to expose current running queries

2020-10-05 Thread Benedict Elliott Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-15241:
---
Fix Version/s: (was: 4.0-triage)

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id| duration_micros | task
> --+-+-
>  Native-Transport-Requests-17 |6325 |  QUERY 
> select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |   14681 | EXECUTE 
> f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |   14678 | EXECUTE 
> f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>  ReadStage-10 |   16535 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-13 |   16535 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-14 |   16535 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-19 |   11861 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-20 |   11861 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-22 |7279 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-23 |4716 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>   ReadStage-5 |   16535 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>   ReadStage-7 |   16535 | 
>SELECT * FROM basic.wide1 LIMIT 5000
>   ReadStage-8 |   16535 | 
>SELECT * FROM basic.wide1 LIMIT 5000{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208233#comment-17208233
 ] 

Sumanth Pasupuleti commented on CASSANDRA-16182:


Yes Brandon, I believe the same (that C was still alive). C stayed alive long 
enough to outlast the bootstrap of its replacement C'. As mentioned, a few 
seconds later C could not communicate anymore and the failure detectors on A 
and B marked it DOWN; after that point, had C' (re)announced its NORMAL state, 
C' would have been accepted into the ring by the peers.

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time replacement node (C') finished bootstrapping and announced it's 
> state to Normal, A and B were still able to communicate with the replacing 
> node C (while C' was not able to with C), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to recommunicate its "Normal" 
> state to rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208232#comment-17208232
 ] 

David Capwell commented on CASSANDRA-16155:
---

I left this feedback in Caleb's version of the patch; can we add a test for the 
vtable that broke?  We didn't have coverage for that/those table(s), so it 
would be best to add coverage as well.
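
A rough sketch of what such a coverage test could look like with the in-JVM 
dtest API (illustrative only, not the actual patch; the query is the one from 
the ticket description):

{code:java}
import org.junit.Assert;
import org.junit.Test;

import org.apache.cassandra.distributed.Cluster;
import org.apache.cassandra.distributed.api.ConsistencyLevel;

public class LocalReadLatencyVirtualTableTest
{
    @Test
    public void selectDoesNotThrow() throws Throwable
    {
        try (Cluster cluster = Cluster.build(1).start())
        {
            // Before the fix this SELECT died with the ByteBufferAccessor
            // ClassCastException; the assertion is simply that it now completes.
            Object[][] rows = cluster.coordinator(1)
                                     .execute("SELECT * FROM system_views.local_read_latency",
                                              ConsistencyLevel.ONE);
            Assert.assertNotNull(rows);
        }
    }
}
{code}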

> ByteBufferAccessor cast exceptions are thrown when trying to query a virtual 
> table
> --
>
> Key: CASSANDRA-16155
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16155
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Start a fresh trunk node, and try to run
> SELECT * FROM system_views.local_read_latency ;
> You’ll get: 
> {code:java}
> ERROR [Native-Transport-Requests-1] 2020-09-30 09:44:45,099 
> ErrorMessage.java:457 - Unexpected exception during request
>  java.lang.ClassCastException: 
> org.apache.cassandra.db.marshal.ByteBufferAccessor cannot be cast to 
> java.lang.String
>          at 
> org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:29)
>          at 
> org.apache.cassandra.db.marshal.AbstractType.decompose(AbstractType.java:131) 
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16083) Missing JMX objects and attributes upgrading from 3.0 to 4.0

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208220#comment-17208220
 ] 

Brandon Williams commented on CASSANDRA-16083:
--

As one point of data, in 3.0 we deprecated HintedHandOffManagerMBean, left it 
as such in 3.11, and now have removed it in trunk.

> Missing JMX objects and attributes upgrading from 3.0 to 4.0
> 
>
> Key: CASSANDRA-16083
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16083
> Project: Cassandra
>  Issue Type: Task
>  Components: Observability/Metrics
>Reporter: David Capwell
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Using the tools added in CASSANDRA-16082, below is the list of metrics 
> missing in 4.0 but present in 3.0.  The work here is to make sure we had 
> proper deprecation for each metric, and if not, to add it back.
> {code}
> $ tools/bin/jmxtool diff -f yaml cassandra-3.0-jmx.yaml trunk-jmx.yaml 
> --ignore-missing-on-left
> Objects not in right:
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=EstimatedPartitionSizeHistogram
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=BloomFilterFalseRatio
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ReplicaFilteringProtectionRequests
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=RowCacheHitOutOfRange
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=MaxPoolSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=TombstoneScannedHistogram
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=WaitingOnFreeMemtableSpace
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasCommitTotalLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_aggregates,name=CasProposeLatency
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=ViewReadTime
> org.apache.cassandra.db:type=HintedHandoffManager
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=BloomFilterDiskSpaceUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=RequestResponseStage,name=PendingTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=MemtableSwitchCount
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=range_xfers,name=ReplicaFilteringProtectionRowsCachedPerQuery
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=SnapshotsSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=RecentBloomFilterFalsePositives
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=SpeculativeRetries
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=LiveDiskSpaceUsed
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ViewReadTime
> org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CompletedTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=ViewReadTime
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=BloomFilterFalsePositives
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=CompressionMetadataOffHeapMemoryUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=TotalBlockedTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=LiveScannedHistogram
> 

[jira] [Commented] (CASSANDRA-14030) disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: Missing: ['127.0.0.5.* now UP']:

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208214#comment-17208214
 ] 

Brandon Williams commented on CASSANDRA-14030:
--

The original report sounds like a timeout/load issue (not seeing some node come 
up). With the other failures fixed, and your only repro being on an 
underprovisioned system, it sounds like there's no point in pursuing this 
further if CI is happy?

> disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: 
> Missing: ['127.0.0.5.* now UP']:
> ---
>
> Key: CASSANDRA-14030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Kjellman
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta, 4.0-triage
>
>
> disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: 
> Missing: ['127.0.0.5.* now UP']:
> 15 Nov 2017 11:28:03 [node4] Missing: ['127.0.0.5.* now UP']:
> .
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-NZzhNb
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/disk_balance_test.py", line 44, in 
> disk_balance_bootstrap_test
> node5.start(wait_for_binary_proto=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 706, in start
> node.watch_log_for_alive(self, from_mark=mark)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 520, in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 488, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "15 Nov 2017 11:28:03 [node4] Missing: ['127.0.0.5.* now UP']:\n.\nSee 
> system.log for remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-NZzhNb\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208213#comment-17208213
 ] 

Brandon Williams commented on CASSANDRA-16182:
--

bq. Peer nodes A and B logged "Node C' cannot complete replacement of alive 
node C "

This means either C was still alive, or there was newer gossip information 
somewhere in the cluster that these nodes had not previously seen; either way, 
they believed C to be alive, which is the crux of the problem here.

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time the replacement node (C') finished bootstrapping and announced its 
> state as Normal, A and B were still able to communicate with the replacing 
> node C (while C' was not able to with C), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to re-communicate its "Normal" 
> state to the rest of the cluster. (Worth noting that A and B marked C as down 
> soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.
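
For reference, the re-announcement described above amounts to republishing the 
node's tokens and NORMAL status over gossip, much like setGossipTokens() does on 
restart. A minimal sketch, assuming the 3.0-era Gossiper/StorageService APIs named 
in the report (hypothetical helper, not the committed fix):

{code:java}
import java.util.Collection;

import org.apache.cassandra.dht.Token;
import org.apache.cassandra.gms.ApplicationState;
import org.apache.cassandra.gms.Gossiper;
import org.apache.cassandra.service.StorageService;

// Hypothetical sketch of the proposed fix: when a bootstrapped replacement sees its
// predecessor reappear via gossip, re-emit its own TOKENS and STATUS=NORMAL so peers
// that rejected the replacement converge on a subsequent gossip round.
public final class ReannounceNormalExample
{
    public static void reannounceNormal(Collection<Token> localTokens)
    {
        Gossiper.instance.addLocalApplicationState(
            ApplicationState.TOKENS, StorageService.instance.valueFactory.tokens(localTokens));
        Gossiper.instance.addLocalApplicationState(
            ApplicationState.STATUS, StorageService.instance.valueFactory.normal(localTokens));
    }
}
{code}

The nodetool alternative mentioned above (disablegossip followed by enablegossip) 
bounces the same local state.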



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16083) Missing JMX objects and attributes upgrading from 3.0 to 4.0

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208212#comment-17208212
 ] 

David Capwell commented on CASSANDRA-16083:
---

C* doesn't have a process at the moment (at least none that I can find, 
[~blerer]); deprecation has been lacking and sometimes gets skipped entirely. We 
should formalize this more.

The issue is mostly that this places a big burden on users, as 
monitoring/tooling breaks when we do this, which delays the ability to upgrade 
to 4.0 until tools/monitoring can adapt.

> Missing JMX objects and attributes upgrading from 3.0 to 4.0
> 
>
> Key: CASSANDRA-16083
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16083
> Project: Cassandra
>  Issue Type: Task
>  Components: Observability/Metrics
>Reporter: David Capwell
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Using the tools added in CASSANDRA-16082, below is the list of metrics 
> missing in 4.0 but present in 3.0. The work here is to make sure we have 
> proper deprecation for each metric, and if not, to add it back.
> {code}
> $ tools/bin/jmxtool diff -f yaml cassandra-3.0-jmx.yaml trunk-jmx.yaml 
> --ignore-missing-on-left
> Objects not in right:
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=EstimatedPartitionSizeHistogram
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=BloomFilterFalseRatio
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ReplicaFilteringProtectionRequests
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=RowCacheHitOutOfRange
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=MaxPoolSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=TombstoneScannedHistogram
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=WaitingOnFreeMemtableSpace
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasCommitTotalLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_aggregates,name=CasProposeLatency
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=ViewReadTime
> org.apache.cassandra.db:type=HintedHandoffManager
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=BloomFilterDiskSpaceUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=RequestResponseStage,name=PendingTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=MemtableSwitchCount
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=range_xfers,name=ReplicaFilteringProtectionRowsCachedPerQuery
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=SnapshotsSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=RecentBloomFilterFalsePositives
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=SpeculativeRetries
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=LiveDiskSpaceUsed
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ViewReadTime
> org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CompletedTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=ViewReadTime
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=BloomFilterFalsePositives
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=CompressionMetadataOffHeapMemoryUsed
> 

[jira] [Updated] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-10-05 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15299:

Reviewers: Alex Petrov, Caleb Rackliffe  (was: Alex Petrov)

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Normal
>  Labels: protocolv5
> Fix For: 4.0-alpha, 4.0-triage
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a crc, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value, as it'd 
> be no smaller than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
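
To make point 1 above concrete, here is a minimal illustration (plain JDK CRC32, 
not the Cassandra frame codec) of checksumming the bytes that actually hit the 
wire - i.e. the compressed frame body - and appending the CRC in one fixed, 
documented byte order:

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

// Illustration only: compute the CRC over the on-wire (compressed) bytes and append it
// in an agreed byte order, so a corrupt frame is rejected before any decompression runs.
public final class WireCrcExample
{
    public static ByteBuffer frameWithCrc(ByteBuffer compressedBody)
    {
        CRC32 crc = new CRC32();
        crc.update(compressedBody.duplicate());   // checksum exactly what is sent

        ByteBuffer frame = ByteBuffer.allocate(compressedBody.remaining() + Integer.BYTES)
                                     .order(ByteOrder.BIG_ENDIAN);
        frame.put(compressedBody.duplicate());
        frame.putInt((int) crc.getValue());       // fixed byte order on the wire
        return (ByteBuffer) frame.flip();
    }
}
{code}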



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-in-jvm-dtest-api] 01/01: Add IInstance#getReleaseVersionString

2020-10-05 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-in-jvm-dtest-api.git

commit 2f0d9321563d6c942c7cf7f7cb532ce4ce023773
Author: Jordan West 
AuthorDate: Thu Oct 1 17:15:23 2020 -0700

Add IInstance#getReleaseVersionString
---
 src/main/java/org/apache/cassandra/distributed/api/IInstance.java | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/main/java/org/apache/cassandra/distributed/api/IInstance.java 
b/src/main/java/org/apache/cassandra/distributed/api/IInstance.java
index 2230cb4..045ea00 100644
--- a/src/main/java/org/apache/cassandra/distributed/api/IInstance.java
+++ b/src/main/java/org/apache/cassandra/distributed/api/IInstance.java
@@ -86,6 +86,8 @@ public interface IInstance extends IIsolatedExecutor
 
 void setMessagingVersion(InetSocketAddress addressAndPort, int version);
 
+String getReleaseVersionString();
+
 void flush(String keyspace);
 
 void forceCompact(String keyspace, String table);
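
A possible use from an in-jvm dtest, as a rough sketch (assumes the usual 
Cluster builder entry point; the scaffolding here is illustrative only):

{code:java}
import org.apache.cassandra.distributed.Cluster;

// Illustrative sketch: read the release version of a node, e.g. to gate
// version-dependent assertions in an upgrade or mixed-version test.
public class ReleaseVersionExample
{
    public static void main(String[] args) throws Throwable
    {
        try (Cluster cluster = Cluster.build(1).start())
        {
            String version = cluster.get(1).getReleaseVersionString();
            System.out.println("node 1 reports release version " + version);
        }
    }
}
{code}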


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-in-jvm-dtest-api] branch master updated (6b6e15c -> 2f0d932)

2020-10-05 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch master
in repository 
https://gitbox.apache.org/repos/asf/cassandra-in-jvm-dtest-api.git.


 discard 6b6e15c  Merge pull request #21 from jrwest/jwest/16148
omit c2a7f48  Add IInstance#getReleaseVersionString
 add 2096398  Update chanelog
 add c2780b7  [maven-release-plugin] prepare release 0.0.5
 add f900334  [maven-release-plugin] prepare for next development iteration
 new 2f0d932  Add IInstance#getReleaseVersionString

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (6b6e15c)
\
 N -- N -- N   refs/heads/master (2f0d932)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt | 10 ++
 pom.xml | 20 ++--
 2 files changed, 28 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16109) Don't adjust nodeCount when setting node id topology in in-jvm dtests

2020-10-05 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208192#comment-17208192
 ] 

David Capwell commented on CASSANDRA-16109:
---

+1 to the change to check Wrapper.isShutdown so we ignore errors on shutdown 
(the most common source of flakiness).

> Don't adjust nodeCount when setting node id topology in in-jvm dtests
> -
>
> Key: CASSANDRA-16109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16109
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
>  Labels: pull-request-available
>
> We update the node count when setting the node id topology in in-jvm dtests; 
> this should only happen if the node count is smaller than the node id topology, 
> otherwise bootstrap tests error out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Description: 
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: 
By the time the replacement node (C') finished bootstrapping and announced its 
state as Normal, A and B were still able to communicate with the replacing node 
C (while C' was not able to with C), and hence rejected C' replacing C. C' does 
not know this and does not attempt to re-communicate its "Normal" state to the 
rest of the cluster. (Worth noting that A and B marked C as down soon after)
Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

I ended up manually fixing this by restarting Cassandra on C', which forced it 
to announce its "Normal" state via
StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
setTokens() --> setGossipTokens()
Alternately, I could have possibly achieved the same behavior if I disabled and 
enabled gossip via jmx/nodetool.

  was:
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced it's state to Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
recommunicate its "Normal" state to rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

I ended up manually fixing this by restarting Cassandra on C', which forced it 
to announce its "Normal" state via
StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
setTokens() --> setGossipTokens()
Alternately, I could have possibly achieved the same 

[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Summary: A replacement node, although completed bootstrap and joined ring 
according to itself, stuck in Joining state as per the peers  (was: A 
replacement node, although completed bootstrap and joined ring according to 
itself, maybe stuck in Joining state as per the peers)

> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: By the time replacement node (C') finished 
> bootstrapping and announced its state as Normal, A and B were still able to 
> communicate with the replacing node C (while C' was not able to with C), and 
> hence rejected C' replacing C. C' does not know this and does not attempt to 
> re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A 
> and B marked C as down soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternately, I could have possibly achieved the same behavior if I disabled 
> and enabled gossip via jmx/nodetool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Description: 
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced its state as Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

I ended up manually fixing this by restarting Cassandra on C', which forced it 
to announce its "Normal" state via
StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
setTokens() --> setGossipTokens()
Alternately, I could have possibly achieved the same behavior if I disabled and 
enabled gossip via jmx/nodetool.

  was:
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced it's state to Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
recommunicate its "Normal" state to rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)


> A replacement node, although completed bootstrap and joined ring according to 
> itself, maybe stuck in Joining state as per the peers
> ---
>
>  

[jira] [Assigned] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti reassigned CASSANDRA-16182:
--

Assignee: Sumanth Pasupuleti

> A replacement node, although completed bootstrap and joined ring according to 
> itself, maybe stuck in Joining state as per the peers
> ---
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A,B were still able to communicate with terminated node 
> C and consequently still have C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully and itself and its peers logged 
> this statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C "
> # A few seconds later, A and B marked C as DOWN
> C' continued to log below lines in an endless fashion
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: By the time replacement node (C') finished 
> bootstrapping and announced its state as Normal, A and B were still able to 
> communicate with the replacing node C (while C' was not able to with C), and 
> hence rejected C' replacing C. C' does not know this and does not attempt to 
> re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A 
> and B marked C as down soon after)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified through gossip about C, and given both own the same token 
> and given C' has finished bootstrapping, C' can emit its Normal state again 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they did eventually)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Description: 
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines in an endless fashion

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced its state as Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

  was:
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced it's state to Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
recommunicate its "Normal" state to rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)


> A replacement node, although completed bootstrap and joined ring according to 
> itself, maybe stuck in Joining state as per the peers
> ---
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 

[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Description: 
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged 'Node C' cannot complete replacement of alive node 
C'
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced its state as Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

  was:
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged 'Node C' cannot complete replacement of alive node 
C'
# A few seconds later, A and B marked C' as DOWN

C' continued to log below lines

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced it's state to Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
recommunicate its "Normal" state to rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)


> A replacement node, although completed bootstrap and joined ring according to 
> itself, maybe stuck in Joining state as per the peers
> ---
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what 

[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Description: 
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged "Node C' cannot complete replacement of alive node 
C "
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced its state as Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)

  was:
This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged 'Node C' cannot complete replacement of alive node 
C'
# A few seconds later, A and B marked C as DOWN

C' continued to log below lines

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: By the time replacement node (C') finished 
bootstrapping and announced it's state to Normal, A and B were still able to 
communicate with the replacing node C (while C' was not able to with C), and 
hence rejected C' replacing C. C' does not know this and does not attempt to 
recommunicate its "Normal" state to rest of the cluster. (Worth noting that A 
and B marked C as down soon after)

Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
eventually based on FailureDetector. 

Proposed fix:
When C' is notified through gossip about C, and given both own the same token 
and given C' has finished bootstrapping, C' can emit its Normal state again 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they did eventually)


> A replacement node, although completed bootstrap and joined ring according to 
> itself, maybe stuck in Joining state as per the peers
> ---
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what 

[jira] [Created] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)
Sumanth Pasupuleti created CASSANDRA-16182:
--

 Summary: A replacement node, although completed bootstrap and 
joined ring according to itself, maybe stuck in Joining state as per the peers
 Key: CASSANDRA-16182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip
Reporter: Sumanth Pasupuleti


This issue occurred in a production 3.0.21 cluster.

Here is what happened
# We had, say, a three node Cassandra cluster with nodes A, B and C
# C got "terminated" due to health check failure and a replacement node C' got 
launched.
# C' started bootstrapping data from its neighbors
# Network flaw: Nodes A,B were still able to communicate with terminated node C 
and consequently still have C as alive.
# The replacement node C' learnt about C through gossip but was unable to 
communicate with C and marked C as DOWN.
# C' completed bootstrapping successfully and itself and its peers logged this 
statement "Node C' will complete replacement of C for tokens 
[-7686143363672898397]"
# C' logged the statement "Nodes C' and C have the same token 
-7686143363672898397. C' is the new owner"
# C' started listening for thrift and cql clients
# Peer nodes A and B logged 'Node C' cannot complete replacement of alive node 
C'
# A few seconds later, A and B marked C' as DOWN

C' continued to log below lines

{code:java}
Node C is now part of the cluster
Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a 
log statement fix)
FatClient C has been silent for 3ms, removing from gossip
{code}


My reasoning of what happened: by the time the replacement node (C') finished 
bootstrapping and announced its state as Normal, A and B were still able to 
communicate with the node being replaced, C (while C' was not), and hence 
rejected C' replacing C. C' does not know this and does not attempt to 
re-announce its "Normal" state to the rest of the cluster. (Worth noting that 
A and B marked C as down soon after.)

Gossip keeps telling C' to add C to its metadata, and C' keeps evicting C 
again based on the FailureDetector.

Proposed fix:
When C' is notified about C through gossip, given that both own the same token 
and that C' has finished bootstrapping, C' can emit its Normal state again, 
which should fix this in my opinion (so long as A and B have marked C as DOWN, 
which they eventually did).
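
For illustration only, a rough sketch of what that re-announcement could look 
like. This is not the actual Cassandra code path: the handler name, how it 
would be wired into gossip state-change handling, and the token comparison are 
assumptions made for this sketch; only the addLocalApplicationState call and 
the NORMAL value factory mirror APIs that exist in the codebase.

{code:java}
// Sketch only: a hypothetical hook assumed to run when gossip notifies C' about C.
// It re-emits NORMAL once C' owns the same token and has finished bootstrapping.
private void maybeReannounceNormal(Collection<Token> replacedNodeTokens)
{
    Collection<Token> localTokens = SystemKeyspace.getSavedTokens();

    boolean sameToken = !Collections.disjoint(localTokens, replacedNodeTokens);
    if (sameToken && SystemKeyspace.bootstrapComplete())
    {
        // Peers that earlier rejected the replacement (because C still looked alive
        // to them) get another chance to see C' as NORMAL once they mark C as DOWN.
        Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS,
                                                   StorageService.instance.valueFactory.normal(localTokens));
    }
}
{code}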



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, maybe stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-16182:
---
Fix Version/s: 3.0.x

> A replacement node, although completed bootstrap and joined ring according to 
> itself, maybe stuck in Joining state as per the peers
> ---
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened
> # We had, say, a three node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to health check failure and a replacement node C' 
> got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: Nodes A and B were still able to communicate with the 
> terminated node C and consequently still saw C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> the statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for thrift and cql clients
> # Peer nodes A and B logged 'Node C' cannot complete replacement of alive 
> node C'
> # A few seconds later, A and B marked C' as DOWN
> C' continued to log the lines below:
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: by the time the replacement node (C') finished 
> bootstrapping and announced its state as Normal, A and B were still able to 
> communicate with the node being replaced, C (while C' was not), and hence 
> rejected C' replacing C. C' does not know this and does not attempt to 
> re-announce its "Normal" state to the rest of the cluster. (Worth noting that 
> A and B marked C as down soon after.)
> Gossip keeps telling C' to add C to its metadata, and C' keeps evicting C 
> again based on the FailureDetector.
> Proposed fix:
> When C' is notified about C through gossip, given that both own the same token 
> and that C' has finished bootstrapping, C' can emit its Normal state again, 
> which should fix this in my opinion (so long as A and B have marked C as 
> DOWN, which they eventually did).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14030) disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: Missing: ['127.0.0.5.* now UP']:

2020-10-05 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208161#comment-17208161
 ] 

Adam Holmberg commented on CASSANDRA-14030:
---

Presently both 
[trunk|https://ci-cassandra.apache.org/job/Cassandra-trunk/lastCompletedBuild/testReport/dtest-large.disk_balance_test/TestDiskBalance/test_disk_balance_bootstrap/history/]
 and 
[3.11|https://ci-cassandra.apache.org/job/Cassandra-3.11/lastCompletedBuild/testReport/dtest-large.disk_balance_test/TestDiskBalance/test_disk_balance_bootstrap/history/]
 test histories are pristine (not being run in earlier branches).

So far I have not reproduced this in a realistic setup, but I can get a similar 
error in a comically under-provisioned VM (bootstrapping a fifth node on a 
two-core VM with 4 GB of RAM). Unsure whether this is still worth pursuing, but 
for now I'm looking into it. The logs make it look like the bootstrapping 
process hangs midway through.

> disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: 
> Missing: ['127.0.0.5.* now UP']:
> ---
>
> Key: CASSANDRA-14030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Kjellman
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta, 4.0-triage
>
>
> disk_balance_bootstrap_test - disk_balance_test.TestDiskBalance fails: 
> Missing: ['127.0.0.5.* now UP']:
> 15 Nov 2017 11:28:03 [node4] Missing: ['127.0.0.5.* now UP']:
> .
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-NZzhNb
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/disk_balance_test.py", line 44, in 
> disk_balance_bootstrap_test
> node5.start(wait_for_binary_proto=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 706, in start
> node.watch_log_for_alive(self, from_mark=mark)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 520, in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 488, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "15 Nov 2017 11:28:03 [node4] Missing: ['127.0.0.5.* now UP']:\n.\nSee 
> system.log for remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-NZzhNb\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-in-jvm-dtest-api] branch master updated: Add IInstance#getReleaseVersionString

2020-10-05 Thread jwest
This is an automated email from the ASF dual-hosted git repository.

jwest pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-in-jvm-dtest-api.git


The following commit(s) were added to refs/heads/master by this push:
 new c2a7f48  Add IInstance#getReleaseVersionString
 new 6b6e15c  Merge pull request #21 from jrwest/jwest/16148
c2a7f48 is described below

commit c2a7f4832bc00393396954cc0b34a23bf4c5ab42
Author: Jordan West 
AuthorDate: Thu Oct 1 17:15:23 2020 -0700

Add IInstance#getReleaseVersionString
---
 src/main/java/org/apache/cassandra/distributed/api/IInstance.java | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/main/java/org/apache/cassandra/distributed/api/IInstance.java 
b/src/main/java/org/apache/cassandra/distributed/api/IInstance.java
index 2230cb4..045ea00 100644
--- a/src/main/java/org/apache/cassandra/distributed/api/IInstance.java
+++ b/src/main/java/org/apache/cassandra/distributed/api/IInstance.java
@@ -86,6 +86,8 @@ public interface IInstance extends IIsolatedExecutor
 
 void setMessagingVersion(InetSocketAddress addressAndPort, int version);
 
+String getReleaseVersionString();
+
 void flush(String keyspace);
 
 void forceCompact(String keyspace, String table);
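
A hypothetical usage sketch from an in-jvm dtest (not part of this commit): the 
Cluster builder below lives in the main cassandra repository, and the test body 
only illustrates where the new accessor could be useful, e.g. for 
version-dependent assertions in upgrade tests.

{code:java}
import org.apache.cassandra.distributed.Cluster;
import org.junit.Test;

import static org.junit.Assert.assertNotNull;

public class ReleaseVersionStringTest
{
    @Test
    public void instanceReportsItsReleaseVersion() throws Throwable
    {
        try (Cluster cluster = Cluster.build(1).start())
        {
            // New in this change: each IInstance can report its release version.
            String version = cluster.get(1).getReleaseVersionString();
            assertNotNull(version);
        }
    }
}
{code}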


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15997) TestBootstrap::test_cleanup failing on unexpected number of SSTables

2020-10-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208118#comment-17208118
 ] 

Brandon Williams commented on CASSANDRA-15997:
--

Committed debugging changes here: 
https://github.com/apache/cassandra-dtest/commit/b117565b8f0096a3ed2af05fdec6e014a05788a1

Now everyone hurry up and wait.

> TestBootstrap::test_cleanup failing on unexpected number of SSTables
> 
>
> Key: CASSANDRA-15997
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15997
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Caleb Rackliffe
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-beta, 4.0-triage
>
>
> This failure has now occurred in a number of places on trunk (4.0), including 
> both Java 8 and 11 dtest runs. Nominally, there appear to be more SSTables 
> after cleanup than the test is expecting.
> {noformat}
> if len(sstables) > basecount + jobs:
>     logger.debug("Current count is {}, basecount was {}".format(len(sstables), basecount))
>     failed.set()
> {noformat}
> Examples:
> https://app.circleci.com/pipelines/github/maedhroz/cassandra/92/workflows/c59be4f8-329e-4d76-9c59-d49c38e58dd2/jobs/448
> https://app.circleci.com/pipelines/github/jolynch/cassandra/20/workflows/9d6c3b86-6207-4ead-aa4b-79022fc84182/jobs/893



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-dtest] branch master updated: Move debugging to error in TestBootstrap::test_cleanup

2020-10-05 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new b117565  Move debugging to error in TestBootstrap::test_cleanup
b117565 is described below

commit b117565b8f0096a3ed2af05fdec6e014a05788a1
Author: Brandon Williams 
AuthorDate: Fri Oct 2 15:19:45 2020 -0500

Move debugging to error in TestBootstrap::test_cleanup

Patch by brandonwilliams, reviewed by Berenguer Blasi for
CASSANDRA-15997
---
 bootstrap_test.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/bootstrap_test.py b/bootstrap_test.py
index 20e2545..526992d 100644
--- a/bootstrap_test.py
+++ b/bootstrap_test.py
@@ -815,12 +815,12 @@ class TestBootstrap(Tester):
 
 def _monitor_datadir(self, node, event, basecount, jobs, failed):
 while True:
-sstables = [s for s in node.get_sstables("keyspace1", "standard1") 
if "tmplink" not in s]
-logger.debug("---")
-for sstable in sstables:
-logger.debug(sstable)
 if len(sstables) > basecount + jobs:
-logger.debug("Current count is {}, basecount was 
{}".format(len(sstables), basecount))
+sstables = [s for s in node.get_sstables("keyspace1", 
"standard1") if "tmplink" not in s]
+logger.error("---")
+for sstable in sstables:
+logger.error(sstable)
+logger.error("Current count is {}, basecount was 
{}".format(len(sstables), basecount))
 failed.set()
 return
 if event.is_set():


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-05 Thread Chris Lohfink (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208096#comment-17208096
 ] 

Chris Lohfink commented on CASSANDRA-16155:
---

+1

> ByteBufferAccessor cast exceptions are thrown when trying to query a virtual 
> table
> --
>
> Key: CASSANDRA-16155
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16155
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Start a fresh trunk node, and try to run
> SELECT * FROM system_views.local_read_latency ;
> You’ll get: 
> {code:java}
> ERROR [Native-Transport-Requests-1] 2020-09-30 09:44:45,099 
> ErrorMessage.java:457 - Unexpected exception during request
>  java.lang.ClassCastException: 
> org.apache.cassandra.db.marshal.ByteBufferAccessor cannot be cast to 
> java.lang.String
>          at 
> org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:29)
>          at 
> org.apache.cassandra.db.marshal.AbstractType.decompose(AbstractType.java:131) 
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-05 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208085#comment-17208085
 ] 

Aleksey Yeschenko commented on CASSANDRA-15229:
---

LGTM with most recent feedback addressed.

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta, 4.0-triage
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache is implemented on top of the buffer 
> pool. When a local pool is full, one of its chunks is evicted and only put 
> back into the global pool once all buffers in the evicted chunk are released. 
> But because of the chunk cache, buffers can be held for long periods of time, 
> preventing the evicted chunk from being recycled even though most of its 
> space is free.
> There are two things that need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, 
> even if it is not fully free. This is doable in 4.0.
> 2. Reduce the fragmentation caused by differing buffer sizes. With #1, a 
> partially freed chunk will be available for allocation, but the "holes" in it 
> have different sizes. We should consider allocating a fixed buffer size, 
> which is unlikely to fit in 4.0.
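
(For illustration only, a toy model of the "recirculate partially freed chunks" 
idea described above. The class names and structure are assumptions for this 
sketch and are not Cassandra's actual BufferPool internals.)

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class Chunk
{
    final int capacityUnits;
    final AtomicInteger freeUnits;

    Chunk(int capacityUnits, int freeUnits)
    {
        this.capacityUnits = capacityUnits;
        this.freeUnits = new AtomicInteger(freeUnits);
    }
}

final class GlobalPool
{
    // fully free chunks, preferred for reuse
    private final Queue<Chunk> free = new ConcurrentLinkedQueue<>();
    // evicted chunks that still have buffers outstanding but also have free space
    private final Queue<Chunk> partiallyFree = new ConcurrentLinkedQueue<>();

    void onLocalPoolEviction(Chunk c)
    {
        // Before: a chunk like this sat idle until every buffer came back.
        // With recirculation, its remaining free space stays allocatable.
        int freeNow = c.freeUnits.get();
        if (freeNow == c.capacityUnits)
            free.add(c);
        else if (freeNow > 0)
            partiallyFree.add(c);
        // else: fully in use; it is offered back once its buffers are released
    }

    Chunk take()
    {
        Chunk c = free.poll();
        return c != null ? c : partiallyFree.poll();
    }
}
{code}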



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-10-05 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208057#comment-17208057
 ] 

Josh McKenzie commented on CASSANDRA-15579:
---

[~bdeggleston] - confirming - you still have cycles to shepherd this?

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16094) Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas

2020-10-05 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201956#comment-17201956
 ] 

Marcus Eriksson edited comment on CASSANDRA-16094 at 10/5/20, 12:07 PM:


got the same error for this: 
https://app.circleci.com/pipelines/github/krummas/cassandra/543/workflows/555d874f-8c47-41b0-bf95-80aeb9a75188/jobs/3847


was (Author: krummas):
got the same error for this: 
https://app.circleci.com/pipelines/github/krummas/cassandra/543/workflows/555d874f-8c47-41b0-bf95-80aeb9a75188/jobs/3847
 but in test_repaired_tracking_with_mismatching_replicas - 
repair_tests.incremental_repair_test.TestIncRepair

> Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas
> --
>
> Key: CASSANDRA-16094
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16094
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Caleb Rackliffe
>Assignee: Marcus Eriksson
>Priority: Normal
>  Labels: dtest, incremental_repair, repair
> Fix For: 4.0-beta
>
>
> We have two recent failures for this test on trunk: 
> 1.) 
> https://app.circleci.com/pipelines/github/maedhroz/cassandra/102/workflows/37ed8dab-9da4-4730-a883-20b7a99d88b4/jobs/518/tests
>  (CASSANDRA-15909)
> 2.) 
> https://app.circleci.com/pipelines/github/jolynch/cassandra/6/workflows/41e080e0-d7ff-4256-899e-b4010c6ef5ab/jobs/716/tests
>  (CASSANDRA-15379)
> The test expects there to be mismatches and then read repair executed on a 
> following SELECT, but either those mismatches aren’t there, read repair isn’t 
> happening, or both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16083) Missing JMX objects and attributes upgrading from 3.0 to 4.0

2020-10-05 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208042#comment-17208042
 ] 

Benjamin Lerer commented on CASSANDRA-16083:


[~dcapwell] Do we have a clear process on how we deprecate JMX 
objects/attributes? 

> Missing JMX objects and attributes upgrading from 3.0 to 4.0
> 
>
> Key: CASSANDRA-16083
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16083
> Project: Cassandra
>  Issue Type: Task
>  Components: Observability/Metrics
>Reporter: David Capwell
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Using the tools added in CASSANDRA-16082, below is the list of metrics 
> missing in 4.0 but present in 3.0.  The work here is to make sure we had 
> proper deprecation for each metric, and if not, to add it back.
> {code}
> $ tools/bin/jmxtool diff -f yaml cassandra-3.0-jmx.yaml trunk-jmx.yaml 
> --ignore-missing-on-left
> Objects not in right:
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=EstimatedPartitionSizeHistogram
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=BloomFilterFalseRatio
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ReplicaFilteringProtectionRequests
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=RowCacheHitOutOfRange
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=MaxPoolSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=TombstoneScannedHistogram
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=WaitingOnFreeMemtableSpace
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasCommitTotalLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_aggregates,name=CasProposeLatency
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=ViewReadTime
> org.apache.cassandra.db:type=HintedHandoffManager
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=BloomFilterDiskSpaceUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=RequestResponseStage,name=PendingTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=MemtableSwitchCount
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=range_xfers,name=ReplicaFilteringProtectionRowsCachedPerQuery
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=SnapshotsSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=RecentBloomFilterFalsePositives
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=SpeculativeRetries
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=LiveDiskSpaceUsed
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ViewReadTime
> org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CompletedTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=ViewReadTime
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=BloomFilterFalsePositives
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=CompressionMetadataOffHeapMemoryUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=TotalBlockedTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=LiveScannedHistogram
> 

[jira] [Assigned] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña reassigned CASSANDRA-16180:
-

Assignee: Andres de la Peña

> 4.0 quality testing: Coordination
> -
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is 
> [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-10-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208026#comment-17208026
 ] 

Andres de la Peña commented on CASSANDRA-15579:
---

[~jmckenzie] I have created CASSANDRA-16180 for coordination and 
CASSANDRA-16181 for replication. I can either take one of them and leave the 
other for someone else, or take both.

 

 

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Blake Eggleston*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16181) 4.0 quality testing: Replication

2020-10-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16181:
--
Fix Version/s: 4.0

> 4.0 quality testing: Replication
> 
>
> Key: CASSANDRA-16181
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16181
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on replication.
> I think that the main reference dtest for this is 
> [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-05 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208021#comment-17208021
 ] 

Alex Petrov commented on CASSANDRA-16155:
-

[~maedhroz] thank you for the review. I've incorporated your changes, and added 
you as a co-author, since indeed our patches, down to tests, are identical. 
Force-pushed the branch.

> ByteBufferAccessor cast exceptions are thrown when trying to query a virtual 
> table
> --
>
> Key: CASSANDRA-16155
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16155
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Start a fresh trunk node, and try to run
> SELECT * FROM system_views.local_read_latency ;
> You’ll get: 
> {code:java}
> ERROR [Native-Transport-Requests-1] 2020-09-30 09:44:45,099 
> ErrorMessage.java:457 - Unexpected exception during request
>  java.lang.ClassCastException: 
> org.apache.cassandra.db.marshal.ByteBufferAccessor cannot be cast to 
> java.lang.String
>          at 
> org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:29)
>          at 
> org.apache.cassandra.db.marshal.AbstractType.decompose(AbstractType.java:131) 
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16180:
--
Fix Version/s: 4.0

> 4.0 quality testing: Coordination
> -
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is 
> [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-05 Thread Jira
Andres de la Peña created CASSANDRA-16180:
-

 Summary: 4.0 quality testing: Coordination
 Key: CASSANDRA-16180
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
 Project: Cassandra
  Issue Type: Task
  Components: Test/unit
Reporter: Andres de la Peña


This is a subtask of CASSANDRA-15579 focusing on coordination.

I think that the main reference dtest for this is 
[consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
 We should identify which other tests cover this and identify what should be 
extended, similarly to what has been done with CASSANDRA-15977.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16181) 4.0 quality testing: Replication

2020-10-05 Thread Jira
Andres de la Peña created CASSANDRA-16181:
-

 Summary: 4.0 quality testing: Replication
 Key: CASSANDRA-16181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16181
 Project: Cassandra
  Issue Type: Task
  Components: Test/unit
Reporter: Andres de la Peña


This is a subtask of CASSANDRA-15579 focusing on replication.

I think that the main reference dtest for this is 
[replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py].
 We should identify which other tests cover this and identify what should be 
extended, similarly to what has been done with CASSANDRA-15977.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-10-05 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207991#comment-17207991
 ] 

Marcus Eriksson commented on CASSANDRA-15369:
-

sorry, I'll try to get to this soon

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta, 4.0-triage
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  * Serving from a {{Memtable}}, we will generate fake row deletions
>  * Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, 
> clustering RT and slice RT, each produces a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired.
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16012) sstablesplit unit test hardening

2020-10-05 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207940#comment-17207940
 ] 

Berenguer Blasi commented on CASSANDRA-16012:
-

Ready for review

> sstablesplit unit test hardening
> 
>
> Key: CASSANDRA-16012
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16012
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/sstable
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
>  Labels: low-hanging-fruit
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> During CASSANDRA-15883 / CASSANDRA-15991 it was detected unit test coverage 
> for this tool is minimal. There is a unit test to enhance upon under 
> {{test/unit/org/apache/cassandra/tools}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15958) org.apache.cassandra.net.ConnectionTest testMessagePurging

2020-10-05 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207934#comment-17207934
 ] 

Benjamin Lerer commented on CASSANDRA-15958:


Could not mark the ticket as committed. Will try again once the problem has 
been fixed.

> org.apache.cassandra.net.ConnectionTest testMessagePurging
> --
>
> Key: CASSANDRA-15958
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15958
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Build: 
> https://ci-cassandra.apache.org/job/Cassandra-trunk-test/196/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> Build: 
> https://ci-cassandra.apache.org/job/Cassandra-trunk-test/194/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> java.util.concurrent.TimeoutException
>   at org.apache.cassandra.net.AsyncPromise.get(AsyncPromise.java:258)
>   at org.apache.cassandra.net.FutureDelegate.get(FutureDelegate.java:143)
>   at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:268)
>   at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:236)
>   at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:679)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15958) org.apache.cassandra.net.ConnectionTest testMessagePurging

2020-10-05 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207931#comment-17207931
 ] 

Benjamin Lerer commented on CASSANDRA-15958:


I apparently forgot to put my +1 for the patch.

> org.apache.cassandra.net.ConnectionTest testMessagePurging
> --
>
> Key: CASSANDRA-15958
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15958
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Build: 
> https://ci-cassandra.apache.org/job/Cassandra-trunk-test/196/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> Build: 
> https://ci-cassandra.apache.org/job/Cassandra-trunk-test/194/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> java.util.concurrent.TimeoutException
>   at org.apache.cassandra.net.AsyncPromise.get(AsyncPromise.java:258)
>   at org.apache.cassandra.net.FutureDelegate.get(FutureDelegate.java:143)
>   at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:268)
>   at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:236)
>   at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:679)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] branch trunk updated: Fix flaky test ConnectionTest.testMessagePurging

2020-10-05 Thread blerer
This is an automated email from the ASF dual-hosted git repository.

blerer pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new ba63fa3  Fix flaky test ConnectionTest.testMessagePurging
ba63fa3 is described below

commit ba63fa3c951cb5c18d0fa4f9483577c6e18389c4
Author: Adam Holmberg 
AuthorDate: Wed Aug 19 15:32:09 2020 -0500

Fix flaky test ConnectionTest.testMessagePurging

patch by Adam Holmberg; reviewed by Yifan Cai and Benjamin Lerer for
CASSANDRA-15958

The patch fixes two problems: a race condition in InboundSockets.close when it
is called multiple times, and the flakiness in
ConnectionTest.testMessagePurging.
---
 src/java/org/apache/cassandra/net/InboundSockets.java  | 14 +-
 .../org/apache/cassandra/net/OutboundConnection.java   |  3 ++-
 .../org/apache/cassandra/net/OutboundMessageQueue.java |  1 +
 test/unit/org/apache/cassandra/net/ConnectionTest.java | 18 +++---
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/java/org/apache/cassandra/net/InboundSockets.java 
b/src/java/org/apache/cassandra/net/InboundSockets.java
index 6fc5f52..93caf85 100644
--- a/src/java/org/apache/cassandra/net/InboundSockets.java
+++ b/src/java/org/apache/cassandra/net/InboundSockets.java
@@ -63,6 +63,9 @@ class InboundSockets
 // purely to prevent close racing with open
 private boolean closedWithoutOpening;
 
+// used to prevent racing on close
+private Future closeFuture;
+
 /**
  * A group of the open, inbound {@link Channel}s connected to this 
node. This is mostly interesting so that all of
  * the inbound connections/channels can be closed when the listening 
socket itself is being closed.
@@ -109,7 +112,9 @@ class InboundSockets
  * Close this socket and any connections created on it. Once closed, 
this socket may not be re-opened.
  *
  * This may not execute synchronously, so a Future is returned 
encapsulating its result.
- * @param shutdownExecutors
+ * @param shutdownExecutors consumer invoked with the internal 
executor on completion
+ *  Note that the consumer will only be 
invoked once per InboundSocket.
+ *  Subsequent calls to close will not 
register a callback to different consumers.
  */
 private Future close(Consumer 
shutdownExecutors)
 {
@@ -136,6 +141,13 @@ class InboundSockets
 return new SucceededFuture<>(GlobalEventExecutor.INSTANCE, 
null);
 }
 
+if (closeFuture != null)
+{
+return closeFuture;
+}
+
+closeFuture = done;
+
 if (listen != null)
 {
 close.run();
diff --git a/src/java/org/apache/cassandra/net/OutboundConnection.java 
b/src/java/org/apache/cassandra/net/OutboundConnection.java
index b0edc03..66f14db 100644
--- a/src/java/org/apache/cassandra/net/OutboundConnection.java
+++ b/src/java/org/apache/cassandra/net/OutboundConnection.java
@@ -110,7 +110,8 @@ public class OutboundConnection
 
 private final OutboundMessageCallbacks callbacks;
 private final OutboundDebugCallbacks debug;
-private final OutboundMessageQueue queue;
+@VisibleForTesting
+final OutboundMessageQueue queue;
 /** the number of bytes we permit to queue to the network without 
acquiring any shared resource permits */
 private final long pendingCapacityInBytes;
 /** the number of messages and bytes queued for flush to the network,
diff --git a/src/java/org/apache/cassandra/net/OutboundMessageQueue.java 
b/src/java/org/apache/cassandra/net/OutboundMessageQueue.java
index 3d8bac0..d7360a0 100644
--- a/src/java/org/apache/cassandra/net/OutboundMessageQueue.java
+++ b/src/java/org/apache/cassandra/net/OutboundMessageQueue.java
@@ -87,6 +87,7 @@ class OutboundMessageQueue
 {
 maybePruneExpired();
 externalQueue.offer(m);
+// Known race here. See CASSANDRA-15958
 nextExpirationDeadlineUpdater.accumulateAndGet(this,

maybeUpdateEarliestExpiresAt(clock.now(), m.expiresAtNanos()),
Math::min);
diff --git a/test/unit/org/apache/cassandra/net/ConnectionTest.java 
b/test/unit/org/apache/cassandra/net/ConnectionTest.java
index eb8d867..5c637ac 100644
--- a/test/unit/org/apache/cassandra/net/ConnectionTest.java
+++ b/test/unit/org/apache/cassandra/net/ConnectionTest.java
@@ -685,9 +685,21 @@ public class ConnectionTest
 Message message = Message.builder(Verb._TEST_1, 
noPayload)
 
.withExpiresAt(System.nanoTime() + 

[jira] [Commented] (CASSANDRA-16162) Improve empty hint file handling on startup

2020-10-05 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207911#comment-17207911
 ] 

Marcus Eriksson commented on CASSANDRA-16162:
-

[3.0|https://github.com/krummas/cassandra/commits/marcuse/16162] 
[cci|https://app.circleci.com/pipelines/github/krummas/cassandra/549/workflows/b7fe048c-1da1-4b5a-9379-c888f7e2aa2c]
[3.11|https://github.com/krummas/cassandra/commits/marcuse/16162-3.11] 
[cci|https://app.circleci.com/pipelines/github/krummas/cassandra/548/workflows/8a53c036-fd18-4768-b174-ec82cfeec19d]
[trunk|https://github.com/krummas/cassandra/commits/marcuse/16162-trunk] 
[cci|https://app.circleci.com/pipelines/github/krummas/cassandra/550/workflows/83023e38-18a1-4872-a11c-1258a81f3e83]

> Improve empty hint file handling on startup
> ---
>
> Key: CASSANDRA-16162
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16162
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>
> Since CASSANDRA-14080 we handle empty/corrupt hint files on startup; we 
> should remove empty files and rename corrupt ones so that we don't hit the 
> same exception on every startup.
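
(A minimal sketch of that behaviour, assuming plain file handling: delete 
empty hint files and move corrupt ones aside. The helper and the ".corrupt" 
suffix are illustrative and not the actual hints code.)

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class HintFileStartupCleanup
{
    // Sketch: called for a hint file that failed to load on startup, so the
    // same exception is not hit again on every subsequent startup.
    static void handle(Path hintFile) throws IOException
    {
        if (Files.size(hintFile) == 0)
        {
            Files.delete(hintFile);   // nothing to replay, just drop it
        }
        else
        {
            // keep it around for inspection, but out of the replay path
            Files.move(hintFile, hintFile.resolveSibling(hintFile.getFileName() + ".corrupt"));
        }
    }
}
{code}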



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16109) Don't adjust nodeCount when setting node id topology in in-jvm dtests

2020-10-05 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207863#comment-17207863
 ] 

Marcus Eriksson commented on CASSANDRA-16109:
-

pushed a patch to ignore exceptions on shutdown nodes to the branches above
2.2 circle: 
https://app.circleci.com/pipelines/github/krummas/cassandra/553/workflows/74f7c8ec-e593-4172-9932-15a97ab15e44
3.0 circle: 
https://app.circleci.com/pipelines/github/krummas/cassandra/552/workflows/70aa2198-a1fc-431e-b670-4082c71d88b8
3.11 circle: 
https://app.circleci.com/pipelines/github/krummas/cassandra/551/workflows/cb37d6f9-e877-4099-87f3-4fa03cf21b37
trunk circle: 
https://app.circleci.com/pipelines/github/krummas/cassandra/554/workflows/28c67ef1-26e6-4373-a11c-d38413fd30ea

> Don't adjust nodeCount when setting node id topology in in-jvm dtests
> -
>
> Key: CASSANDRA-16109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16109
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
>  Labels: pull-request-available
>
> We update the node count when setting the node id topology in in-jvm dtests; 
> this should only happen when the node count is smaller than the size of the 
> node id topology, otherwise bootstrap tests error out.
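
(The change amounts to making that adjustment one-directional. A sketch under 
the assumption of a builder holding nodeCount and nodeIdTopology fields; the 
real in-jvm dtest Builder API may differ.)

{code:java}
// Sketch only; names are assumptions, not the actual Builder implementation.
public Builder withNodeIdTopology(Map<Integer, NetworkTopology.DcAndRack> nodeIdTopology)
{
    // Only grow nodeCount to cover the supplied topology; never shrink it,
    // otherwise bootstrap tests that add nodes beyond the topology error out.
    if (nodeCount < nodeIdTopology.size())
        nodeCount = nodeIdTopology.size();
    this.nodeIdTopology = nodeIdTopology;
    return this;
}
{code}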



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org