[jira] [Updated] (CASSANDRA-14060) Separate CorruptSSTableException and FSError handling policies

2017-11-20 Thread Jay Zhuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14060:
---
Status: Patch Available  (was: Open)

> Separate CorruptSSTableException and FSError handling policies
> --
>
> Key: CASSANDRA-14060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14060
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> Currently, if 
> [{{disk_failure_policy}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L230]
>  is set to {{stop}} (the default), StorageService will shut down on 
> {{FSError}}, but not on {{CorruptSSTableException}} 
> ([DefaultFSErrorHandler.java:40|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java#L40]).
> With the {{die}} policy, however, the behavior differs: the JVM is killed 
> for both {{FSError}} and {{CorruptSSTableException}} 
> ([JVMStabilityInspector.java:63|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java#L63]):
> ||{{disk_failure_policy}}|| on {{FSError}} || on {{CorruptSSTableException}} ||
> |{{stop}}| (/) stop | (x) no stop |
> |{{die}}| (/) die | (/) die |
> We see {{CorruptSSTableException}} from time to time in production, but 
> mostly it is *not* caused by a disk issue. So I would suggest a separate 
> policy for {{CorruptSSTableException}}.
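A minimal sketch of the proposed separation, assuming a hypothetical {{corrupt_sstable_policy}} option (names and structure here are illustrative, not the actual patch):

{code:java}
// Stubs so the sketch compiles standalone; the real classes live in
// org.apache.cassandra.io.FSError and o.a.c.io.sstable.CorruptSSTableException.
class FSError extends Error {}
class CorruptSSTableException extends RuntimeException {}

public class SeparatePolicies
{
    enum Policy { STOP, DIE, IGNORE }

    static Policy diskFailurePolicy = Policy.STOP;    // existing disk_failure_policy
    static Policy corruptSSTablePolicy = Policy.STOP; // hypothetical new option

    static void inspectThrowable(Throwable t)
    {
        if (t instanceof CorruptSSTableException)
            apply(corruptSSTablePolicy, t); // no longer forced to follow disk_failure_policy
        else if (t instanceof FSError)
            apply(diskFailurePolicy, t);
    }

    static void apply(Policy policy, Throwable t)
    {
        switch (policy)
        {
            case DIE:    System.exit(100); break;                 // kill the JVM
            case STOP:   /* stop gossip and client transports */ break;
            case IGNORE: break;                                   // log and carry on
        }
    }
}
{code}

This would let operators keep {{die}} semantics for genuine disk failures while handling corruption, which is often not disk-related, less drastically.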






[jira] [Assigned] (CASSANDRA-14061) trunk eclipse-warnings

2017-11-20 Thread Jay Zhuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang reassigned CASSANDRA-14061:
--

Assignee: Jay Zhuang

> trunk eclipse-warnings
> --
>
> Key: CASSANDRA-14061
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14061
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> {noformat}
> eclipse-warnings:
> [mkdir] Created dir: /home/ubuntu/cassandra/build/ecj
>  [echo] Running Eclipse Code Analysis.  Output logged to 
> /home/ubuntu/cassandra/build/ecj/eclipse_compiler_checks.txt
>  [java] --
>  [java] 1. ERROR in 
> /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
>  (at line 59)
>  [java]   return new SSTableIdentityIterator(sstable, key, 
> partitionLevelDeletion, file.getPath(), iterator);
>  [java]   
> ^^^
>  [java] Potential resource leak: 'iterator' may not be closed at this 
> location
>  [java] --
>  [java] 2. ERROR in 
> /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
>  (at line 79)
>  [java]   return new SSTableIdentityIterator(sstable, key, 
> partitionLevelDeletion, dfile.getPath(), iterator);
>  [java]   
> 
>  [java] Potential resource leak: 'iterator' may not be closed at this 
> location
>  [java] --
>  [java] 2 problems (2 errors)
> {noformat}






[jira] [Updated] (CASSANDRA-14061) trunk eclipse-warnings

2017-11-20 Thread Jay Zhuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14061:
---
Status: Patch Available  (was: Open)

> trunk eclipse-warnings
> --
>
> Key: CASSANDRA-14061
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14061
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> {noformat}
> eclipse-warnings:
> [mkdir] Created dir: /home/ubuntu/cassandra/build/ecj
>  [echo] Running Eclipse Code Analysis.  Output logged to 
> /home/ubuntu/cassandra/build/ecj/eclipse_compiler_checks.txt
>  [java] --
>  [java] 1. ERROR in 
> /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
>  (at line 59)
>  [java]   return new SSTableIdentityIterator(sstable, key, 
> partitionLevelDeletion, file.getPath(), iterator);
>  [java]   
> ^^^
>  [java] Potential resource leak: 'iterator' may not be closed at this 
> location
>  [java] --
>  [java] 2. ERROR in 
> /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
>  (at line 79)
>  [java]   return new SSTableIdentityIterator(sstable, key, 
> partitionLevelDeletion, dfile.getPath(), iterator);
>  [java]   
> 
>  [java] Potential resource leak: 'iterator' may not be closed at this 
> location
>  [java] --
>  [java] 2 problems (2 errors)
> {noformat}






[jira] [Commented] (CASSANDRA-14060) Separate CorruptSSTableException and FSError handling policies

2017-11-20 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258916#comment-16258916
 ] 

Jay Zhuang commented on CASSANDRA-14060:


Here is the patch, please review:
| Branch | uTest |
| [14060|https://github.com/cooldoger/cassandra/tree/14060] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14060.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14060]
 |

> Separate CorruptSSTableException and FSError handling policies
> --
>
> Key: CASSANDRA-14060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14060
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> Currently, if 
> [{{disk_failure_policy}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L230]
>  is set to {{stop}} (the default), StorageService will shut down on 
> {{FSError}}, but not on {{CorruptSSTableException}} 
> ([DefaultFSErrorHandler.java:40|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java#L40]).
> With the {{die}} policy, however, the behavior differs: the JVM is killed 
> for both {{FSError}} and {{CorruptSSTableException}} 
> ([JVMStabilityInspector.java:63|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java#L63]):
> ||{{disk_failure_policy}}|| on {{FSError}} || on {{CorruptSSTableException}} ||
> |{{stop}}| (/) stop | (x) no stop |
> |{{die}}| (/) die | (/) die |
> We see {{CorruptSSTableException}} from time to time in production, but 
> mostly it is *not* caused by a disk issue. So I would suggest a separate 
> policy for {{CorruptSSTableException}}.






[jira] [Commented] (CASSANDRA-14061) trunk eclipse-warnings

2017-11-20 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258926#comment-16258926
 ] 

Jay Zhuang commented on CASSANDRA-14061:


These don't seem to be real problems, so here is a patch that adds {{@SuppressWarnings}}:
| Branch | uTest |
| [14061|https://github.com/cooldoger/cassandra/tree/14061] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14061.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14061]
 |

Also, this makes {{ant eclipse-warnings}} a required step in the CircleCI build.
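For context, a self-contained example of the pattern ECJ flags here and the usual suppression (generic JDK types as an illustration, not the actual Cassandra change):

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ResourceLeakExample
{
    // ECJ reports "Potential resource leak" when a method opens a resource
    // and returns a wrapper around it, because the analyzer can't prove the
    // caller will close it. When ownership is deliberately handed to the
    // returned object, the usual remedy is @SuppressWarnings("resource").
    @SuppressWarnings("resource")
    public static BufferedReader open(String path) throws IOException
    {
        FileReader raw = new FileReader(path); // flagged without the suppression
        return new BufferedReader(raw);        // closing the reader closes 'raw'
    }
}
{code}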

> trunk eclipse-warnings
> --
>
> Key: CASSANDRA-14061
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14061
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> {noformat}
> eclipse-warnings:
> [mkdir] Created dir: /home/ubuntu/cassandra/build/ecj
>  [echo] Running Eclipse Code Analysis.  Output logged to 
> /home/ubuntu/cassandra/build/ecj/eclipse_compiler_checks.txt
>  [java] --
>  [java] 1. ERROR in 
> /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
>  (at line 59)
>  [java]   return new SSTableIdentityIterator(sstable, key, 
> partitionLevelDeletion, file.getPath(), iterator);
>  [java]   
> ^^^
>  [java] Potential resource leak: 'iterator' may not be closed at this 
> location
>  [java] --
>  [java] 2. ERROR in 
> /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
>  (at line 79)
>  [java]   return new SSTableIdentityIterator(sstable, key, 
> partitionLevelDeletion, dfile.getPath(), iterator);
>  [java]   
> 
>  [java] Potential resource leak: 'iterator' may not be closed at this 
> location
>  [java] --
>  [java] 2 problems (2 errors)
> {noformat}






[jira] [Updated] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky

2017-11-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrés de la Peña updated CASSANDRA-13963:
--
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

> SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild
>  is flaky
> -
>
> Key: CASSANDRA-13963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Secondary Indexes, Testing
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>Priority: Minor
> Fix For: 4.x
>
>
> The unit test 
> [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476]
>  is flaky. Apart from [the CI results showing a 3% 
> flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/],
>  the test failure can be locally reproduced just running the test multiple 
> times. In my case, it fails 2-5 times for each 1000 executions.






[jira] [Commented] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky

2017-11-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259506#comment-16259506
 ] 

Andrés de la Peña commented on CASSANDRA-13963:
---

Thanks for the review :)

Committed to master as 
[5792b667ecf461a40cc391bc1496287547179c91|https://github.com/apache/cassandra/commit/5792b667ecf461a40cc391bc1496287547179c91]

> SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild
>  is flaky
> -
>
> Key: CASSANDRA-13963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Secondary Indexes, Testing
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>Priority: Minor
> Fix For: 4.x
>
>
> The unit test 
> [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476]
>  is flaky. Apart from [the CI results showing a 3% 
> flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/],
>  the test failure can be locally reproduced just running the test multiple 
> times. In my case, it fails 2-5 times for each 1000 executions.






cassandra git commit: Fix flaky unit test indexWithFailedInitializationIsNotQueryableAfterPartialRebuild

2017-11-20 Thread adelapena
Repository: cassandra
Updated Branches:
  refs/heads/trunk da410153e -> 5792b667e


Fix flaky unit test 
indexWithFailedInitializationIsNotQueryableAfterPartialRebuild

patch by Andres de la Peña; reviewed by Robert Stupp for CASSANDRA-13963


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5792b667
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5792b667
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5792b667

Branch: refs/heads/trunk
Commit: 5792b667ecf461a40cc391bc1496287547179c91
Parents: da41015
Author: Andrés de la Peña 
Authored: Wed Oct 18 12:25:26 2017 +0100
Committer: Andrés de la Peña 
Committed: Mon Nov 20 17:07:45 2017 +

--
 CHANGES.txt |  1 +
 .../cassandra/index/SecondaryIndexManager.java  | 13 
 .../org/apache/cassandra/cql3/CQLTester.java| 33 
 .../index/SecondaryIndexManagerTest.java|  6 ++--
 4 files changed, 51 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/5792b667/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index a690e17..03f5de8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Fix flaky indexWithFailedInitializationIsNotQueryableAfterPartialRebuild 
(CASSANDRA-13963)
  * Introduce leaf-only iterator (CASSANDRA-9988)
  * Upgrade Guava to 23.3 and Airline to 0.8 (CASSANDRA-13997)
  * Allow only one concurrent call to StatusLogger (CASSANDRA-12182)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5792b667/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
--
diff --git a/src/java/org/apache/cassandra/index/SecondaryIndexManager.java 
b/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
index 27a9b16..b60d811 100644
--- a/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
+++ b/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
@@ -276,6 +276,19 @@ public class SecondaryIndexManager implements 
IndexRegistry, INotificationConsum
 return queryableIndexes.contains(index.getIndexMetadata().name);
 }
 
+/**
+ * Checks if the specified index has any running build task.
+ *
+ * @param indexName the index name
+ * @return {@code true} if the index is building, {@code false} otherwise
+ */
+@VisibleForTesting
+public synchronized boolean isIndexBuilding(String indexName)
+{
+AtomicInteger counter = inProgressBuilds.get(indexName);
+return counter != null && counter.get() > 0;
+}
+
 public synchronized void removeIndex(String indexName)
 {
 Index index = unregisterIndex(indexName);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5792b667/test/unit/org/apache/cassandra/cql3/CQLTester.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/CQLTester.java 
b/test/unit/org/apache/cassandra/cql3/CQLTester.java
index 062a4bc..b038ce0 100644
--- a/test/unit/org/apache/cassandra/cql3/CQLTester.java
+++ b/test/unit/org/apache/cassandra/cql3/CQLTester.java
@@ -46,6 +46,7 @@ import com.datastax.driver.core.ResultSet;
 
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.concurrent.ScheduledExecutors;
+import org.apache.cassandra.index.SecondaryIndexManager;
 import org.apache.cassandra.schema.*;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.functions.FunctionName;
@@ -736,6 +737,38 @@ public abstract class CQLTester
 return indexCreated;
 }
 
+/**
+ * Index creation is asynchronous; this method waits until the specified index has no building task running.
+ * 
+ * This method differs from {@link #waitForIndex(String, String, String)} in that it doesn't require the index to be
+ * fully or successfully built, so it can be used to wait for failing index builds.
+ *
+ * @param keyspace the index keyspace name
+ * @param indexName the index name
+ * @return {@code true} if the index build tasks have finished in 5 
seconds, {@code false} otherwise
+ */
+protected boolean waitForIndexBuilds(String keyspace, String indexName) 
throws InterruptedException
+{
+long start = System.currentTimeMillis();
+SecondaryIndexManager indexManager = 
getCurrentColumnFamilyStore(keyspace).indexManager;
+
+while (true)
+{
+if (!indexManager.isIndexBuilding(indexName))
+{
+return true;
+}
+else if 

[jira] [Commented] (CASSANDRA-14043) Lots of failures in test_network_topology_strategy

2017-11-20 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260036#comment-16260036
 ] 

Joseph Lynch commented on CASSANDRA-14043:
--

[~mkjellman]

I'm trying to reproduce these failures/flakes so I can debug them, but they 
appear to pass locally for me against latest trunk running on Mac OS X (once I 
set up the 9 localhost aliases needed on a Mac). Do these fail every time for 
you, or is it more of a flaky experience?

e.g.

{noformat}
~/pg/cassandra-dtest-mirror(CASSANDRA-14043*) » nosetests --no-skip -x 
consistency_test.py   
...
--
Ran 24 tests in 1388.105s

OK
{noformat}



> Lots of failures in test_network_topology_strategy
> --
>
> Key: CASSANDRA-14043
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14043
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>
> There are lots of failures in test_network_topology_strategy... Creating one 
> JIRA to track them all...
> test_network_topology_strategy - consistency_test.TestAvailability
> [Errno 2] No such file or directory: 
> '/tmp/dtest-H1psQ0/test/node1/logs/system.log'
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-H1psQ0
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
> self.tearDown()
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 597, in tearDown
> if not self.allow_log_errors and self.check_logs_for_errors():
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 614, in 
> check_logs_for_errors
> ['\n'.join(msg) for msg in node.grep_log_for_errors()]))
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 386, in grep_log_for_errors
> return self.grep_log_for_errors_from(seek_start=getattr(self, 
> 'error_mark', 0))
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 389, in grep_log_for_errors_from
> with open(os.path.join(self.get_path(), 'logs', filename)) as f:
> "[Errno 2] No such file or directory: 
> '/tmp/dtest-H1psQ0/test/node1/logs/system.log'\n >> begin 
> captured logging << \ndtest: DEBUG: cluster ccm 
> directory: /tmp/dtest-H1psQ0\ndtest: DEBUG: Done setting configuration 
> options:\n{   'initial_token': None,\n'num_tokens': '32',\n
> 'phi_convict_threshold': 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"
> test_network_topology_strategy_counters - consistency_test.TestAccuracy
> Error starting node3.
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-3px8TH
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing multiple dcs, counters
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 753, in 
> test_network_topology_strategy_counters
> 
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_counters,
>  self.nodes, self.rf.values(), combinations),
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
> self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in 
> _start_cluster
> cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 423, in start
> raise 

[jira] [Created] (CASSANDRA-14062) Pluggable CommitLog

2017-11-20 Thread Rei Odaira (JIRA)
Rei Odaira created CASSANDRA-14062:
--

 Summary: Pluggable CommitLog
 Key: CASSANDRA-14062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14062
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Rei Odaira
 Fix For: 4.x
 Attachments: pluggable-commitlog-src.patch, 
pluggable-commitlog-test.patch

This proposal is to make CommitLog pluggable, as discussed in [the Cassandra 
dev mailing 
list|https://lists.apache.org/thread.html/1936194d86f5954fa099ced9a0733458eb3249bff3fae3e03e2d1bd8@%3Cdev.cassandra.apache.org%3E].

We are developing a Cassandra plugin to store the CommitLog on our low-latency 
Flash device (CAPI-Flash). To do that, the original CommitLog interface must be 
changed to allow plugins. Syncing the CommitLog is one of the performance 
bottlenecks in Cassandra, especially with batch commit. I think a pluggable 
CommitLog will allow other interesting alternatives, such as one using SPDK.

Our high-level design is similar to the CacheProvider framework
in org.apache.cassandra.cache:
* Introduce a new interface, ICommitLog, with methods like 
getCurrentPosition(), add(), shutdownBlocking(), etc.
* CommitLog implements ICommitLog.
* Introduce a new interface, CommitLogProvider, with a create() method, 
returning ICommitLog.
* Introduce a new class FileCommitLogProvider implementing CommitLogProvider, 
to return a singleton instance of CommitLog.
* Introduce a new property in cassandra.yaml, commitlog_class_name, which 
specifies what CommitLogProvider to use.  The default is FileCommitLogProvider.
* Introduce a new class, CommitLogHelper, that loads the class specified by the 
commitlog_class_name property, creates an instance, and stores it in 
CommitLogHelper.instance.
* Replace all of the references to CommitLog.instance with 
CommitLogHelper.instance.
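A minimal sketch of that provider pattern, assuming hypothetical method signatures and a placeholder default class name (the attached patches define the real ones):

{code:java}
interface ICommitLog
{
    long getCurrentPosition();
    void add(byte[] mutation);
    void shutdownBlocking() throws InterruptedException;
}

interface CommitLogProvider
{
    ICommitLog create();
}

final class CommitLogHelper
{
    // Loads the provider class named by the (proposed) commitlog_class_name
    // property and keeps a singleton, mirroring the CacheProvider framework.
    // "org.example.FileCommitLogProvider" is a placeholder default.
    public static final ICommitLog instance = load("org.example.FileCommitLogProvider");

    private static ICommitLog load(String className)
    {
        try
        {
            CommitLogProvider provider =
                (CommitLogProvider) Class.forName(className).getDeclaredConstructor().newInstance();
            return provider.create();
        }
        catch (Exception e)
        {
            throw new RuntimeException("Cannot create commit log from " + className, e);
        }
    }
}
{code}

Call sites would then read {{CommitLogHelper.instance.add(...)}} instead of {{CommitLog.instance.add(...)}}.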

Attached are two patches. "pluggable-commitlog-src.patch" is for changes in the 
src directory, and "pluggable-commitlog-test.patch" is for the test directory.






[jira] [Commented] (CASSANDRA-14043) Lots of failures in test_network_topology_strategy

2017-11-20 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260202#comment-16260202
 ] 

Jason Brown commented on CASSANDRA-14043:
-

These kinds of errors usually seem to be environment-related. If they are flaky 
(esp. failures like {{"Error starting node1."}}), it might not be that test 
specifically; something in it may simply trigger the behavior.

Perhaps you could run a single test function in a loop and wait until it fails. 
Here's a [naive 
script|https://gist.github.com/jasobrown/b293aaa989bf86c3a0d1e0bef1541e5b] I 
use. It's not brilliant, but it's useful.

> Lots of failures in test_network_topology_strategy
> --
>
> Key: CASSANDRA-14043
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14043
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>
> There are lots of failures in test_network_topology_strategy... Creating one 
> JIRA to track them all...
> test_network_topology_strategy - consistency_test.TestAvailability
> [Errno 2] No such file or directory: 
> '/tmp/dtest-H1psQ0/test/node1/logs/system.log'
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-H1psQ0
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
> self.tearDown()
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 597, in tearDown
> if not self.allow_log_errors and self.check_logs_for_errors():
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 614, in 
> check_logs_for_errors
> ['\n'.join(msg) for msg in node.grep_log_for_errors()]))
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 386, in grep_log_for_errors
> return self.grep_log_for_errors_from(seek_start=getattr(self, 
> 'error_mark', 0))
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 389, in grep_log_for_errors_from
> with open(os.path.join(self.get_path(), 'logs', filename)) as f:
> "[Errno 2] No such file or directory: 
> '/tmp/dtest-H1psQ0/test/node1/logs/system.log'\n >> begin 
> captured logging << \ndtest: DEBUG: cluster ccm 
> directory: /tmp/dtest-H1psQ0\ndtest: DEBUG: Done setting configuration 
> options:\n{   'initial_token': None,\n'num_tokens': '32',\n
> 'phi_convict_threshold': 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"
> test_network_topology_strategy_counters - consistency_test.TestAccuracy
> Error starting node3.
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-3px8TH
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing multiple dcs, counters
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 753, in 
> test_network_topology_strategy_counters
> 
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_counters,
>  self.nodes, self.rf.values(), combinations),
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
> self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in 
> _start_cluster
> cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 423, in start
> raise NodeError("Error starting {0}.".format(node.name), p)
> "Error starting node3.\n >> begin captured logging << 
> 

[jira] [Commented] (CASSANDRA-14051) Many materialized_views_test are busted

2017-11-20 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260219#comment-16260219
 ] 

Kurt Greaves commented on CASSANDRA-14051:
--

Just going to dump relevant logs/error messages here regarding the different 
tests while I work on them... Some are flaky and take a while to reproduce.

{{add_node_after_very_wide_mv_test}} fails on bootstrap of the fourth node due 
to:
{code}
INFO  [Stream-Deserializer-/127.0.0.1:7000-b69a2f59] 2017-11-21 03:31:33,964 
StreamResultFuture.java:193 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] 
Session with /127.0.0.1 is complete
WARN  [Stream-Deserializer-/127.0.0.2:44789-46b17cc7] 2017-11-21 03:31:34,186 
CompressedStreamReader.java:111 - [Stream 77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] 
Error while reading partition DecoratedKey(-3248873570005575792, 0002) from 
stream on ks='ks' and table='t'.
ERROR [Stream-Deserializer-/127.0.0.2:44789-46b17cc7] 2017-11-21 03:31:34,199 
StreamSession.java:617 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] 
Streaming error occurred on session with peer 127.0.0.2
org.apache.cassandra.streaming.StreamReceiveException: 
java.lang.AssertionError: stream can only read forward.
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:63)
 ~[main/:na]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:41)
 ~[main/:na]
at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
 ~[main/:na]
at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:178)
 ~[main/:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: java.lang.AssertionError: stream can only read forward.
at 
org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:108)
 ~[main/:na]
at 
org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:94)
 ~[main/:na]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:58)
 ~[main/:na]
... 4 common frames omitted
INFO  [Stream-Deserializer-/127.0.0.2:44789-46b17cc7] 2017-11-21 03:31:34,201 
StreamResultFuture.java:193 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] 
Session with /127.0.0.2 is complete
INFO  [Stream-Deserializer-/127.0.0.3:7000-099ad298] 2017-11-21 03:31:34,372 
StreamResultFuture.java:179 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad 
ID#0] Prepare completed. Receiving 2 files(49.251KiB), sending 0 files(0.000KiB)
INFO  [Stream-Deserializer-/127.0.0.3:7000-099ad298] 2017-11-21 03:31:35,575 
StreamResultFuture.java:193 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] 
Session with /127.0.0.3 is complete
WARN  [Stream-Deserializer-/127.0.0.3:7000-099ad298] 2017-11-21 03:31:35,584 
StreamResultFuture.java:220 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] 
Stream failed
ERROR [main] 2017-11-21 03:31:35,585 StorageService.java:1487 - Error while 
waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException: 
org.apache.cassandra.streaming.StreamException: Stream failed
at 
com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:528)
 ~[guava-23.3-jre.jar:na]
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:507) 
~[guava-23.3-jre.jar:na]
at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1482) 
[main/:na]
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:932)
 [main/:na]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:637) 
[main/:na]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:569) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[main/:na]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at 
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
 ~[main/:na]
at 
com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1371)
 ~[guava-23.3-jre.jar:na]
at 
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
 ~[guava-23.3-jre.jar:na]
at 
com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1016)
 ~[guava-23.3-jre.jar:na]
at 

[jira] [Assigned] (CASSANDRA-14051) Many materialized_views_test are busted

2017-11-20 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves reassigned CASSANDRA-14051:


Assignee: Kurt Greaves

> Many materialized_views_test are busted
> ---
>
> Key: CASSANDRA-14051
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14051
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Kurt Greaves
>
> Many materialized_views_test are busted... For now we should disable the 
> entire MV test suite until effort is put into making these tests usable and 
> helpful. Currently they aren't helpful and almost all fail.
> test_base_column_in_view_pk_commutative_tombstone_without_flush - 
> materialized_views_test.TestMaterializedViews
> test_base_column_in_view_pk_complex_timestamp_with_flush - 
> materialized_views_test.TestMaterializedViews
> add_dc_after_mv_network_replication_test - 
> materialized_views_test.TestMaterializedViews
> add_dc_after_mv_simple_replication_test - 
> materialized_views_test.TestMaterializedViews
> add_node_after_mv_test - materialized_views_test.TestMaterializedViews
> add_node_after_very_wide_mv_test - 
> materialized_views_test.TestMaterializedViews
> add_node_after_wide_mv_with_range_deletions_test - 
> materialized_views_test.TestMaterializedViews






[jira] [Created] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap

2017-11-20 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14063:
-

 Summary: Cassandra will start listening for clients without 
initialising system_auth after a failed bootstrap
 Key: CASSANDRA-14063
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14063
 Project: Cassandra
  Issue Type: Bug
  Components: Auth
Reporter: Vincent White
Priority: Minor


This issue is closely related to CASSANDRA-11381. In this case, when a node 
joining the ring fails to complete the bootstrap process due to a streaming 
failure, it will still call 
org.apache.cassandra.service.CassandraDaemon#start and begin listening for 
client connections. If no authentication is configured, clients are able to 
connect to the node and query the cluster, much like in write survey mode. But 
if authentication is enabled, this causes an NPE, because 
org.apache.cassandra.service.StorageService#doAuthSetup is only called after 
the bootstrap process completes successfully. With the changes made in 
CASSANDRA-11381 we could now simply call doAuthSetup earlier, since we no 
longer have to worry about calling it multiple times. But given some of the 
concerns raised about third-party authentication classes, and given that the 
"Incremental Bootstrapping" described in CASSANDRA-8494 and CASSANDRA-8943 
doesn't appear to be nearing implementation any time soon, I would prefer that 
bootstrapping nodes simply didn't start listening for clients after a failed 
bootstrap attempt.
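A minimal sketch of that preference, under the assumption of a new flag like the one the attached patch introduces (names here are illustrative; only isAuthSetupComplete exists in StorageService today):

{code:java}
public class BootstrapGuard
{
    // Hypothetical state: the patch introduces a variable like this;
    // StorageService#isAuthSetupComplete could serve a similar purpose.
    private volatile boolean bootstrapFailed;
    private volatile boolean authSetupComplete;

    // Only begin listening for clients after a clean bootstrap, which also
    // guarantees doAuthSetup() has run and system_auth is initialised.
    void maybeStartClientTransports(Runnable startTransports)
    {
        if (!bootstrapFailed && authSetupComplete)
            startTransports.run();
    }
}
{code}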

I've attached a quick and naive patch that demonstrates my desired behaviour. I 
ended up creating a new variable for this for clarity, but I also had a bit of 
trouble finding existing state that wasn't tied up in more complicated or 
transient processes which I could use to determine this particular condition. 
I believe 
org.apache.cassandra.service.StorageService#isAuthSetupComplete would also work 
in this case, so we could tie it to that instead. If someone has something 
simpler or knows the correct place I should be querying this state from, I 
welcome all feedback.

This [patch|https://github.com/vincewhite/cassandra/commits/system_auth_npe] 
also doesn't really address enabling/disabling thrift/binary via nodetool once 
the node is running. I wasn't sure if we should disallow it completely or 
include a force flag.


Cheers
-Vince






[jira] [Updated] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap

2017-11-20 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14063:
-
Status: Awaiting Feedback  (was: Open)

> Cassandra will start listening for clients without initialising system_auth 
> after a failed bootstrap
> 
>
> Key: CASSANDRA-14063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Auth
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>
> This issue is closely related to CASSANDRA-11381. In this case, when a node 
> joining the ring fails to complete the bootstrap process due to a streaming 
> failure, it will still call 
> org.apache.cassandra.service.CassandraDaemon#start and begin listening for 
> client connections. If no authentication is configured, clients are able to 
> connect to the node and query the cluster, much like in write survey mode. 
> But if authentication is enabled, this causes an NPE, because 
> org.apache.cassandra.service.StorageService#doAuthSetup is only called after 
> the bootstrap process completes successfully. With the changes made in 
> CASSANDRA-11381 we could now simply call doAuthSetup earlier, since we no 
> longer have to worry about calling it multiple times. But given some of the 
> concerns raised about third-party authentication classes, and given that the 
> "Incremental Bootstrapping" described in CASSANDRA-8494 and CASSANDRA-8943 
> doesn't appear to be nearing implementation any time soon, I would prefer 
> that bootstrapping nodes simply didn't start listening for clients after a 
> failed bootstrap attempt.
> I've attached a quick and naive patch that demonstrates my desired 
> behaviour. I ended up creating a new variable for this for clarity, but I 
> also had a bit of trouble finding existing state that wasn't tied up in more 
> complicated or transient processes which I could use to determine this 
> particular condition. I believe 
> org.apache.cassandra.service.StorageService#isAuthSetupComplete would also 
> work in this case, so we could tie it to that instead. If someone has 
> something simpler or knows the correct place I should be querying this 
> state from, I welcome all feedback. 
> This [patch|https://github.com/vincewhite/cassandra/commits/system_auth_npe] 
> also doesn't really address enabling/disabling thrift/binary via nodetool 
> once the node is running. I wasn't sure if we should disallow it completely 
> or include a force flag.
> Cheers
> -Vince






[jira] [Assigned] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap

2017-11-20 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves reassigned CASSANDRA-14063:


Assignee: Vincent White

> Cassandra will start listening for clients without initialising system_auth 
> after a failed bootstrap
> 
>
> Key: CASSANDRA-14063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Auth
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>
> This issue is closely related to CASSANDRA-11381. In this case, when a node 
> joining the ring fails to complete the bootstrap process due to a streaming 
> failure, it will still call 
> org.apache.cassandra.service.CassandraDaemon#start and begin listening for 
> client connections. If no authentication is configured, clients are able to 
> connect to the node and query the cluster, much like in write survey mode. 
> But if authentication is enabled, this causes an NPE, because 
> org.apache.cassandra.service.StorageService#doAuthSetup is only called after 
> the bootstrap process completes successfully. With the changes made in 
> CASSANDRA-11381 we could now simply call doAuthSetup earlier, since we no 
> longer have to worry about calling it multiple times. But given some of the 
> concerns raised about third-party authentication classes, and given that the 
> "Incremental Bootstrapping" described in CASSANDRA-8494 and CASSANDRA-8943 
> doesn't appear to be nearing implementation any time soon, I would prefer 
> that bootstrapping nodes simply didn't start listening for clients after a 
> failed bootstrap attempt.
> I've attached a quick and naive patch that demonstrates my desired 
> behaviour. I ended up creating a new variable for this for clarity, but I 
> also had a bit of trouble finding existing state that wasn't tied up in more 
> complicated or transient processes which I could use to determine this 
> particular condition. I believe 
> org.apache.cassandra.service.StorageService#isAuthSetupComplete would also 
> work in this case, so we could tie it to that instead. If someone has 
> something simpler or knows the correct place I should be querying this 
> state from, I welcome all feedback. 
> This [patch|https://github.com/vincewhite/cassandra/commits/system_auth_npe] 
> also doesn't really address enabling/disabling thrift/binary via nodetool 
> once the node is running. I wasn't sure if we should disallow it completely 
> or include a force flag.
> Cheers
> -Vince






[jira] [Updated] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky

2017-11-20 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-13963:
-
Reviewer: Robert Stupp

> SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild
>  is flaky
> -
>
> Key: CASSANDRA-13963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Secondary Indexes, Testing
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>Priority: Minor
> Fix For: 4.x
>
>
> The unit test 
> [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476]
>  is flaky. Apart from [the CI results showing a 3% 
> flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/],
>  the test failure can be locally reproduced just running the test multiple 
> times. In my case, it fails 2-5 times for each 1000 executions.






[jira] [Commented] (CASSANDRA-13175) Integrate "Error Prone" Code Analyzer

2017-11-20 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259203#comment-16259203
 ] 

Stefan Podkowinski commented on CASSANDRA-13175:


For the record, I just rebased and checked the errorprone target of my WIP 
branch again, but this time with the latest Guava version that we now have in 
trunk. Now I'm getting a new error regarding the putBytes method in the Guava 
Hasher used by the Validator. This method was added in Guava 23 and cannot be 
found with the Guava version used by Error Prone. I'd therefore suggest 
waiting until [#492|https://github.com/google/error-prone/issues/492] has been 
addressed by the Error Prone project.

> Integrate "Error Prone" Code Analyzer
> -
>
> Key: CASSANDRA-13175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13175
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Attachments: 0001-Add-Error-Prone-code-analyzer.patch, 
> checks-2_2.out, checks-3_0.out, checks-trunk.out
>
>
> I've been playing with [Error Prone|http://errorprone.info/] by integrating 
> it into the build process to see what kind of warnings it would produce. So 
> far I'm positively impressed by the coverage and usefulness of some of the 
> implemented checks. See attachments for results.
> Unfortunately there are still some issues with how the analyzer affects 
> generated code and the Guava versions used, see 
> [#492|https://github.com/google/error-prone/issues/492]. Once those issues 
> have been solved and the resulting code isn't affected by the analyzer, I'd 
> suggest adding it to trunk with warn-only behaviour and some of the less 
> useful checks disabled. Alternatively a new ant target could be added, maybe 
> with build-breaking checks and CI integration.






[jira] [Comment Edited] (CASSANDRA-13530) GroupCommitLogService

2017-11-20 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259253#comment-16259253
 ] 

Jason Brown edited comment on CASSANDRA-13530 at 11/20/17 1:51 PM:
---

Update on testing: it seems {{CommitLogTest}} doesn't exercise periodic mode 
very well in its current state. As the existing test only exercised {{batch}} 
mode, and given the scope of this ticket is to add {{group}} mode, I propose 
to ignore (not commit) {{PeriodicCommitLogTest}} and save it for a follow-up 
ticket, as it's slightly outside the scope of the current one. I'll commit 
{{BatchCommitLogTest}} and {{GroupCommitLogTest}} to preserve the existing 
test for batch and add the test for group. wdyt, [~aweisberg]?

{{CommitLogStressTest}} began to time out in CircleCI because each {{@Test}} 
method tested all configuration variants (compression, encryption, plain) and 
mode variants (periodic, batch, and group). I refactored it into a 
{{@Parameterized}} test class, like {{CommitLogTest}}, but that didn't help. 
What did fix it was upping {{test.long.timeout}} in {{build.xml}} from 10 
minutes to 15, as the timeouts apply to the test class as a whole, not to 
individual test methods. I can revert the refactor and just leave the build 
file change on commit, or we can keep both, as the refactor tidied up a few 
things in that test class (purely subjective statement) - wdyt?



was (Author: jasobrown):
Update on testing: it seems {{CommitLogTest}} doesn't exercise periodic mode 
very well in its current state. As the existing test only exercised {{batch}} 
mode, and given the scope of this ticket is to add {{group}} mode, I propose 
to ignore (not commit) {{PeriodicCommitLogTest}} and save it for a follow-up 
ticket, as it's slightly outside the scope of the current one. I'll commit 
{{BatchCommitLogTest}} and {{GroupCommitLogTest}} to preserve the existing 
test for batch and add the test for group. wdyt, @Ariel?

{{CommitLogStressTest}} began to time out in CircleCI because each {{@Test}} 
method tested all configuration variants (compression, encryption, plain) and 
mode variants (periodic, batch, and group). I refactored it into a 
{{@Parameterized}} test class, like {{CommitLogTest}}, but that didn't help. 
What did fix it was upping {{test.long.timeout}} in {{build.xml}} from 10 
minutes to 15, as the timeouts apply to the test class as a whole, not to 
individual test methods. I can revert the refactor and just leave the build 
file change on commit, or we can keep both, as the refactor tidied up a few 
things in that test class (purely subjective statement) - wdyt?


> GroupCommitLogService
> -
>
> Key: CASSANDRA-13530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuji Ito
>Assignee: Yuji Ito
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: GuavaRequestThread.java, MicroRequestThread.java, 
> groupAndBatch.png, groupCommit22.patch, groupCommit30.patch, 
> groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, 
> groupCommitLog_result.xlsx
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve 
> throughput when many requests are received. It improved throughput by up to 
> 94%. I'd like to discuss this CommitLogService.
> Currently, we can select between two CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log entries which haven't been 
> written to disk.
> In Batch, the commit log is written to disk every time, and each write is 
> very small (< 4KB). Under high concurrency these writes are gathered and 
> persisted to disk at once, but with insufficient concurrency many small 
> writes are issued and performance decreases due to disk latency. Even on an 
> SSD, processing many IO commands decreases performance.
> GroupCommitLogService writes several commit log entries to disk at once.
> The patch adds GroupCommitLogService (enabled by setting `commitlog_sync` 
> and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the wait on the semaphore; by waiting on 
> the semaphore, several commit log writes are executed at the same time.
> In GroupCommitLogService, the latency becomes worse if there is no 
> concurrency.
> I measured the performance with my microbenchmark (MicroRequestThread.java) 
> by increasing the number of threads. The cluster has 3 nodes (replication 
> factor: 3). Each node is an AWS EC2 m4.large instance + a 200 IOPS io1 
> volume. The results are below. GroupCommitLogService with a 10ms window 
> improved updates with Paxos by 94% and selects with Paxos by 76%.
> h6. SELECT / sec
> ||\# of threads||Batch 

[jira] [Updated] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky

2017-11-20 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-13963:
-
Status: Ready to Commit  (was: Patch Available)

> SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild
>  is flaky
> -
>
> Key: CASSANDRA-13963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Secondary Indexes, Testing
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>Priority: Minor
> Fix For: 4.x
>
>
> The unit test 
> [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476]
>  is flaky. Apart from [the CI results showing a 3% 
> flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/],
>  the test failure can be locally reproduced just running the test multiple 
> times. In my case, it fails 2-5 times for each 1000 executions.






[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService

2017-11-20 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259253#comment-16259253
 ] 

Jason Brown commented on CASSANDRA-13530:
-

Update on testing: it seems {{CommitLogTest}} doesn't exercise periodic mode 
very well in its current state. As the existing test only exercised {{batch}} 
mode, and given the scope of this ticket is to add {{group}} mode, I propose 
to ignore (not commit) {{PeriodicCommitLogTest}} and save it for a follow-up 
ticket, as it's slightly outside the scope of the current one. I'll commit 
{{BatchCommitLogTest}} and {{GroupCommitLogTest}} to preserve the existing 
test for batch and add the test for group. wdyt, @Ariel?

{{CommitLogStressTest}} began to time out in CircleCI because each {{@Test}} 
method tested all configuration variants (compression, encryption, plain) and 
mode variants (periodic, batch, and group). I refactored it into a 
{{@Parameterized}} test class, like {{CommitLogTest}}, but that didn't help. 
What did fix it was upping {{test.long.timeout}} in {{build.xml}} from 10 
minutes to 15, as the timeouts apply to the test class as a whole, not to 
individual test methods. I can revert the refactor and just leave the build 
file change on commit, or we can keep both, as the refactor tidied up a few 
things in that test class (purely subjective statement) - wdyt?


> GroupCommitLogService
> -
>
> Key: CASSANDRA-13530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuji Ito
>Assignee: Yuji Ito
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: GuavaRequestThread.java, MicroRequestThread.java, 
> groupAndBatch.png, groupCommit22.patch, groupCommit30.patch, 
> groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, 
> groupCommitLog_result.xlsx
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve 
> throughput when many requests are received. It improved throughput by up to 
> 94%. I'd like to discuss this CommitLogService.
> Currently, we can select between two CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log entries which haven't been 
> written to disk.
> In Batch, the commit log is written to disk every time, and each write is 
> very small (< 4KB). Under high concurrency these writes are gathered and 
> persisted to disk at once, but with insufficient concurrency many small 
> writes are issued and performance decreases due to disk latency. Even on an 
> SSD, processing many IO commands decreases performance.
> GroupCommitLogService writes several commit log entries to disk at once.
> The patch adds GroupCommitLogService (enabled by setting `commitlog_sync` 
> and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the wait on the semaphore; by waiting on 
> the semaphore, several commit log writes are executed at the same time.
> In GroupCommitLogService, the latency becomes worse if there is no 
> concurrency.
> I measured the performance with my microbenchmark (MicroRequestThread.java) 
> by increasing the number of threads. The cluster has 3 nodes (replication 
> factor: 3). Each node is an AWS EC2 m4.large instance + a 200 IOPS io1 
> volume. The results are below. GroupCommitLogService with a 10ms window 
> improved updates with Paxos by 94% and selects with Paxos by 76%.
> h6. SELECT / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|
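To make the mechanism concrete, here is a heavily simplified sketch of the group-commit idea (an assumption-laden illustration, not the attached patch): writers block until a sync that started after their append completes, and one fsync per window covers the whole group.

{code:java}
public class GroupCommitSketch
{
    private final long windowMillis;  // commitlog_sync_group_window_in_ms
    private long lastSyncStartedAt;   // guarded by 'this'

    public GroupCommitSketch(long windowMillis)
    {
        this.windowMillis = windowMillis;
    }

    // Writer path: called after appending a mutation to the log buffer.
    public synchronized void awaitSyncSince(long appendTimeMillis) throws InterruptedException
    {
        // A sync that started at or after our append time covers our data.
        while (lastSyncStartedAt < appendTimeMillis)
            wait();
    }

    // Dedicated sync thread: one fsync per window for all pending writers.
    public void syncLoop() throws InterruptedException
    {
        while (!Thread.currentThread().isInterrupted())
        {
            Thread.sleep(windowMillis);
            long syncStart = System.currentTimeMillis();
            fsync();                      // flush the segment once
            synchronized (this)
            {
                lastSyncStartedAt = syncStart;
                notifyAll();              // release every waiting writer
            }
        }
    }

    private void fsync() { /* flush the active commit log segment to disk */ }
}
{code}

Batch mode behaves the same way with a much smaller window; the tables above show the throughput effect of widening it to 10ms.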






[jira] [Commented] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky

2017-11-20 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259266#comment-16259266
 ] 

Robert Stupp commented on CASSANDRA-13963:
--

+1 on the patch

Your evaluation is correct and checking the number of builds is fine as the 
{{CREATE INDEX}} doesn't return before the index build is triggered.
Thanks for the patch!

> SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild
>  is flaky
> -
>
> Key: CASSANDRA-13963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Secondary Indexes, Testing
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>Priority: Minor
> Fix For: 4.x
>
>
> The unit test 
> [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476]
>  is flaky. Apart from [the CI results showing a 3% 
> flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/],
>  the test failure can be locally reproduced just running the test multiple 
> times. In my case, it fails 2-5 times for each 1000 executions.






[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259303#comment-16259303
 ] 

ASF GitHub Bot commented on CASSANDRA-14055:


GitHub user ludovic-boutros opened a pull request:

https://github.com/apache/cassandra/pull/174

CASSANDRA-14055: Index redistribution breaks SASI index

During the index redistribution process, a new view is created.
During this creation, old indexes should be released.

But the new indexes are "attached" to the same SSTable as the old indexes.

This leads to the deletion of the last SASI index file and breaks the index.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ludovic-boutros/cassandra fix/CASSANDRA-14055

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #174


commit 532ed86090c27e51c745c57678cd19ff4b606a0c
Author: lbout...@flatironsjouve.com 
Date:   2017-11-20T14:39:41Z

CASSANDRA-14055: Index redistribution breaks SASI index




> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>  Labels: patch
> Fix For: 3.11.x
>
> Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But the new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]


