[jira] [Updated] (CASSANDRA-14060) Separate CorruptSSTableException and FSError handling policies
[ https://issues.apache.org/jira/browse/CASSANDRA-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-14060:
-----------------------------------
    Status: Patch Available  (was: Open)

> Separate CorruptSSTableException and FSError handling policies
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-14060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14060
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Minor
>
> Currently, if [{{disk_failure_policy}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L230]
> is set to {{stop}} (the default), StorageService shuts down on {{FSError}} but not on
> {{CorruptSSTableException}}
> [DefaultFSErrorHandler.java:40|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java#L40].
> With the {{die}} policy the behavior is different: the JVM is killed for both {{FSError}}
> and {{CorruptSSTableException}}
> [JVMStabilityInspector.java:63|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java#L63]:
> ||{{disk_failure_policy}}||hit {{FSError}}||hit {{CorruptSSTableException}}||
> |{{stop}}|(/) stop|(x) not stop|
> |{{die}}|(/) die|(/) die|
> We see {{CorruptSSTableException}} from time to time in our production clusters, but mostly
> it is *not* caused by a disk issue. So I would suggest a separate policy for corrupt SSTables.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14061) trunk eclipse-warnings
[ https://issues.apache.org/jira/browse/CASSANDRA-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang reassigned CASSANDRA-14061:
--------------------------------------
    Assignee: Jay Zhuang

> trunk eclipse-warnings
> ----------------------
>
>                 Key: CASSANDRA-14061
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14061
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Minor
>
> {noformat}
> eclipse-warnings:
>     [mkdir] Created dir: /home/ubuntu/cassandra/build/ecj
>      [echo] Running Eclipse Code Analysis. Output logged to /home/ubuntu/cassandra/build/ecj/eclipse_compiler_checks.txt
>      [java] ----------
>      [java] 1. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java (at line 59)
>      [java]     return new SSTableIdentityIterator(sstable, key, partitionLevelDeletion, file.getPath(), iterator);
>      [java]            ^^^
>      [java] Potential resource leak: 'iterator' may not be closed at this location
>      [java] ----------
>      [java] 2. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java (at line 79)
>      [java]     return new SSTableIdentityIterator(sstable, key, partitionLevelDeletion, dfile.getPath(), iterator);
>      [java] Potential resource leak: 'iterator' may not be closed at this location
>      [java] ----------
>      [java] 2 problems (2 errors)
> {noformat}
[jira] [Updated] (CASSANDRA-14061) trunk eclipse-warnings
[ https://issues.apache.org/jira/browse/CASSANDRA-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-14061:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (CASSANDRA-14060) Separate CorruptSSTableException and FSError handling policies
[ https://issues.apache.org/jira/browse/CASSANDRA-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258916#comment-16258916 ]

Jay Zhuang commented on CASSANDRA-14060:
----------------------------------------
Here is the patch, please review:
| Branch | uTest |
| [14060|https://github.com/cooldoger/cassandra/tree/14060] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14060.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14060] |
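The stop/die matrix in the issue description can be sketched as a small dispatcher that selects the action by exception type, with a hypothetical corrupted_sstable_policy held separately from disk_failure_policy. All names below (Policy, handleError, lastAction) are illustrative, not Cassandra's actual API; the real code paths are DefaultFSErrorHandler and JVMStabilityInspector.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ErrorPolicySketch
{
    enum Policy { IGNORE, STOP, DIE }

    static class CorruptSSTableException extends RuntimeException {}
    static class FSError extends Error {}

    // records the action taken instead of actually stopping transports or killing the JVM
    static final AtomicReference<String> lastAction = new AtomicReference<>("none");

    static void handleError(Throwable t, Policy diskFailurePolicy, Policy corruptSstablePolicy)
    {
        // Pick the policy by exception type: FSError follows disk_failure_policy,
        // CorruptSSTableException follows the hypothetical corrupted_sstable_policy.
        Policy p = (t instanceof CorruptSSTableException) ? corruptSstablePolicy : diskFailurePolicy;
        switch (p)
        {
            case DIE:  lastAction.set("kill JVM"); break;        // JVMStabilityInspector-style
            case STOP: lastAction.set("stop transports"); break; // StorageService shutdown
            default:   lastAction.set("log and continue");
        }
    }

    public static void main(String[] args)
    {
        // disk_failure_policy=stop, corrupted_sstable_policy=ignore:
        handleError(new FSError(), Policy.STOP, Policy.IGNORE);
        System.out.println("FSError -> " + lastAction.get());
        handleError(new CorruptSSTableException(), Policy.STOP, Policy.IGNORE);
        System.out.println("CorruptSSTableException -> " + lastAction.get());
    }
}
```

With two independent settings, an operator could keep {{stop}} for genuine disk failures while only logging corruption that is unrelated to disk health.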
[jira] [Commented] (CASSANDRA-14061) trunk eclipse-warnings
[ https://issues.apache.org/jira/browse/CASSANDRA-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258926#comment-16258926 ]

Jay Zhuang commented on CASSANDRA-14061:
----------------------------------------
These don't seem to be real leaks; here is the patch to {{@SuppressWarnings}} them:
| Branch | uTest |
| [14061|https://github.com/cooldoger/cassandra/tree/14061] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14061.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14061] |

It also makes {{ant eclipse-warnings}} a required step in the CircleCI build.
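The two ecj errors above flag a common false positive: a factory method opens a resource and then returns a wrapper that takes ownership of it, so the resource is closed when the wrapper is. A minimal sketch of that pattern and the `@SuppressWarnings("resource")` fix follows; the class names are hypothetical and this is not the actual patch.

```java
import java.io.Closeable;

public class OwnershipTransfer
{
    static class Resource implements Closeable
    {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static class Wrapper implements Closeable
    {
        private final Resource inner;
        Wrapper(Resource inner) { this.inner = inner; }
        boolean isClosed() { return inner.closed; }
        @Override public void close() { inner.close(); } // ownership was transferred here
    }

    // ecj flags 'r' as possibly unclosed at the return statement; since Wrapper#close
    // closes it, the warning is a false positive and can be suppressed.
    @SuppressWarnings("resource")
    static Wrapper open()
    {
        Resource r = new Resource();
        return new Wrapper(r);
    }

    public static void main(String[] args)
    {
        Wrapper w = open();
        w.close();
        System.out.println(w.isClosed());
    }
}
```

The SSTableIdentityIterator constructors in the errors above look like the same shape: the returned object owns the passed-in iterator.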
[jira] [Updated] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrés de la Peña updated CASSANDRA-13963:
------------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Ready to Commit)

> SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13963
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13963
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Secondary Indexes, Testing
>            Reporter: Andrés de la Peña
>            Assignee: Andrés de la Peña
>            Priority: Minor
>             Fix For: 4.x
>
> The unit test
> [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476]
> is flaky. Apart from [the CI results showing a 3% flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/],
> the test failure can be locally reproduced just by running the test multiple times.
> In my case, it fails 2-5 times for each 1000 executions.
[jira] [Commented] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259506#comment-16259506 ]

Andrés de la Peña commented on CASSANDRA-13963:
-----------------------------------------------
Thanks for the review :)

Committed to master as [5792b667ecf461a40cc391bc1496287547179c91|https://github.com/apache/cassandra/commit/5792b667ecf461a40cc391bc1496287547179c91]
cassandra git commit: Fix flaky unit test indexWithFailedInitializationIsNotQueryableAfterPartialRebuild
Repository: cassandra
Updated Branches:
  refs/heads/trunk da410153e -> 5792b667e


Fix flaky unit test indexWithFailedInitializationIsNotQueryableAfterPartialRebuild

patch by Andrés de la Peña; reviewed by Robert Stupp for CASSANDRA-13963


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5792b667
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5792b667
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5792b667

Branch: refs/heads/trunk
Commit: 5792b667ecf461a40cc391bc1496287547179c91
Parents: da41015
Author: Andrés de la Peña
Authored: Wed Oct 18 12:25:26 2017 +0100
Committer: Andrés de la Peña
Committed: Mon Nov 20 17:07:45 2017 +

----------------------------------------------------------------------
 CHANGES.txt                                     |  1 +
 .../cassandra/index/SecondaryIndexManager.java  | 13
 .../org/apache/cassandra/cql3/CQLTester.java    | 33
 .../index/SecondaryIndexManagerTest.java        |  6 ++--
 4 files changed, 51 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/5792b667/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index a690e17..03f5de8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Fix flaky indexWithFailedInitializationIsNotQueryableAfterPartialRebuild (CASSANDRA-13963)
 * Introduce leaf-only iterator (CASSANDRA-9988)
 * Upgrade Guava to 23.3 and Airline to 0.8 (CASSANDRA-13997)
 * Allow only one concurrent call to StatusLogger (CASSANDRA-12182)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5792b667/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
index 27a9b16..b60d811 100644
--- a/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
+++ b/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
@@ -276,6 +276,19 @@ public class SecondaryIndexManager implements IndexRegistry, INotificationConsum
         return queryableIndexes.contains(index.getIndexMetadata().name);
     }

+    /**
+     * Checks if the specified index has any running build task.
+     *
+     * @param indexName the index name
+     * @return {@code true} if the index is building, {@code false} otherwise
+     */
+    @VisibleForTesting
+    public synchronized boolean isIndexBuilding(String indexName)
+    {
+        AtomicInteger counter = inProgressBuilds.get(indexName);
+        return counter != null && counter.get() > 0;
+    }
+
     public synchronized void removeIndex(String indexName)
     {
         Index index = unregisterIndex(indexName);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5792b667/test/unit/org/apache/cassandra/cql3/CQLTester.java
----------------------------------------------------------------------
diff --git a/test/unit/org/apache/cassandra/cql3/CQLTester.java b/test/unit/org/apache/cassandra/cql3/CQLTester.java
index 062a4bc..b038ce0 100644
--- a/test/unit/org/apache/cassandra/cql3/CQLTester.java
+++ b/test/unit/org/apache/cassandra/cql3/CQLTester.java
@@ -46,6 +46,7 @@ import com.datastax.driver.core.ResultSet;

 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.concurrent.ScheduledExecutors;
+import org.apache.cassandra.index.SecondaryIndexManager;
 import org.apache.cassandra.schema.*;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.functions.FunctionName;
@@ -736,6 +737,38 @@ public abstract class CQLTester
         return indexCreated;
     }

+    /**
+     * Index creation is asynchronous, this method waits until the specified index hasn't any building task running.
+     *
+     * This method differs from {@link #waitForIndex(String, String, String)} in that it doesn't require the index to be
+     * fully nor successfully built, so it can be used to wait for failing index builds.
+     *
+     * @param keyspace the index keyspace name
+     * @param indexName the index name
+     * @return {@code true} if the index build tasks have finished in 5 seconds, {@code false} otherwise
+     */
+    protected boolean waitForIndexBuilds(String keyspace, String indexName) throws InterruptedException
+    {
+        long start = System.currentTimeMillis();
+        SecondaryIndexManager indexManager = getCurrentColumnFamilyStore(keyspace).indexManager;
+
+        while (true)
+        {
+            if (!indexManager.isIndexBuilding(indexName))
+            {
+                return true;
+            }
+            else if
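The waitForIndexBuilds helper in the commit above polls isIndexBuilding() until the build tasks clear or a 5-second cap elapses. The same poll-with-timeout pattern, extracted into a generic standalone helper, can be sketched as follows (waitUntil is a hypothetical name, not Cassandra code):

```java
import java.util.function.BooleanSupplier;

public class WaitFor
{
    /** Polls until the condition is true or timeoutMillis elapses; returns true on success. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException
    {
        long start = System.currentTimeMillis();
        while (true)
        {
            if (condition.getAsBoolean())
                return true;                // condition satisfied before the deadline
            if (System.currentTimeMillis() - start >= timeoutMillis)
                return false;               // timed out, like waitForIndexBuilds returning false
            Thread.sleep(pollMillis);       // back off between polls
        }
    }

    public static void main(String[] args) throws InterruptedException
    {
        long deadline = System.currentTimeMillis() + 50;
        // condition becomes true after roughly 50 ms, well inside the 1-second timeout
        boolean ok = waitUntil(() -> System.currentTimeMillis() >= deadline, 1000, 5);
        System.out.println(ok);
    }
}
```

Returning false on timeout rather than throwing lets the test assert either outcome, which is what makes the fix usable for deliberately failing index builds.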
[jira] [Commented] (CASSANDRA-14043) Lots of failures in test_network_topology_strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260036#comment-16260036 ]

Joseph Lynch commented on CASSANDRA-14043:
------------------------------------------
[~mkjellman] I'm trying to reproduce these failures/flakes so I can debug them, but these appear to pass locally for me against latest trunk running on Mac OSX (once I set up the 9 localhost aliases needed by Mac). Do these fail every time for you, or is it more of a flaky experience? e.g.
{noformat}
~/pg/cassandra-dtest-mirror(CASSANDRA-14043*) » nosetests --no-skip -x consistency_test.py
...
----------------------------------------------------------------------
Ran 24 tests in 1388.105s

OK
{noformat}

> Lots of failures in test_network_topology_strategy
> --------------------------------------------------
>
>                 Key: CASSANDRA-14043
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14043
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Michael Kjellman
>
> There are lots of failures in test_network_topology_strategy... Creating one JIRA to track them all...
>
> test_network_topology_strategy - consistency_test.TestAvailability
> [Errno 2] No such file or directory: '/tmp/dtest-H1psQ0/test/node1/logs/system.log'
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-H1psQ0
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 1,
>     'read_request_timeout_in_ms': 1,
>     'request_timeout_in_ms': 1,
>     'truncate_request_timeout_in_ms': 1,
>     'write_request_timeout_in_ms': 1}
> --------------------- >> end captured logging << ---------------------
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
>     self.tearDown()
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 597, in tearDown
>     if not self.allow_log_errors and self.check_logs_for_errors():
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 614, in check_logs_for_errors
>     ['\n'.join(msg) for msg in node.grep_log_for_errors()]))
>   File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 386, in grep_log_for_errors
>     return self.grep_log_for_errors_from(seek_start=getattr(self, 'error_mark', 0))
>   File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 389, in grep_log_for_errors_from
>     with open(os.path.join(self.get_path(), 'logs', filename)) as f:
> "[Errno 2] No such file or directory: '/tmp/dtest-H1psQ0/test/node1/logs/system.log'\n >> begin captured logging << \ndtest: DEBUG: cluster ccm directory: /tmp/dtest-H1psQ0\ndtest: DEBUG: Done setting configuration options:\n{ 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 5,\n'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 1}\n- >> end captured logging << -"
>
> test_network_topology_strategy_counters - consistency_test.TestAccuracy
> Error starting node3.
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-3px8TH
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 1,
>     'read_request_timeout_in_ms': 1,
>     'request_timeout_in_ms': 1,
>     'truncate_request_timeout_in_ms': 1,
>     'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing multiple dcs, counters
> --------------------- >> end captured logging << ---------------------
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 753, in test_network_topology_strategy_counters
>     self._run_test_function_in_parallel(TestAccuracy.Validation.validate_counters, self.nodes, self.rf.values(), combinations),
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in _run_test_function_in_parallel
>     self._start_cluster(save_sessions=True, requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in _start_cluster
>     cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
>   File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", line 423, in start
>     raise
[jira] [Created] (CASSANDRA-14062) Pluggable CommitLog
Rei Odaira created CASSANDRA-14062:
--------------------------------------

             Summary: Pluggable CommitLog
                 Key: CASSANDRA-14062
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14062
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Rei Odaira
             Fix For: 4.x
         Attachments: pluggable-commitlog-src.patch, pluggable-commitlog-test.patch

This proposal is to make CommitLog pluggable, as discussed in [the Cassandra dev mailing list|https://lists.apache.org/thread.html/1936194d86f5954fa099ced9a0733458eb3249bff3fae3e03e2d1bd8@%3Cdev.cassandra.apache.org%3E].

We are developing a Cassandra plugin to store the CommitLog on our low-latency flash device (CAPI-Flash). To do that, the original CommitLog interface must be changed to allow plugins. Syncing the CommitLog is one of the performance bottlenecks in Cassandra, especially with batch commit. I think a pluggable CommitLog will also allow other interesting alternatives, such as one using SPDK.

Our high-level design is similar to the CacheProvider framework in org.apache.cassandra.cache:
* Introduce a new interface, ICommitLog, with methods like getCurrentPosition(), add(), shutdownBlocking(), etc.
* CommitLog implements ICommitLog.
* Introduce a new interface, CommitLogProvider, with a create() method returning ICommitLog.
* Introduce a new class, FileCommitLogProvider, implementing CommitLogProvider to return a singleton instance of CommitLog.
* Introduce a new property in cassandra.yaml, commitlog_class_name, which specifies the CommitLogProvider to use. The default is FileCommitLogProvider.
* Introduce a new class, CommitLogHelper, that loads the class specified by the commitlog_class_name property, creates an instance, and stores it in CommitLogHelper.instance.
* Replace all references to CommitLog.instance with CommitLogHelper.instance.

Attached are two patches: "pluggable-commitlog-src.patch" for changes in the src directory, and "pluggable-commitlog-test.patch" for the test directory.
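The provider design described above can be sketched roughly as follows. The interface and class names (ICommitLog, CommitLogProvider, FileCommitLogProvider, commitlog_class_name) mirror the proposal, but this is an illustrative reduction, not the attached patch:

```java
public class CommitLogPluginSketch
{
    interface ICommitLog
    {
        void add(String mutation);
        void shutdownBlocking();
    }

    interface CommitLogProvider
    {
        ICommitLog create();
    }

    // Default provider, returning a trivial stand-in for the file-based CommitLog singleton.
    public static class FileCommitLogProvider implements CommitLogProvider
    {
        private static final ICommitLog INSTANCE = new ICommitLog()
        {
            public void add(String mutation) { /* append to a segment */ }
            public void shutdownBlocking() { /* sync and close */ }
        };
        public ICommitLog create() { return INSTANCE; }
    }

    // Analogue of the proposed CommitLogHelper: reflectively load the provider
    // named by the commitlog_class_name property and ask it for the implementation.
    static ICommitLog load(String providerClassName) throws Exception
    {
        Class<?> cls = Class.forName(providerClassName);
        CommitLogProvider provider = (CommitLogProvider) cls.getDeclaredConstructor().newInstance();
        return provider.create();
    }

    public static void main(String[] args) throws Exception
    {
        ICommitLog log = load(FileCommitLogProvider.class.getName());
        log.add("mutation-1");
        log.shutdownBlocking();
        System.out.println(log != null);
    }
}
```

Loading by class name keeps alternative implementations (CAPI-Flash, SPDK) out of the core tree; only the yaml property changes per deployment.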
[jira] [Commented] (CASSANDRA-14043) Lots of failures in test_network_topology_strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260202#comment-16260202 ]

Jason Brown commented on CASSANDRA-14043:
-----------------------------------------
These kinds of errors usually seem to be environment related. If they are flaky (especially failures like {{"Error starting node1."}}), it might not be that test specifically; something in it may just trigger the behavior. Perhaps run a single test function in a loop and wait until it fails. Here's a [naive script|https://gist.github.com/jasobrown/b293aaa989bf86c3a0d1e0bef1541e5b] I use. It's not brilliant, but it's useful.
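The loop-until-failure approach can be sketched generically. The linked gist drives nosetests from a shell loop; in this sketch the "test" is any boolean-returning callable so the harness logic itself is runnable, and the names (firstFailure, maxRuns) are illustrative:

```java
import java.util.function.BooleanSupplier;

public class RepeatUntilFailure
{
    /** Runs the test up to maxRuns times; returns the 1-based run that failed, or -1 if none did. */
    static int firstFailure(BooleanSupplier test, int maxRuns)
    {
        for (int run = 1; run <= maxRuns; run++)
        {
            if (!test.getAsBoolean())
                return run; // stop at the first failure so the logs it left behind are fresh
        }
        return -1; // never failed within maxRuns, so the flake did not reproduce
    }

    public static void main(String[] args)
    {
        int[] counter = {0};
        // a stand-in "test" that fails on its 7th execution
        BooleanSupplier flaky = () -> ++counter[0] != 7;
        System.out.println(firstFailure(flaky, 1000)); // 7
    }
}
```

Stopping immediately on the first failure matters: rerunning past it (or letting teardown run again) tends to destroy the ccm cluster state you need for debugging.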
[jira] [Commented] (CASSANDRA-14051) Many materialized_views_test are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260219#comment-16260219 ]

Kurt Greaves commented on CASSANDRA-14051:
------------------------------------------
Just going to dump relevant logs/error messages here regarding the different tests while I work on them... Some are flaky and take a while to reproduce.

{{add_node_after_very_wide_mv_test}} fails on bootstrap of the fourth node due to:
{code}
INFO  [Stream-Deserializer-/127.0.0.1:7000-b69a2f59] 2017-11-21 03:31:33,964 StreamResultFuture.java:193 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] Session with /127.0.0.1 is complete
WARN  [Stream-Deserializer-/127.0.0.2:44789-46b17cc7] 2017-11-21 03:31:34,186 CompressedStreamReader.java:111 - [Stream 77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] Error while reading partition DecoratedKey(-3248873570005575792, 0002) from stream on ks='ks' and table='t'.
ERROR [Stream-Deserializer-/127.0.0.2:44789-46b17cc7] 2017-11-21 03:31:34,199 StreamSession.java:617 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] Streaming error occurred on session with peer 127.0.0.2
org.apache.cassandra.streaming.StreamReceiveException: java.lang.AssertionError: stream can only read forward.
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:63) ~[main/:na]
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:41) ~[main/:na]
	at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55) ~[main/:na]
	at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:178) ~[main/:na]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: java.lang.AssertionError: stream can only read forward.
	at org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:108) ~[main/:na]
	at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:94) ~[main/:na]
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:58) ~[main/:na]
	... 4 common frames omitted
INFO  [Stream-Deserializer-/127.0.0.2:44789-46b17cc7] 2017-11-21 03:31:34,201 StreamResultFuture.java:193 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] Session with /127.0.0.2 is complete
INFO  [Stream-Deserializer-/127.0.0.3:7000-099ad298] 2017-11-21 03:31:34,372 StreamResultFuture.java:179 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad ID#0] Prepare completed. Receiving 2 files(49.251KiB), sending 0 files(0.000KiB)
INFO  [Stream-Deserializer-/127.0.0.3:7000-099ad298] 2017-11-21 03:31:35,575 StreamResultFuture.java:193 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] Session with /127.0.0.3 is complete
WARN  [Stream-Deserializer-/127.0.0.3:7000-099ad298] 2017-11-21 03:31:35,584 StreamResultFuture.java:220 - [Stream #77f16cb0-ce6c-11e7-b953-bbf6c637c4ad] Stream failed
ERROR [main] 2017-11-21 03:31:35,585 StorageService.java:1487 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:528) ~[guava-23.3-jre.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:507) ~[guava-23.3-jre.jar:na]
	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1482) [main/:na]
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:932) [main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:637) [main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:569) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [main/:na]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
	at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[main/:na]
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1371) ~[guava-23.3-jre.jar:na]
	at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399) ~[guava-23.3-jre.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1016) ~[guava-23.3-jre.jar:na]
	at
[jira] [Assigned] (CASSANDRA-14051) Many materialized_views_test are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Greaves reassigned CASSANDRA-14051: Assignee: Kurt Greaves > Many materialized_views_test are busted > --- > > Key: CASSANDRA-14051 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14051 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Michael Kjellman >Assignee: Kurt Greaves > > Many materialized_views_test are busted... For now we should disable the > entire MV test suite until effort is put into making these test usable and > helpful. Currently they aren't helpful and almost all fail. > test_base_column_in_view_pk_commutative_tombstone_without_flush - > materialized_views_test.TestMaterializedViews > test_base_column_in_view_pk_complex_timestamp_with_flush - > materialized_views_test.TestMaterializedViews > add_dc_after_mv_network_replication_test - > materialized_views_test.TestMaterializedViews > add_dc_after_mv_simple_replication_test - > materialized_views_test.TestMaterializedViews > add_node_after_mv_test - materialized_views_test.TestMaterializedViews > add_node_after_very_wide_mv_test - > materialized_views_test.TestMaterializedViews > add_node_after_wide_mv_with_range_deletions_test - > materialized_views_test.TestMaterializedViews -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap
Vincent White created CASSANDRA-14063: - Summary: Cassandra will start listening for clients without initialising system_auth after a failed bootstrap Key: CASSANDRA-14063 URL: https://issues.apache.org/jira/browse/CASSANDRA-14063 Project: Cassandra Issue Type: Bug Components: Auth Reporter: Vincent White Priority: Minor This issue is closely related to CASSANDRA-11381. In this case, when a node joining the ring fails to complete the bootstrapping process with a streaming failure, it will still always call org.apache.cassandra.service.CassandraDaemon#start and begin listening for client connections. If no authentication is configured, clients are able to connect to the node and query the cluster much like write survey mode. But if authentication is enabled, it will cause an NPE because org.apache.cassandra.service.StorageService#doAuthSetup is only called after successfully completing the bootstrapping process. With the changes made in CASSANDRA-11381 we could now simply call doAuthSetup earlier, since we don't have to worry about calling it multiple times. But given some of the concerns raised about third-party authentication classes, and since "Incremental Bootstrapping" as described in CASSANDRA-8494 and CASSANDRA-8943 doesn't appear to be nearing implementation any time soon, I would probably prefer that bootstrapping nodes simply didn't start listening for clients following a failed bootstrap attempt. I've attached a quick and naive patch that demonstrates my desired behaviour. I ended up creating a new variable for this for clarity, but I also had a bit of trouble finding existing information, not tied up in more complicated or transient processes, that I could use to determine this particular state. I believe org.apache.cassandra.service.StorageService#isAuthSetupComplete would also work in this case, so we could tie it to that instead.
If someone has something simpler or knows the correct place I should be querying for this state from, I welcome all feedback. This [patch|https://github.com/vincewhite/cassandra/commits/system_auth_npe] also doesn't really address enabling/disabling thrift/binary via nodetool once the node is running. I wasn't sure if we should disallow it completely or include a force flag. Cheers -Vince
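The core of the report above is ordering: client transports must not start unless bootstrap (and hence doAuthSetup) completed. A minimal sketch of such a guard, assuming a hypothetical flag class (all names below are illustrative, not Cassandra's actual API):

```java
// Hypothetical sketch: track whether bootstrap finished before allowing
// client transports to start. Names are illustrative only.
public class BootstrapGuard {
    private volatile boolean bootstrapSucceeded = false;
    private volatile boolean clientTransportRunning = false;

    /** Called once streaming/bootstrap completes successfully. */
    public void onBootstrapComplete() { bootstrapSucceeded = true; }

    /** Returns true only if the transport was actually started. */
    public boolean tryStartClientTransport() {
        if (!bootstrapSucceeded) {
            return false; // failed or incomplete bootstrap: refuse client connections
        }
        clientTransportRunning = true;
        return true;
    }

    public boolean isClientTransportRunning() { return clientTransportRunning; }
}
```

In this sketch a failed bootstrap simply never flips the flag, so the node stays unreachable to clients rather than serving queries with uninitialised auth.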
[jira] [Updated] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Greaves updated CASSANDRA-14063: - Status: Awaiting Feedback (was: Open)
[jira] [Assigned] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Greaves reassigned CASSANDRA-14063: Assignee: Vincent White
[jira] [Updated] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-13963: - Reviewer: Robert Stupp > SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild > is flaky > - > > Key: CASSANDRA-13963 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13963 > Project: Cassandra > Issue Type: Bug > Components: Secondary Indexes, Testing >Reporter: Andrés de la Peña >Assignee: Andrés de la Peña >Priority: Minor > Fix For: 4.x > > > The unit test > [SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java#L460-L476] > is flaky. Apart from [the CI results showing a 3% > flakiness|http://cassci.datastax.com/view/All_Jobs/job/trunk_utest/2430/testReport/org.apache.cassandra.index/SecondaryIndexManagerTest/indexWithfailedInitializationIsNotQueryableAfterPartialRebuild/], > the test failure can be locally reproduced just running the test multiple > times. In my case, it fails 2-5 times for each 1000 executions.
[jira] [Commented] (CASSANDRA-13175) Integrate "Error Prone" Code Analyzer
[ https://issues.apache.org/jira/browse/CASSANDRA-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259203#comment-16259203 ] Stefan Podkowinski commented on CASSANDRA-13175: For the record, I just rebased and checked the errorprone target of my wip branch again, but this time with the latest guava version that we now have in trunk. Now I'm getting a new error regarding the putBytes method in the guava Hasher used by the Validator. This method was added in Guava 23 and cannot be found with the guava version used by errorprone. I'd therefore suggest waiting until [#492|https://github.com/google/error-prone/issues/492] has been addressed by the errorprone project. > Integrate "Error Prone" Code Analyzer > - > > Key: CASSANDRA-13175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13175 > Project: Cassandra > Issue Type: Improvement >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski > Attachments: 0001-Add-Error-Prone-code-analyzer.patch, > checks-2_2.out, checks-3_0.out, checks-trunk.out > > > I've been playing with [Error Prone|http://errorprone.info/] by integrating > it into the build process and to see what kind of warnings it would produce. > So far I'm positively impressed by the coverage and usefulness of some of the > implemented checks. See attachments for results. > Unfortunately there are still some issues with how the analyzer is affecting > generated code and the guava versions used, see > [#492|https://github.com/google/error-prone/issues/492]. Once those issues > have been solved and the resulting code isn't affected by the analyzer, I'd > suggest adding it to trunk with warn-only behaviour and some less useful > checks disabled. Alternatively a new ant target could be added, maybe with > build-breaking checks and CI integration.
[jira] [Comment Edited] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259253#comment-16259253 ] Jason Brown edited comment on CASSANDRA-13530 at 11/20/17 1:51 PM: --- Update on testing: it seems like periodic mode doesn't test {{CommitLogTest}} very well in the current state of things. As the existing test only exercised {{batch}} mode, and given the scope of this ticket is to add {{group}} mode, I propose to ignore (not commit) the {{PeriodicCommitLogTest}} and save that for a followup ticket as it's slightly outside the scope of the current ticket. I'll commit {{BatchCommitLogTest}} and {{GroupCommitLogTest}} to preserve the existing test for batch and add the test for group. wdyt, [~aweisberg]? {{CommitLogStressTest}} began to timeout in circleci because each {{@Test}} method tested all configuration (compression, encryption, plain) and mode variants (periodic, batch, and group). I refactored it to be a {{@Parameterized}} test class, like {{CommitLogTest}}, and that didn't help. What did fix it was upping the {{test.long.timeout}} in the {{build.xml}} to 15 minutes from 10, as the timeouts apply to the test class as a whole, not individual test methods. I can revert the refactor and just leave the build file change on commit, or we can keep both as the refactor tidied up a few things in that test class (purely subjective statement) - wdyt? was (Author: jasobrown): Update on testing: it seems like periodic mode doesn't test {{CommitLogTest}} very well in the current state of things. As the existing test only exercised {{batch}} mode, and given the scope of this ticket is to add {{group}} mode, I propose to ignore (not commit) the {{PeriodicCommitLogTest}} and save that for a followup ticket as it's slightly outside the scope of the current ticket. I'll commit {{BatchCommitLogTest}} and {{GroupCommitLogTest}} to preserve the existing test for batch and add the test for group. wdyt, @Ariel? 
{{CommitLogStressTest}} began to timeout in circleci because each {{@Test}} method tested all configuration (compression, encryption, plain) and mode variants (periodic, batch, and group). I refactored it to be a {{@Parameterized}} test class, like {{CommitLogTest}}, and that didn't help. What did fix it was upping the {{test.long.timeout}} in the {{build.xml}} to 15 minutes from 10, as the timeouts apply to the test class as a whole, not individual test methods. I can revert the refactor and just leave the build file change on commit, or we can keep both as the refactor tidied up a few things in that test class (purely subjective statement) - wdyt? > GroupCommitLogService > - > > Key: CASSANDRA-13530 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13530 > Project: Cassandra > Issue Type: Improvement >Reporter: Yuji Ito >Assignee: Yuji Ito > Fix For: 2.2.x, 3.0.x, 3.11.x > > Attachments: GuavaRequestThread.java, MicroRequestThread.java, > groupAndBatch.png, groupCommit22.patch, groupCommit30.patch, > groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, > groupCommitLog_result.xlsx > > > I propose a new CommitLogService, GroupCommitLogService, to improve the > throughput when lots of requests are received. > It improved the throughput by maximum 94%. > I'd like to discuss about this CommitLogService. > Currently, we can select either 2 CommitLog services; Periodic and Batch. > In Periodic, we might lose some commit log which hasn't written to the disk. > In Batch, we can write commit log to the disk every time. The size of commit > log to write is too small (< 4KB). When high concurrency, these writes are > gathered and persisted to the disk at once. But, when insufficient > concurrency, many small writes are issued and the performance decreases due > to the latency of the disk. Even if you use SSD, processes of many IO > commands decrease the performance. > GroupCommitLogService writes some commitlog to the disk at once. 
> The patch adds GroupCommitLogService (it is enabled by setting > `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml). > The difference from Batch is just the wait on the semaphore. > By waiting for the semaphore, several commit-log writes are executed at the > same time. > In GroupCommitLogService, the latency becomes worse if there is no > concurrency. > I measured the performance with my microbench (MicroRequestThread.java) by > increasing the number of threads. The cluster has 3 nodes (Replication factor: > 3). Each node is an AWS EC2 m4.large instance + 200IOPS io1 volume. > The result is as below. The GroupCommitLogService with a 10ms window improved > update with Paxos by 94% and improved select with Paxos by 76%. > h6. SELECT / sec > ||\# of threads||Batch
[jira] [Updated] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-13963: - Status: Ready to Commit (was: Patch Available)
[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259253#comment-16259253 ] Jason Brown commented on CASSANDRA-13530: - Update on testing: it seems like periodic mode doesn't test {{CommitLogTest}} very well in the current state of things. As the existing test only exercised {{batch}} mode, and given the scope of this ticket is to add {{group}} mode, I propose to ignore (not commit) the {{PeriodicCommitLogTest}} and save that for a followup ticket as it's slightly outside the scope of the current ticket. I'll commit {{BatchCommitLogTest}} and {{GroupCommitLogTest}} to preserve the existing test for batch and add the test for group. wdyt, @Ariel? {{CommitLogStressTest}} began to timeout in circleci because each {{@Test}} method tested all configuration (compression, encryption, plain) and mode variants (periodic, batch, and group). I refactored it to be a {{@Parameterized}} test class, like {{CommitLogTest}}, and that didn't help. What did fix it was upping the {{test.long.timeout}} in the {{build.xml}} to 15 minutes from 10, as the timeouts apply to the test class as a whole, not individual test methods. I can revert the refactor and just leave the build file change on commit, or we can keep both as the refactor tidied up a few things in that test class (purely subjective statement) - wdyt? > GroupCommitLogService > - > > Key: CASSANDRA-13530 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13530 > Project: Cassandra > Issue Type: Improvement >Reporter: Yuji Ito >Assignee: Yuji Ito > Fix For: 2.2.x, 3.0.x, 3.11.x > > Attachments: GuavaRequestThread.java, MicroRequestThread.java, > groupAndBatch.png, groupCommit22.patch, groupCommit30.patch, > groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, > groupCommitLog_result.xlsx > > > I propose a new CommitLogService, GroupCommitLogService, to improve the > throughput when lots of requests are received. 
> It improved the throughput by up to 94%. > I'd like to discuss this CommitLogService. > Currently, we can select either of 2 CommitLog services: Periodic and Batch. > In Periodic, we might lose some commit log which hasn't been written to the disk. > In Batch, we can write the commit log to the disk every time. The size of the commit > log to write is very small (< 4KB). Under high concurrency, these writes are > gathered and persisted to the disk at once. But, under insufficient > concurrency, many small writes are issued and the performance decreases due > to the latency of the disk. Even with an SSD, processing many IO > commands decreases the performance. > GroupCommitLogService writes several commit logs to the disk at once. > The patch adds GroupCommitLogService (it is enabled by setting > `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml). > The difference from Batch is just the wait on the semaphore. > By waiting for the semaphore, several commit-log writes are executed at the > same time. > In GroupCommitLogService, the latency becomes worse if there is no > concurrency. > I measured the performance with my microbench (MicroRequestThread.java) by > increasing the number of threads. The cluster has 3 nodes (Replication factor: > 3). Each node is an AWS EC2 m4.large instance + 200IOPS io1 volume. > The result is as below. The GroupCommitLogService with a 10ms window improved > update with Paxos by 94% and improved select with Paxos by 76%. > h6. SELECT / sec > ||\# of threads||Batch 2ms||Group 10ms|| > |1|192|103| > |2|163|212| > |4|264|416| > |8|454|800| > |16|744|1311| > |32|1151|1481| > |64|1767|1844| > |128|2949|3011| > |256|4723|5000| > h6.
UPDATE / sec > ||\# of threads||Batch 2ms||Group 10ms|| > |1|45|26| > |2|39|51| > |4|58|102| > |8|102|198| > |16|167|213| > |32|289|295| > |64|544|548| > |128|1046|1058| > |256|2020|2061|
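The group-commit idea quoted above (many small commit-log writes sharing one disk sync per window) can be sketched as a toy model. This is an illustrative simplification, not the actual GroupCommitLogService patch; in the real service writers block on a semaphore until the window's sync completes, whereas here the batching is made explicit and single-threaded:

```java
// Toy sketch of group commit: writers buffer mutations, and one sync per
// group window flushes them all, so N small writes cost 1 disk sync.
import java.util.ArrayList;
import java.util.List;

public class GroupCommitSketch {
    private final List<byte[]> pending = new ArrayList<>();
    private int syncCount = 0;

    /** Buffer a write; in the real service the caller would then wait on a semaphore. */
    public synchronized void append(byte[] mutation) { pending.add(mutation); }

    /** Called once per group window: flush everything buffered in one sync. */
    public synchronized int syncGroup() {
        int flushed = pending.size();
        pending.clear();
        syncCount++; // one (simulated) disk sync regardless of how many writes
        return flushed;
    }

    public synchronized int syncCount() { return syncCount; }
}
```

The benchmark tables above reflect the expected trade-off of this design: with a single writer each mutation pays the full window latency, but with many concurrent writers the shared sync amortises the disk cost.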
[jira] [Commented] (CASSANDRA-13963) SecondaryIndexManagerTest.indexWithfailedInitializationIsNotQueryableAfterPartialRebuild is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259266#comment-16259266 ] Robert Stupp commented on CASSANDRA-13963: -- +1 on the patch. Your evaluation is correct, and checking the number of builds is fine, as the {{CREATE INDEX}} doesn't return before the index build is triggered. Thanks for the patch!
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259303#comment-16259303 ] ASF GitHub Bot commented on CASSANDRA-14055: GitHub user ludovic-boutros opened a pull request: https://github.com/apache/cassandra/pull/174 CASSANDRA-14055: Index redistribution breaks SASI index During index redistribution process, a new view is created. During this creation, old indexes should be released. But, new indexes are "attached" to the same SSTable as the old indexes. This leads to the deletion of the last SASI index file and breaks the index. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ludovic-boutros/cassandra fix/CASSANDRA-14055 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/cassandra/pull/174.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #174 commit 532ed86090c27e51c745c57678cd19ff4b606a0c Author: lbout...@flatironsjouve.com Date: 2017-11-20T14:39:41Z CASSANDRA-14055: Index redistribution breaks SASI index > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros > Labels: patch > Fix For: 3.11.x > > Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]
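The failure mode described in the report, where a new index attached to the same SSTable ends up coexisting with (and then releasing) the old one during view rebuild, suggests keying indexes by their SSTable so the new index supersedes the old. The sketch below is a hypothetical illustration of that merge direction; the names are stand-ins, not the actual SASI View code:

```java
// Hypothetical sketch: when rebuilding a view, merge old and new indexes
// keyed by SSTable, so a new index for the same SSTable replaces the old
// one instead of both being tracked (and the wrong one being released).
import java.util.HashMap;
import java.util.Map;

public class ViewRebuildSketch {
    /** Merge old and new indexes; for a contested SSTable, the new index wins. */
    public static Map<String, String> mergeBySSTable(Map<String, String> oldIndexes,
                                                     Map<String, String> newIndexes) {
        Map<String, String> merged = new HashMap<>(oldIndexes);
        merged.putAll(newIndexes); // same SSTable key -> new index replaces old
        return merged;
    }

    /** Tiny demo: which index survives for an SSTable present in both sets. */
    public static String demoWinner() {
        Map<String, String> oldIdx = new HashMap<>();
        oldIdx.put("sstable-1", "index-A");
        oldIdx.put("sstable-2", "index-B");
        Map<String, String> newIdx = new HashMap<>();
        newIdx.put("sstable-2", "index-B2");
        return mergeBySSTable(oldIdx, newIdx).get("sstable-2");
    }
}
```

With this shape, only truly superseded index instances become candidates for release, which is the invariant the linked View.java code is said to violate.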