[jira] [Updated] (CASSANDRA-12365) Document cassandra stress

2016-10-17 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-12365:
--
Status: Open  (was: Ready to Commit)

> Document cassandra stress
> -
>
> Key: CASSANDRA-12365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12365
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Christopher Batey
>Assignee: Christopher Batey
>Priority: Minor
>  Labels: stress
> Fix For: 3.x
>
>
> I've started on this, so I'm creating a ticket to avoid duplicate work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12725) dtest failure in repair_tests.incremental_repair_test.TestIncRepair.sstable_marking_test_not_intersecting_all_ranges

2016-10-17 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582829#comment-15582829
 ] 

Jim Witschey commented on CASSANDRA-12725:
--

bq. What should happen if we run this tool on a node with no sstables

I feel like the way the tool failed here is reasonable. The failure message 
could be better.

We should check that the tool fails on a node with no SSTables.
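For what it's worth, a minimal sketch of that kind of check (assumptions: a Cassandra checkout or distribution where the tool lives at tools/bin/sstablemetadata; this is not the dtest code itself):

{code}
// Hypothetical sketch: invoke sstablemetadata with no sstable arguments
// and confirm it exits non-zero with a readable message, which is what
// a check for "node with no SSTables" would need to assert.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SSTableMetadataNoArgsCheck
{
    public static void main(String[] args) throws Exception
    {
        Process p = new ProcessBuilder("tools/bin/sstablemetadata")
                        .redirectErrorStream(true)
                        .start();
        String firstLine;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream())))
        {
            firstLine = r.readLine(); // expect a usage/failure message, not a stack trace
        }
        int rc = p.waitFor();
        if (rc == 0)
            throw new AssertionError("expected a non-zero exit with no sstables, got: " + firstLine);
    }
}
{code}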

> dtest failure in 
> repair_tests.incremental_repair_test.TestIncRepair.sstable_marking_test_not_intersecting_all_ranges
> 
>
> Key: CASSANDRA-12725
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12725
> Project: Cassandra
>  Issue Type: Test
>Reporter: Sean McCarthy
>Assignee: DS Test Eng
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/406/testReport/repair_tests.incremental_repair_test/TestIncRepair/sstable_marking_test_not_intersecting_all_ranges
> {code}
> Error Message
> Subprocess sstablemetadata on keyspace: keyspace1, column_family: None exited 
> with non-zero status; exit status: 1; 
> stdout: 
> usage: Usage: sstablemetadata [--gc_grace_seconds n] <sstable filenames>
> Dump contents of given SSTable to standard output in JSON format.
> --gc_grace_seconds    The gc_grace_seconds to use when
>                       calculating droppable tombstones
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File 
> "/home/automaton/cassandra-dtest/repair_tests/incremental_repair_test.py", 
> line 366, in sstable_marking_test_not_intersecting_all_ranges
> for out in (node.run_sstablemetadata(keyspace='keyspace1').stdout for 
> node in cluster.nodelist()):
>   File 
> "/home/automaton/cassandra-dtest/repair_tests/incremental_repair_test.py", 
> line 366, in <genexpr>
> for out in (node.run_sstablemetadata(keyspace='keyspace1').stdout for 
> node in cluster.nodelist()):
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1021, in 
> run_sstablemetadata
> return handle_external_tool_process(p, "sstablemetadata on keyspace: {}, 
> column_family: {}".format(keyspace, column_families))
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1983, in 
> handle_external_tool_process
> raise ToolError(cmd_args, rc, out, err)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.X

2016-10-17 Thread carl
Merge branch 'cassandra-3.0' into cassandra-3.X


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f5bc3784
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f5bc3784
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f5bc3784

Branch: refs/heads/cassandra-3.X
Commit: f5bc3784fb08741b2d4636dd09061b06c3d893d6
Parents: 720acc6 bc9a079
Author: Carl Yeksigian 
Authored: Mon Oct 17 16:16:29 2016 -0400
Committer: Carl Yeksigian 
Committed: Mon Oct 17 16:16:29 2016 -0400

--
 .../apache/cassandra/cql3/ReservedKeywords.java | 28 ++---
 .../cassandra/cql3/ReservedKeywordsTest.java| 43 
 2 files changed, 46 insertions(+), 25 deletions(-)
--




[3/6] cassandra git commit: Update reserved keyword list

2016-10-17 Thread carl
Update reserved keyword list

Patch by Carl Yeksigian; reviewed by Alex Petrov for CASSANDRA-11803


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bc9a0793
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bc9a0793
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bc9a0793

Branch: refs/heads/trunk
Commit: bc9a0793944f7dd481646c4014d13b844439906c
Parents: 7872318
Author: Carl Yeksigian 
Authored: Mon Oct 17 16:15:49 2016 -0400
Committer: Carl Yeksigian 
Committed: Mon Oct 17 16:15:49 2016 -0400

--
 .../apache/cassandra/cql3/ReservedKeywords.java | 28 ++---
 .../cassandra/cql3/ReservedKeywordsTest.java| 43 
 2 files changed, 46 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bc9a0793/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
--
diff --git a/src/java/org/apache/cassandra/cql3/ReservedKeywords.java 
b/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
index 91b7e61..ee052a7 100644
--- a/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
+++ b/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
@@ -20,17 +20,18 @@ package org.apache.cassandra.cql3;
 
 import java.util.Set;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.ImmutableSet;
 
 class ReservedKeywords
 {
-private static final String[] reservedKeywords = new String[]
+@VisibleForTesting
+static final String[] reservedKeywords = new String[]
  {
  "SELECT",
  "FROM",
  "WHERE",
  "AND",
- "KEY",
  "ENTRIES",
  "FULL",
  "INSERT",
@@ -39,7 +40,6 @@ class ReservedKeywords
  "LIMIT",
  "USING",
  "USE",
- "COUNT",
  "SET",
  "BEGIN",
  "UNLOGGED",
@@ -61,8 +61,6 @@ class ReservedKeywords
  "DROP",
  "PRIMARY",
  "INTO",
- "TIMESTAMP",
- "TTL",
  "ALTER",
  "RENAME",
  "ADD",
@@ -81,27 +79,7 @@ class ReservedKeywords
  "DESCRIBE",
  "EXECUTE",
  "NORECURSIVE",
- "ASCII",
- "BIGINT",
- "BLOB",
- "BOOLEAN",
- "COUNTER",
- "DECIMAL",
- "DOUBLE",
- "FLOAT",
- "INET",
- "INT",
- "SMALLINT",
- "TINYINT",
- "TEXT",
- "UUID",
- "VARCHAR",
- "VARINT",
- "TIMEUUID",
  "TOKEN",
- "WRITETIME",
- "DATE",
- "TIME",

[2/6] cassandra git commit: Update reserved keyword list

2016-10-17 Thread carl
Update reserved keyword list

Patch by Carl Yeksigian; reviewed by Alex Petrov for CASSANDRA-11803


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bc9a0793
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bc9a0793
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bc9a0793

Branch: refs/heads/cassandra-3.X
Commit: bc9a0793944f7dd481646c4014d13b844439906c
Parents: 7872318
Author: Carl Yeksigian 
Authored: Mon Oct 17 16:15:49 2016 -0400
Committer: Carl Yeksigian 
Committed: Mon Oct 17 16:15:49 2016 -0400

--
 .../apache/cassandra/cql3/ReservedKeywords.java | 28 ++---
 .../cassandra/cql3/ReservedKeywordsTest.java| 43 
 2 files changed, 46 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bc9a0793/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
--
diff --git a/src/java/org/apache/cassandra/cql3/ReservedKeywords.java 
b/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
index 91b7e61..ee052a7 100644
--- a/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
+++ b/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
@@ -20,17 +20,18 @@ package org.apache.cassandra.cql3;
 
 import java.util.Set;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.ImmutableSet;
 
 class ReservedKeywords
 {
-private static final String[] reservedKeywords = new String[]
+@VisibleForTesting
+static final String[] reservedKeywords = new String[]
  {
  "SELECT",
  "FROM",
  "WHERE",
  "AND",
- "KEY",
  "ENTRIES",
  "FULL",
  "INSERT",
@@ -39,7 +40,6 @@ class ReservedKeywords
  "LIMIT",
  "USING",
  "USE",
- "COUNT",
  "SET",
  "BEGIN",
  "UNLOGGED",
@@ -61,8 +61,6 @@ class ReservedKeywords
  "DROP",
  "PRIMARY",
  "INTO",
- "TIMESTAMP",
- "TTL",
  "ALTER",
  "RENAME",
  "ADD",
@@ -81,27 +79,7 @@ class ReservedKeywords
  "DESCRIBE",
  "EXECUTE",
  "NORECURSIVE",
- "ASCII",
- "BIGINT",
- "BLOB",
- "BOOLEAN",
- "COUNTER",
- "DECIMAL",
- "DOUBLE",
- "FLOAT",
- "INET",
- "INT",
- "SMALLINT",
- "TINYINT",
- "TEXT",
- "UUID",
- "VARCHAR",
- "VARINT",
- "TIMEUUID",
  "TOKEN",
- "WRITETIME",
- "DATE",
- "TIME",

[6/6] cassandra git commit: Merge branch 'cassandra-3.X' into trunk

2016-10-17 Thread carl
Merge branch 'cassandra-3.X' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b51f7e25
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b51f7e25
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b51f7e25

Branch: refs/heads/trunk
Commit: b51f7e25977cc82a54be24bb254ff3fa82b11795
Parents: e66305d f5bc378
Author: Carl Yeksigian 
Authored: Mon Oct 17 16:16:55 2016 -0400
Committer: Carl Yeksigian 
Committed: Mon Oct 17 16:16:55 2016 -0400

--
 .../apache/cassandra/cql3/ReservedKeywords.java | 28 ++---
 .../cassandra/cql3/ReservedKeywordsTest.java| 43 
 2 files changed, 46 insertions(+), 25 deletions(-)
--




[1/6] cassandra git commit: Update reserved keyword list

2016-10-17 Thread carl
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 7872318d6 -> bc9a07939
  refs/heads/cassandra-3.X 720acc611 -> f5bc3784f
  refs/heads/trunk e66305de0 -> b51f7e259


Update reserved keyword list

Patch by Carl Yeksigian; reviewed by Alex Petrov for CASSANDRA-11803


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bc9a0793
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bc9a0793
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bc9a0793

Branch: refs/heads/cassandra-3.0
Commit: bc9a0793944f7dd481646c4014d13b844439906c
Parents: 7872318
Author: Carl Yeksigian 
Authored: Mon Oct 17 16:15:49 2016 -0400
Committer: Carl Yeksigian 
Committed: Mon Oct 17 16:15:49 2016 -0400

--
 .../apache/cassandra/cql3/ReservedKeywords.java | 28 ++---
 .../cassandra/cql3/ReservedKeywordsTest.java| 43 
 2 files changed, 46 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bc9a0793/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
--
diff --git a/src/java/org/apache/cassandra/cql3/ReservedKeywords.java 
b/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
index 91b7e61..ee052a7 100644
--- a/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
+++ b/src/java/org/apache/cassandra/cql3/ReservedKeywords.java
@@ -20,17 +20,18 @@ package org.apache.cassandra.cql3;
 
 import java.util.Set;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.ImmutableSet;
 
 class ReservedKeywords
 {
-private static final String[] reservedKeywords = new String[]
+@VisibleForTesting
+static final String[] reservedKeywords = new String[]
  {
  "SELECT",
  "FROM",
  "WHERE",
  "AND",
- "KEY",
  "ENTRIES",
  "FULL",
  "INSERT",
@@ -39,7 +40,6 @@ class ReservedKeywords
  "LIMIT",
  "USING",
  "USE",
- "COUNT",
  "SET",
  "BEGIN",
  "UNLOGGED",
@@ -61,8 +61,6 @@ class ReservedKeywords
  "DROP",
  "PRIMARY",
  "INTO",
- "TIMESTAMP",
- "TTL",
  "ALTER",
  "RENAME",
  "ADD",
@@ -81,27 +79,7 @@ class ReservedKeywords
  "DESCRIBE",
  "EXECUTE",
  "NORECURSIVE",
- "ASCII",
- "BIGINT",
- "BLOB",
- "BOOLEAN",
- "COUNTER",
- "DECIMAL",
- "DOUBLE",
- "FLOAT",
- "INET",
- "INT",
- "SMALLINT",
- "TINYINT",
- "TEXT",
- "UUID",
- "VARCHAR",
- "VARINT",
- "TIMEUUID",
  "TOKEN",
-  

[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.X

2016-10-17 Thread carl
Merge branch 'cassandra-3.0' into cassandra-3.X


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f5bc3784
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f5bc3784
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f5bc3784

Branch: refs/heads/trunk
Commit: f5bc3784fb08741b2d4636dd09061b06c3d893d6
Parents: 720acc6 bc9a079
Author: Carl Yeksigian 
Authored: Mon Oct 17 16:16:29 2016 -0400
Committer: Carl Yeksigian 
Committed: Mon Oct 17 16:16:29 2016 -0400

--
 .../apache/cassandra/cql3/ReservedKeywords.java | 28 ++---
 .../cassandra/cql3/ReservedKeywordsTest.java| 43 
 2 files changed, 46 insertions(+), 25 deletions(-)
--




[jira] [Updated] (CASSANDRA-11803) Creating a materialized view on a table with "token" column breaks the cluster

2016-10-17 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian updated CASSANDRA-11803:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the quick review, [~ifesdjeen].

Removed the bad keywords and added a test to ensure that none of the words in 
the ReservedKeywords list parse as identifiers, in 
[bc9a079|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=bc9a0793944f7dd481646c4014d13b844439906c].
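For context, a minimal sketch of what such a parse test can look like (this is not the committed ReservedKeywordsTest; the statement shape and the table name are illustrative assumptions):

{code}
// A sketch only: every reserved word should fail to parse when used as a
// bare identifier. Lives in the cql3 package so it can see the
// package-private ReservedKeywords class and its @VisibleForTesting array.
package org.apache.cassandra.cql3;

import org.apache.cassandra.exceptions.SyntaxException;

public class ReservedKeywordsSketch
{
    public static void main(String[] args)
    {
        for (String keyword : ReservedKeywords.reservedKeywords)
        {
            try
            {
                // hypothetical statement shape; ks.tbl is an arbitrary name
                QueryProcessor.parseStatement("SELECT " + keyword + " FROM ks.tbl");
                throw new AssertionError(keyword + " parsed as an identifier but is reserved");
            }
            catch (SyntaxException expected)
            {
                // expected: reserved words must not parse as column names
            }
        }
    }
}
{code}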

> Creating a materialized view on a table with "token" column breaks the cluster
> --
>
> Key: CASSANDRA-11803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11803
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Kernel:
> Linux 4.4.8-20.46.amzn1.x86_64
> Java:
> Java OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> Cassandra: 
> datastax-ddc-3.3.0-1.noarch
> datastax-ddc-tools-3.3.0-1.noarch
>Reporter: Victor Trac
>Assignee: Carl Yeksigian
> Fix For: 3.0.10, 3.10, 4.0
>
>
> On a new Cassandra cluster, if we create a table with a field called "token" 
> (with quotes) and then create a materialized view that uses "token", the 
> cluster breaks. A ServerError is returned, and no further nodetool operations 
> on the sstables work. Restarting the Cassandra server will also fail. It 
> seems like the entire cluster is hosed.
> We tried this on Cassandra 3.3 and 3.5. 
> Here's how to reproduce (on a new, empty Cassandra 3.5 docker container):
> {code}
> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE account WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor' : 1 };
> cqlsh> CREATE TABLE account.session  (
>...   "token" blob,
>...   account_id uuid,
>...   PRIMARY KEY("token")
>... )WITH compaction={'class': 'LeveledCompactionStrategy'} AND
>...   compression={'sstable_compression': 'LZ4Compressor'};
> cqlsh> CREATE MATERIALIZED VIEW account.account_session AS
>...SELECT account_id,"token" FROM account.session
>...WHERE "token" IS NOT NULL and account_id IS NOT NULL
>...PRIMARY KEY (account_id, "token");
> ServerError: <Error from server: code=0000 [Server error] 
> message="java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.cassandra.exceptions.SyntaxException: line 1:25 no viable 
> alternative at input 'FROM' (SELECT account_id, token [FROM]...)">
> cqlsh> drop table account.session;
> ServerError: <Error from server: code=0000 [Server error] 
> message="java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.cassandra.exceptions.SyntaxException: line 1:25 no viable 
> alternative at input 'FROM' (SELECT account_id, token [FROM]...)">
> {code}
> When any sstable* or nodetool command is run, or when the Cassandra process 
> is restarted, this is emitted on startup and Cassandra exits (copied from a 
> server w/ data):
> {code}
> INFO  [main] 2016-05-12 23:25:30,074 ColumnFamilyStore.java:395 - 
> Initializing system_schema.indexes
> DEBUG [SSTableBatchOpen:1] 2016-05-12 23:25:30,075 SSTableReader.java:480 - 
> Opening 
> /mnt/cassandra/data/system_schema/indexes-0feb57ac311f382fba6d9024d305702f/ma-4-big
>  (91 bytes)
> ERROR [main] 2016-05-12 23:25:30,143 CassandraDaemon.java:697 - Exception 
> encountered during startup
> org.apache.cassandra.exceptions.SyntaxException: line 1:59 no viable 
> alternative at input 'FROM' (..., expire_at, last_used, token [FROM]...)
> at 
> org.apache.cassandra.cql3.ErrorCollector.throwFirstSyntaxError(ErrorCollector.java:101)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.cql3.CQLFragmentParser.parseAnyUnhandled(CQLFragmentParser.java:80)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:512)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchView(SchemaKeyspace.java:1128)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchViews(SchemaKeyspace.java:1092)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:903)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:879)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:867)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:134) 
> ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 

[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583336#comment-15583336
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

sure, let me change it now. 

Also, if you have any input on the overall way I'm testing and generating load, 
please let me know. I really did try to make it as realistic as I could, and we 
discussed it internally over here, but I'm all ears if you have a different 
kind of load in mind that I'm missing and that would make this a more accurate 
test for certain workloads.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583328#comment-15583328
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


[~mkjellman] Maybe "largeuuid1"? Looks like rows there were about ~300KB too, 
which is reasonable.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583373#comment-15583373
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


I'm planning to take a closer look at the code etc. soon, so if I see something 
or have any ideas I'll let you know!

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12417) Built-in AVG aggregate is much less useful than it should be

2016-10-17 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-12417:
--
   Resolution: Fixed
Fix Version/s: 3.10
   3.0.10
   Status: Resolved  (was: Ready to Commit)

> Built-in AVG aggregate is much less useful than it should be
> 
>
> Key: CASSANDRA-12417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12417
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Branimir Lambov
>Assignee: Alex Petrov
> Fix For: 3.0.10, 3.10
>
>
> For fixed-size integer types, overflow is all but guaranteed to happen, 
> yielding an incorrect result. While for sum this is somewhat acceptable, as the 
> result cannot fit the type, that is not the case for average.
> As the result of average is always within the range of the source type, 
> failing to produce it only signifies a bad implementation. Yes, one can solve 
> this by type-casting, but do we really want to always have to be telling 
> people that the correct spelling of the average function is 
> {{cast(avg(cast(value as bigint)) as int)}}, especially if this is so 
> trivial to fix?
> Additionally, the straightforward addition we use for floating point versions 
> is not a good choice numerically for larger numbers of values. We should 
> switch to a more stable version, e.g. iterative mean using {{avg = avg + 
> (value - avg) / count}}.
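
As an illustration of the suggested fix (a sketch only, not the committed patch): the iterative mean keeps every intermediate value within the magnitude of the inputs, so it cannot overflow the way sum-then-divide does.

{code}
// Minimal sketch of the numerically stable running mean suggested above:
// avg = avg + (value - avg) / count. The running value never grows beyond
// the magnitude of the inputs, unlike a naive accumulated sum.
public class RunningMean
{
    public static double mean(double[] values)
    {
        double avg = 0;
        long count = 0;
        for (double value : values)
        {
            count++;
            avg += (value - avg) / count; // fold in each value without a large intermediate sum
        }
        return avg;
    }

    public static void main(String[] args)
    {
        // naive summation of these would overflow to Infinity; the running mean stays ~1e308
        System.out.println(mean(new double[]{ 1e308, 1e308, 1e308 }));
    }
}
{code}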



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12485) Always require replace_address to replace existing token

2016-10-17 Thread Christopher Licata (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582905#comment-15582905
 ] 

Christopher Licata  edited comment on CASSANDRA-12485 at 10/17/16 6:26 PM:
---

So, as you can see, the {{stress-build}} step completes properly: 

{noformat}
build:
build-test:
[javac] Compiling 487 source files to 
/Users/xle012/cstar/cassandra/build/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
 [copy] Copying 23 files to /Users/xle012/cstar/cassandra/build/test/classes

stress-build:
[mkdir] Created dir: /Users/xle012/cstar/cassandra/build/classes/stress
[javac] Compiling 120 source files to 
/Users/xle012/cstar/cassandra/build/classes/stress
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
 [copy] Copying 1 file to /Users/xle012/cstar/cassandra/build/classes/stress

write-poms:

init:

maven-ant-tasks-localrepo:

maven-ant-tasks-download:

maven-ant-tasks-init:

maven-declare-dependencies:

_write-poms:

jar:
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/main/META-INF
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/thrift/META-INF
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/main/META-INF
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/thrift/META-INF
  [jar] Building jar: 
/Users/xle012/cstar/cassandra/build/apache-cassandra-thrift-4.0-SNAPSHOT.jar
  [jar] Building jar: 
/Users/xle012/cstar/cassandra/build/apache-cassandra-4.0-SNAPSHOT.jar
[mkdir] Created dir: 
/Users/xle012/cstar/cassandra/build/classes/stress/META-INF
  [jar] Building jar: 
/Users/xle012/cstar/cassandra/build/tools/lib/stress.jar

BUILD SUCCESSFUL
Total time: 40 seconds
{noformat}

However, I am getting the following error when I try to run the Cassandra 
Stress binaries manually: 

{noformat}
Christopher Licata ➡ ~/cstar/cassandra/tools/bin [12485-trunk]
± λ sh cassandra-stress 


  1:44PM - October 17
Error: Could not find or load main class org.apache.cassandra.stress.Stress
{noformat}
Actually, I am no longer able to run a local Cassandra server on my machine 
when I start it from the tarball either: 

{noformat}
Christopher Licata ➡ ~/Software/apache-cassandra-3.9/bin
  λ ./cassandra -f 
Error: Could not find or load main class 
org.apache.cassandra.service.CassandraDaemon
{noformat}

Any advice? I am running on OSX.


was (Author: cmlicata):
So, as you can see, the {{stress-build}} step completes properly: 

{noformat}
build:
build-test:
[javac] Compiling 487 source files to 
/Users/xle012/cstar/cassandra/build/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
 [copy] Copying 23 files to /Users/xle012/cstar/cassandra/build/test/classes

stress-build:
[mkdir] Created dir: /Users/xle012/cstar/cassandra/build/classes/stress
[javac] Compiling 120 source files to 
/Users/xle012/cstar/cassandra/build/classes/stress
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
 [copy] Copying 1 file to /Users/xle012/cstar/cassandra/build/classes/stress

write-poms:

init:

maven-ant-tasks-localrepo:

maven-ant-tasks-download:

maven-ant-tasks-init:

maven-declare-dependencies:

_write-poms:

jar:
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/main/META-INF
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/thrift/META-INF
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/main/META-INF
 [copy] Copying 1 file to 
/Users/xle012/cstar/cassandra/build/classes/thrift/META-INF
  [jar] Building jar: 
/Users/xle012/cstar/cassandra/build/apache-cassandra-thrift-4.0-SNAPSHOT.jar
  [jar] Building jar: 

[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583160#comment-15583160
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

Test clusters have crossed 110GB for the large CQL Partitions!!! Latency is 
still stable :)

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583224#comment-15583224
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

Here is cfstats from one of the instances.

{code}
Keyspace: test_keyspace
Read Count: 114179492
Read Latency: 1.6377607135701742 ms.
Write Count: 662747473
Write Latency: 0.030130128499184786 ms.
Pending Flushes: 0
Table: largetext1
SSTable count: 26
SSTables in each level: [0, 3, 7, 8, 8, 0, 0, 0, 0]
Space used (live): 434883821719
Space used (total): 434883821719
Space used by snapshots (total): 0
Off heap memory used (total): 67063584
SSTable Compression Ratio: 0.7882047641965452
Number of keys (estimate): 14
Memtable cell count: 58930
Memtable data size: 25518748
Memtable off heap memory used: 0
Memtable switch count: 3416
Local read count: 71154231
Local read latency: 2.468 ms
Local write count: 410631676
Local write latency: 0.030 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 496
Bloom filter off heap memory used: 288
Index summary off heap memory used: 1144
Compression metadata off heap memory used: 67062152
Compacted partition minimum bytes: 20924301
Compacted partition maximum bytes: 91830775932
Compacted partition mean bytes: 19348020195
Average live cells per slice (last five minutes): 
0.9998001524322566
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

Table: largeuuid1
SSTable count: 59
SSTables in each level: [1, 10, 48, 0, 0, 0, 0, 0, 0]
Space used (live): 9597872057
Space used (total): 9597872057
Space used by snapshots (total): 0
Off heap memory used (total): 3960428
SSTable Compression Ratio: 0.2836031289299396
Number of keys (estimate): 27603
Memtable cell count: 228244
Memtable data size: 7874514
Memtable off heap memory used: 0
Memtable switch count: 521
Local read count: 18463741
Local read latency: 0.271 ms
Local write count: 108570121
Local write latency: 0.031 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 22008
Bloom filter off heap memory used: 21536
Index summary off heap memory used: 11308
Compression metadata off heap memory used: 3927584
Compacted partition minimum bytes: 42511
Compacted partition maximum bytes: 4866323
Compacted partition mean bytes: 1290148
Average live cells per slice (last five minutes): 
0.9992537806937392
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

Table: timeuuid1
SSTable count: 7
SSTables in each level: [0, 1, 3, 3, 0, 0, 0, 0, 0]
Space used (live): 103161816378
Space used (total): 103161816378
Space used by snapshots (total): 0
Off heap memory used (total): 13820716
SSTable Compression Ratio: 0.9105016396444802
Number of keys (estimate): 6
Memtable cell count: 150596
Memtable data size: 41378801
Memtable off heap memory used: 0
Memtable switch count: 1117
Local read count: 24561527
Local read latency: 0.264 ms
Local write count: 143545778
Local write latency: 0.033 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 128
Bloom filter off heap memory used: 72
Index summary off heap memory used: 308
Compression metadata off heap memory used: 13820336
Compacted partition minimum 

[jira] [Commented] (CASSANDRA-12485) Always require replace_address to replace existing token

2016-10-17 Thread Christopher Licata (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583124#comment-15583124
 ] 

Christopher Licata  commented on CASSANDRA-12485:
-

[~jkni] you were right... thank god!  I had conflicting $CASSANDRA_HOME 
variables in my .aliases and my .zshrc files.  Since the CLASSPATH in 
{{cassandra.in.sh}} is derived from CASSANDRA_HOME, fixing that resolved 
everything with running the dtests.

> Always require replace_address to replace existing token
> 
>
> Key: CASSANDRA-12485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12485
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Paulo Motta
>Priority: Minor
>  Labels: lhf
>
> CASSANDRA-10134 prevented replacing an existing node unless 
> {{\-Dcassandra.replace_address}} or 
> {{\-Dcassandra.allow_unsafe_replace=true}} is specified.
> We should extend this behavior to tokens, preventing a node from joining the 
> ring if another node with the same token already exists in the ring, unless 
> {{\-Dcassandra.replace_address}} or 
> {{\-Dcassandra.allow_unsafe_replace=true}} is specified, in order to avoid 
> catastrophic scenarios.
> One scenario where this can easily happen is if you replace a node with 
> another node with a different IP, and after some time you restart the 
> original node by mistake. The original node will then take over the tokens of 
> the replaced node (since it has a newer gossip generation).
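
A rough sketch of the guard being proposed (purely illustrative; the method and data-structure names below are assumptions, not Cassandra's actual startup code, though the two system properties are the ones named in the ticket):

{code}
// Hypothetical sketch: refuse to join when one of our tokens already has
// an owner in the ring, unless an explicit replace flag was supplied.
import java.util.Collection;
import java.util.Map;

public class TokenGuardSketch
{
    static void checkTokensUnowned(Collection<String> myTokens, Map<String, String> tokenToEndpoint)
    {
        boolean replacing = System.getProperty("cassandra.replace_address") != null
                            || Boolean.getBoolean("cassandra.allow_unsafe_replace");
        if (replacing)
            return; // operator explicitly asked to replace; skip the guard
        for (String token : myTokens)
        {
            String owner = tokenToEndpoint.get(token);
            if (owner != null)
                throw new RuntimeException(String.format("Token %s is already owned by %s; " +
                                                         "use -Dcassandra.replace_address to replace that node",
                                                         token, owner));
        }
    }
}
{code}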



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583163#comment-15583163
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

Not a "stupid" question at all! There is certainly a bit more overhead here 
than what we did before, however, I'm closely monitoring compaction in these 
tests and Pending Tasks isn't backing up so at this read/write load it seems 
like the additional work is negligible in real world terms.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583210#comment-15583210
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


[~mkjellman] This looks great! Can you please post information regarding 
SSTable sizes and their estimated key counts as well? AFAIR there is 
another problem related to how indexes are currently stored: if a key is not in 
the key cache, there is no way to jump to it directly in the index file; the index 
reader has to scan through the index segment to find the requested key. So I'm 
wondering what happens when there are many keys which are 
small-to-medium sized, e.g. 64-128 MB, in each given SSTable (let's say the SSTable 
size is set to 1G or 2G) and stress readers are trying to read random keys: 
what would be the difference between the current index read performance vs. index + 
birch tree?...

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583249#comment-15583249
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

In regards to your second point: I'm actually only using the key cache in 
the current implementation if a) it's a legacy index that hasn't been upgraded 
yet (to keep performance for indexed rows the same during upgrades), or b) the 
row is "non-indexed", i.e. < 64kb, so just the starting offset.

For Birch-indexed rows, the index always comes from the Birch impl on disk and 
doesn't get stored in the key cache at all. Ideally I think it would be great 
if we could get rid of the key cache altogether! There was some chat about this 
in the ticket earlier...

There is the index summary, which has an offset for keys as they are sampled 
during compaction; it lets you skip to a given starting file offset inside the 
index for a key, which reduces the problem you're talking about. I don't think 
the performance of the small-to-medium sized case should be any different with 
the Birch implementation than with the current one, and I'm trying to test 
that with the workload going on for the test_keyspace.largeuuid1 table. The 
issue with the Birch implementation vs. the current one, though, is going to be 
the size of the index file on disk, due to the segments being aligned on 4kb 
boundaries. I've talked a bunch about this and thrown some ideas around with 
people, and I think maybe the best approach would be to check whether the 
previously added row was a non-indexed segment (so just a long for the start of 
the partition in the index and no tree being built) and then not align the file 
to a boundary for those cases. The real issue is that I don't know the length 
ahead of time, so I can't just encode the aligned segments at the end starting 
at some starting offset and encode relative offsets iteratively during 
compaction. Any thoughts on this would be really appreciated though...
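
To make the alignment idea concrete, a purely illustrative sketch (the 4kb boundary comes from the comment above; the class, method, and parameter names are assumptions, not the actual Birch writer):

{code}
// Purely illustrative: pad the index file to the next 4kb boundary only
// when the entry being written actually carries a Birch tree; offset-only
// (non-indexed) entries are written unaligned to save space.
import java.io.IOException;
import java.io.RandomAccessFile;

public class AlignmentSketch
{
    static final long BOUNDARY = 4096;

    static void maybeAlign(RandomAccessFile out, boolean entryHasTree) throws IOException
    {
        if (!entryHasTree)
            return; // just a long for the partition start; no page-aligned segment needed
        long pos = out.getFilePointer();
        long aligned = (pos + BOUNDARY - 1) & ~(BOUNDARY - 1);
        while (pos++ < aligned)
            out.write(0); // zero-fill padding up to the aligned segment start
    }
}
{code}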

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-9754:

Comment: was deleted

(was: In regards to your second point: I'm actually only using the key cache 
in the current implementation if a) it's a legacy index that hasn't been 
upgraded yet (to keep performance for indexed rows the same during upgrades), 
or b) the row is "non-indexed", i.e. < 64kb, so just the starting offset.

For Birch-indexed rows, the index always comes from the Birch impl on disk and 
doesn't get stored in the key cache at all. Ideally I think it would be great 
if we could get rid of the key cache altogether! There was some chat about this 
in the ticket earlier...

There is the index summary, which has an offset for keys as they are sampled 
during compaction; it lets you skip to a given starting file offset inside the 
index for a key, which reduces the problem you're talking about. I don't think 
the performance of the small-to-medium sized case should be any different with 
the Birch implementation than with the current one, and I'm trying to test 
that with the workload going on for the test_keyspace.largeuuid1 table. The 
issue with the Birch implementation vs. the current one, though, is going to be 
the size of the index file on disk, due to the segments being aligned on 4kb 
boundaries. I've talked a bunch about this and thrown some ideas around with 
people, and I think maybe the best approach would be to check whether the 
previously added row was a non-indexed segment (so just a long for the start of 
the partition in the index and no tree being built) and then not align the file 
to a boundary for those cases. The real issue is that I don't know the length 
ahead of time, so I can't just encode the aligned segments at the end starting 
at some starting offset and encode relative offsets iteratively during 
compaction. Any thoughts on this would be really appreciated though...)

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583294#comment-15583294
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

One idea I've had for a while is that we could switch the current Summary 
implementation to just having it be a Birch tree itself with all keys (not 
sampled). You could then do a lookup into the row index to get the offset to 
the columns index in what we call the "primary index" today. Then you'd have a 
tree per row like we have today.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583250#comment-15583250
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

In regards to your second point: I'm actually only using the key cache in 
the current implementation if a) it's a legacy index that hasn't been upgraded 
yet (to keep performance for indexed rows the same during upgrades), or b) the 
row is "non-indexed", i.e. < 64kb, so just the starting offset.

For Birch-indexed rows, the index always comes from the Birch impl on disk and 
doesn't get stored in the key cache at all. Ideally I think it would be great 
if we could get rid of the key cache altogether! There was some chat about this 
in the ticket earlier...

There is the index summary, which has an offset for keys as they are sampled 
during compaction; it lets you skip to a given starting file offset inside the 
index for a key, which reduces the problem you're talking about. I don't think 
the performance of the small-to-medium sized case should be any different with 
the Birch implementation than with the current one, and I'm trying to test 
that with the workload going on for the test_keyspace.largeuuid1 table. The 
issue with the Birch implementation vs. the current one, though, is going to be 
the size of the index file on disk, due to the segments being aligned on 4kb 
boundaries. I've talked a bunch about this and thrown some ideas around with 
people, and I think maybe the best approach would be to check whether the 
previously added row was a non-indexed segment (so just a long for the start of 
the partition in the index and no tree being built) and then not align the file 
to a boundary for those cases. The real issue is that I don't know the length 
ahead of time, so I can't just encode the aligned segments at the end starting 
at some starting offset and encode relative offsets iteratively during 
compaction. Any thoughts on this would be really appreciated though...

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11803) Creating a materialized view on a table with "token" column breaks the cluster

2016-10-17 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583247#comment-15583247
 ] 

Alex Petrov commented on CASSANDRA-11803:
-

Thanks for the quick reaction! 

+1 on the changes. As discussed offline, it'd be great to have a test for a 
non-reserved word.

> Creating a materialized view on a table with "token" column breaks the cluster
> --
>
> Key: CASSANDRA-11803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11803
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Kernel:
> Linux 4.4.8-20.46.amzn1.x86_64
> Java:
> Java OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> Cassandra: 
> datastax-ddc-3.3.0-1.noarch
> datastax-ddc-tools-3.3.0-1.noarch
>Reporter: Victor Trac
>Assignee: Carl Yeksigian
> Fix For: 3.0.10, 3.10, 4.0
>
>
> On a new Cassandra cluster, if we create a table with a field called "token" 
> (with quotes) and then create a materialized view that uses "token", the 
> cluster breaks. A ServerError is returned, and no further nodetool operations 
> on the sstables work. Restarting the Cassandra server will also fail. It 
> seems like the entire cluster is hosed.
> We tried this on Cassandra 3.3 and 3.5. 
> Here's how to reproduce (on a new, empty Cassandra 3.5 docker container):
> {code}
> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE account WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor' : 1 };
> cqlsh> CREATE TABLE account.session  (
>...   "token" blob,
>...   account_id uuid,
>...   PRIMARY KEY("token")
>... )WITH compaction={'class': 'LeveledCompactionStrategy'} AND
>...   compression={'sstable_compression': 'LZ4Compressor'};
> cqlsh> CREATE MATERIALIZED VIEW account.account_session AS
>...SELECT account_id,"token" FROM account.session
>...WHERE "token" IS NOT NULL and account_id IS NOT NULL
>...PRIMARY KEY (account_id, "token");
> ServerError: <Error from server: code=0000 [Server error] 
> message="java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.cassandra.exceptions.SyntaxException: line 1:25 no viable 
> alternative at input 'FROM' (SELECT account_id, token [FROM]...)">
> cqlsh> drop table account.session;
> ServerError: <Error from server: code=0000 [Server error] 
> message="java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.cassandra.exceptions.SyntaxException: line 1:25 no viable 
> alternative at input 'FROM' (SELECT account_id, token [FROM]...)">
> {code}
> When any sstable* or nodetool command is run, or when the Cassandra process 
> is restarted, this is emitted on startup and Cassandra exits (copied from a 
> server w/ data):
> {code}
> INFO  [main] 2016-05-12 23:25:30,074 ColumnFamilyStore.java:395 - 
> Initializing system_schema.indexes
> DEBUG [SSTableBatchOpen:1] 2016-05-12 23:25:30,075 SSTableReader.java:480 - 
> Opening 
> /mnt/cassandra/data/system_schema/indexes-0feb57ac311f382fba6d9024d305702f/ma-4-big
>  (91 bytes)
> ERROR [main] 2016-05-12 23:25:30,143 CassandraDaemon.java:697 - Exception 
> encountered during startup
> org.apache.cassandra.exceptions.SyntaxException: line 1:59 no viable 
> alternative at input 'FROM' (..., expire_at, last_used, token [FROM]...)
> at 
> org.apache.cassandra.cql3.ErrorCollector.throwFirstSyntaxError(ErrorCollector.java:101)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.cql3.CQLFragmentParser.parseAnyUnhandled(CQLFragmentParser.java:80)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:512)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchView(SchemaKeyspace.java:1128)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchViews(SchemaKeyspace.java:1092)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:903)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:879)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:867)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:134) 
> ~[apache-cassandra-3.5.0.jar:3.5.0]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:124) 
> ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:229) 
> 

[jira] [Commented] (CASSANDRA-12666) dtest failure in paging_test.TestPagingData.test_paging_with_filtering_on_partition_key

2016-10-17 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583260#comment-15583260
 ] 

Alex Petrov commented on CASSANDRA-12666:
-

Oh, sorry. It turns out I did not fully understand your comment on the first 
read; I was too focused on the ranges part. 

I've updated the patch (links unchanged). The bounds (min for the beginning, 
max for the end) are based on logic similar to what's 
[here|https://github.com/ifesdjeen/cassandra/blob/49ad523c97cb0d1ab43671e4839d381411bfe935/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java#L690-L706].
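
For illustration, a minimal sketch of that bound-substitution idea (the names 
here are hypothetical, not the actual Cassandra API): when a range restriction 
on the partition key leaves one side open, the missing side is filled in with 
the partitioner's minimum or maximum bound, so the resulting range is always 
well-formed.

{code}
import java.math.BigInteger;

// Sketch only: substitute min/max bounds for open-ended sides of a range.
final class BoundsSketch
{
    // Assumed token bounds; the real values depend on the partitioner.
    static final BigInteger MIN_TOKEN = BigInteger.ZERO;
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(2).pow(127);

    /** Returns {start, end}, filling in MIN/MAX when a side is unrestricted (null). */
    static BigInteger[] makeBounds(BigInteger start, BigInteger end)
    {
        BigInteger lo = (start != null) ? start : MIN_TOKEN; // open start -> minimum bound
        BigInteger hi = (end != null) ? end : MAX_TOKEN;     // open end -> maximum bound
        return new BigInteger[]{ lo, hi };
    }
}
{code}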
 

> dtest failure in 
> paging_test.TestPagingData.test_paging_with_filtering_on_partition_key
> ---
>
> Key: CASSANDRA-12666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12666
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sean McCarthy
>Assignee: Alex Petrov
>  Labels: dtest
> Fix For: 3.x
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/480/testReport/paging_test/TestPagingData/test_paging_with_filtering_on_partition_key
> {code}
> Standard Output
> Unexpected error in node3 log, error: 
> ERROR [Native-Transport-Requests-3] 2016-09-17 00:50:11,543 Message.java:622 
> - Unexpected exception during request; channel = [id: 0x467a4afe, 
> L:/127.0.0.3:9042 - R:/127.0.0.1:59115]
> java.lang.AssertionError: null
>   at 
> org.apache.cassandra.dht.IncludingExcludingBounds.split(IncludingExcludingBounds.java:45)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageProxy.getRestrictedRanges(StorageProxy.java:2368)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageProxy$RangeIterator.<init>(StorageProxy.java:1951)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:2235)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.PartitionRangeReadCommand.execute(PartitionRangeReadCommand.java:184)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:66)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.pager.PartitionRangeQueryPager.fetchPage(PartitionRangeQueryPager.java:36)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement$Pager$NormalPager.fetchPage(SelectStatement.java:328)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:375)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:250)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:78)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:216)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:247) 
> ~[main/:na]
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:232) 
> ~[main/:na]
>   at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:516)
>  [main/:na]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:409)
>  [main/:na]
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_45]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  [main/:na]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [main/:na]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {code}
> Related failures:
> http://cassci.datastax.com/job/trunk_novnode_dtest/480/testReport/paging_test/TestPagingData/test_paging_with_filtering_on_partition_key_on_clustering_columns/
> 

[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583289#comment-15583289
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


bq. I'm actually only using the key cache in the current implementation

I wanted to mention that, purely from the perspective of looking up a key in 
the key cache, I've assumed the index is only going to have key offsets in it, 
so we are on the same page. 

[~barnie] Is there any way you can run this through an automated perf stress 
test? Since the size of the tree attached to the key is bigger than it was 
originally, I'm curious what the performance difference is in conditions where 
rows are just barely big enough to be indexed and there are a lot of keys.

[~mkjellman] I understand that the test you are running is designed to check 
what the performance is like relative to the Birch tree itself, but is there 
any chance you can disable the key cache and generate some more keys (maybe 
~100k?) to see how changes to the column index affect the read path top-down?

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the 
> objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints 
> with large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?
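
For context, the 100K figure lines up with the default index granularity: 
assuming the default column_index_size_in_kb of 64, one IndexInfo entry is 
created per 64KB of partition data, and each entry holds two ByteBuffers (its 
first and last names). A back-of-the-envelope check:

{code}
// Back-of-the-envelope estimate, assuming the default column_index_size_in_kb = 64.
public class IndexInfoEstimate
{
    public static void main(String[] args)
    {
        long partitionBytes = 6_400_000_000L; // a ~6.4GB CQL partition
        long chunkBytes = 64 * 1024L;         // one IndexInfo per 64KB of data
        long entries = partitionBytes / chunkBytes;
        // Prints roughly 100K entries and twice as many ByteBuffers.
        System.out.println(entries + " IndexInfo entries, ~" + (2 * entries) + " ByteBuffers");
    }
}
{code}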



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583305#comment-15583305
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

Sure, I can change the test right now. Which table specifically are you talking 
about adding more keys to? It's a single command-line parameter and a restart 
of the perf load. I'll need to bounce the cluster for the key cache change, 
obviously.

The control cluster ran 2.1.16 without Birch. I did that on purpose, to compare 
performance with Birch vs. without and specifically make sure there wasn't a 
regression at the low end, which you're rightfully concerned about (as I am/was 
too). 

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the 
> objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints 
> with large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12777) Optimize the vnode allocation for single replica per DC

2016-10-17 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15577330#comment-15577330
 ] 

Dikang Gu edited comment on CASSANDRA-12777 at 10/17/16 8:10 PM:
-

[~blambov] thanks a lot for the review! I addressed your comments and here is a 
new commit.

1. Wraparound calculation: for BigIntegerToken, if a token is above the maximum, 
I `mod` it by the max token. I also added a test and validation for it.
2. I put the createTokenInfo in the constructor because I need to populate the 
unit info according to the tokens.
3. Agreed; removed the createTokenInfo.
4. I tried different fractions from 0.50 to 0.99; 0.50 fails the assertion, but 
otherwise they do not make much difference: 
https://gist.github.com/DikangGu/acd8f568f67b11082443419a8d503b01. I put 0.75 
in this commit.
5. Added `TokenAllocatorBase`. I kept the factory class because I think it's 
cleaner to keep the factory method there.

Thanks!


was (Author: dikanggu):
[~blambov] thanks a lot for the review! I addressed your comments and here is a 
new commit: 
https://github.com/DikangGu/cassandra/commit/402050e32732e67055935689951a56f92b9be281

1. Wraparound calculation: for BigIntegerToken, if a token is above the maximum, 
I `mod` it by the max token. I also added a test and validation for it.
2. I put the createTokenInfo in the constructor because I need to populate the 
unit info according to the tokens.
3. Agreed; removed the createTokenInfo.
4. I tried different fractions from 0.50 to 0.99; 0.50 fails the assertion, but 
otherwise they do not make much difference: 
https://gist.github.com/DikangGu/acd8f568f67b11082443419a8d503b01. I put 0.75 
in this commit.
5. Added `TokenAllocatorBase`. I kept the factory class because I think it's 
cleaner to keep the factory method there.

Thanks!
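
For illustration, a minimal sketch of the wraparound idea from point 1 combined 
with the split-by-fraction approach (illustrative names only, not the actual 
patch; MAX stands in for the partitioner's maximum token):

{code}
import java.math.BigInteger;

// Sketch: pick the token `fraction` of the way from start to end, wrapping
// results that exceed the maximum token back into range via mod.
final class WraparoundSplit
{
    // Assumed maximum token for a BigInteger-based partitioner.
    static final BigInteger MAX = BigInteger.valueOf(2).pow(127);

    static BigInteger split(BigInteger start, BigInteger end, double fraction)
    {
        // Span of the range; mod handles ranges that wrap past MAX.
        BigInteger span = end.subtract(start).mod(MAX);
        // Scale the span by the fraction using integer arithmetic.
        BigInteger offset = span.multiply(BigInteger.valueOf((long) (fraction * 1_000_000)))
                                .divide(BigInteger.valueOf(1_000_000));
        // A sum above the maximum wraps back into [0, MAX).
        return start.add(offset).mod(MAX);
    }
}
{code}

With fraction = 0.5 this reduces to the old split-in-half behavior; per point 4 
above, 0.75 was chosen for this commit.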

> Optimize the vnode allocation for single replica per DC
> ---
>
> Key: CASSANDRA-12777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12777
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Fix For: 3.x
>
>
> The new vnode allocation algorithm introduced in CASSANDRA-7032 is optimized 
> for the situation where there are multiple replicas per DC.
> In our production environment, most clusters only have one replica; in this 
> case, the algorithm does not work perfectly. It always tries to split token 
> ranges in half, so the ownership of the "min" node can go as low as ~60% 
> compared to the average.
> So for the single-replica case, I'm working on a new algorithm, based on 
> Branimir's previous commit, that splits token ranges by "some" percentage 
> instead of always by half. In this way, we get a very small variation in 
> ownership among the different nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12777) Optimize the vnode allocation for single replica per DC

2016-10-17 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583301#comment-15583301
 ] 

Dikang Gu commented on CASSANDRA-12777:
---

Sure, I added limits on the range of the take-over ratio, for both the MIN and 
MAX ratios. Here is the updated patch: 
https://github.com/DikangGu/cassandra/commit/5e837747974b5faa9833dc55ac5bd33a8c5e8b31
The simulation results are here: 
https://gist.github.com/DikangGu/29a6b5ab876ff6979de45118b855622b. I'd like to 
go with 0.90, since it produces better results.

Thanks.
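
As a rough picture of those limits, the candidate take-over ratio is clamped 
into a fixed window before a range is split. A minimal sketch (the names and 
the MIN value are assumptions; 0.90 is the MAX chosen above):

{code}
// Sketch only: clamp a candidate take-over ratio into an allowed window.
final class TakeoverRatioSketch
{
    static final double MIN_TAKEOVER_RATIO = 0.50; // assumed lower limit
    static final double MAX_TAKEOVER_RATIO = 0.90; // value chosen in the comment above

    static double clamp(double ratio)
    {
        return Math.max(MIN_TAKEOVER_RATIO, Math.min(MAX_TAKEOVER_RATIO, ratio));
    }
}
{code}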

> Optimize the vnode allocation for single replica per DC
> ---
>
> Key: CASSANDRA-12777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12777
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Fix For: 3.x
>
>
> The new vnode allocation algorithm introduced in CASSANDRA-7032 is optimized 
> for the situation where there are multiple replicas per DC.
> In our production environment, most clusters only have one replica; in this 
> case, the algorithm does not work perfectly. It always tries to split token 
> ranges in half, so the ownership of the "min" node can go as low as ~60% 
> compared to the average.
> So for the single-replica case, I'm working on a new algorithm, based on 
> Branimir's previous commit, that splits token ranges by "some" percentage 
> instead of always by half. In this way, we get a very small variation in 
> ownership among the different nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583396#comment-15583396
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

Great. I'm working on a trunk-based version now. 8099 is really fun! :)

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the 
> objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints 
> with large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-10-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583503#comment-15583503
 ] 

Jason Brown commented on CASSANDRA-8457:


Addressed a few problems surfaced by failing dtests. The currently failing 
utests and dtests are almost *all* related to streaming; I'm working on the few 
remaining dtests.

* correctly handle canceled futures in 
{{OutboundMessagingConnection#handleMessageFuture}}
* Fixed NPE in {{OutboundMessagingConnection#reconnectWithNewIp}}
* refactored some code wrt outbound connection creation
* added package-level documentation, primarily describing the handshake and 
messaging format

Also, I have a patched version of netty 4.1.6 in this branch that fixes a flush 
problem in {{LZ4FrameEncoder}}. The Netty folks are reviewing it now, and we 
will hopefully get an updated netty lib soon.

[~slebresne] ready for your next round of comments when you have time :)
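
For illustration, one common shape for the canceled-futures handling is to 
check cancellation explicitly in the write-future listener before treating the 
result as a failure. A sketch against the stock Netty API (not the actual 
{{handleMessageFuture}} code):

{code}
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;

// Sketch: distinguish cancellation from real write failures.
final class MessageFutureListener implements ChannelFutureListener
{
    @Override
    public void operationComplete(ChannelFuture future)
    {
        if (future.isSuccess())
            return; // message was flushed; nothing to do

        if (future.isCancelled())
            return; // canceled deliberately; do not treat as an I/O error

        // A genuine failure: log the cause and close the channel so the
        // connection can be re-established.
        System.err.println("write failed: " + future.cause());
        future.channel().close();
    }
}
{code}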

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/3] cassandra git commit: Fix broken clean target in doc/Makefile

2016-10-17 Thread mshuler
Fix broken clean target in doc/Makefile


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7a5118c7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7a5118c7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7a5118c7

Branch: refs/heads/trunk
Commit: 7a5118c7de664d8bd5b18cc2e32d7d4c8b04313e
Parents: f5bc378
Author: Michael Shuler 
Authored: Mon Oct 17 16:19:52 2016 -0500
Committer: Michael Shuler 
Committed: Mon Oct 17 16:19:52 2016 -0500

--
 doc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7a5118c7/doc/Makefile
--
diff --git a/doc/Makefile b/doc/Makefile
index 81d7707..c6632a5 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -55,7 +55,7 @@ help:
 .PHONY: clean
 clean:
rm -rf $(BUILDDIR)/*
-   rm $(YAML_DOC_OUTPUT)
+   rm -f $(YAML_DOC_OUTPUT)
 
 .PHONY: html
 html:



[1/3] cassandra git commit: Fix broken clean target in doc/Makefile

2016-10-17 Thread mshuler
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.X f5bc3784f -> 7a5118c7d
  refs/heads/trunk b51f7e259 -> d478f45d1


Fix broken clean target in doc/Makefile


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7a5118c7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7a5118c7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7a5118c7

Branch: refs/heads/cassandra-3.X
Commit: 7a5118c7de664d8bd5b18cc2e32d7d4c8b04313e
Parents: f5bc378
Author: Michael Shuler 
Authored: Mon Oct 17 16:19:52 2016 -0500
Committer: Michael Shuler 
Committed: Mon Oct 17 16:19:52 2016 -0500

--
 doc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7a5118c7/doc/Makefile
--
diff --git a/doc/Makefile b/doc/Makefile
index 81d7707..c6632a5 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -55,7 +55,7 @@ help:
 .PHONY: clean
 clean:
rm -rf $(BUILDDIR)/*
-   rm $(YAML_DOC_OUTPUT)
+   rm -f $(YAML_DOC_OUTPUT)
 
 .PHONY: html
 html:



[3/3] cassandra git commit: Merge branch 'cassandra-3.X' into trunk

2016-10-17 Thread mshuler
Merge branch 'cassandra-3.X' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d478f45d
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d478f45d
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d478f45d

Branch: refs/heads/trunk
Commit: d478f45d1061c9ee6605193cf4a5dfda4348e0aa
Parents: b51f7e2 7a5118c
Author: Michael Shuler 
Authored: Mon Oct 17 16:20:15 2016 -0500
Committer: Michael Shuler 
Committed: Mon Oct 17 16:20:15 2016 -0500

--
 doc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/d478f45d/doc/Makefile
--


