[jira] [Commented] (CASSANDRA-8365) CamelCase name is used as index name instead of lowercase
[ https://issues.apache.org/jira/browse/CASSANDRA-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249624#comment-14249624 ]

Benjamin Lerer commented on CASSANDRA-8365:
-------------------------------------------

Thanks for pointing me to CASSANDRA-7314. The code was confusing for me due to the difference between {{CreateIndexStatement}} and {{DropIndexStatement}} (and my MySQL experience, where an index is associated with a table). I should have looked for the ticket.

CamelCase name is used as index name instead of lowercase
---------------------------------------------------------

        Key: CASSANDRA-8365
        URL: https://issues.apache.org/jira/browse/CASSANDRA-8365
    Project: Cassandra
 Issue Type: Bug
   Reporter: Pierre Laporte
   Assignee: Benjamin Lerer
   Priority: Minor
     Labels: cqlsh, docs
    Fix For: 2.1.3
Attachments: CASSANDRA-8365.txt

In cqlsh, when I execute a CREATE INDEX FooBar ... statement, the CamelCase name is used as the index name, even though it is unquoted. Trying to quote the index name results in a syntax error. However, when I try to delete the index, I have to quote the index name; otherwise I get an invalid-query error telling me that the (lowercase) index does not exist. This seems inconsistent. Shouldn't the index name be lowercased before the index is created?

Here is the code to reproduce the issue:
{code}
cqlsh:schemabuilderit> CREATE TABLE IndexTest (a int primary key, b int);
cqlsh:schemabuilderit> CREATE INDEX FooBar on indextest (b);
cqlsh:schemabuilderit> DESCRIBE TABLE indextest;

CREATE TABLE schemabuilderit.indextest (
    a int PRIMARY KEY,
    b int
);
CREATE INDEX FooBar ON schemabuilderit.indextest (b);

cqlsh:schemabuilderit> DROP INDEX FooBar;
code=2200 [Invalid query] message=Index 'foobar' could not be found in any of the tables of keyspace 'schemabuilderit'
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
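The identifier rules the ticket relies on can be sketched as follows. This is a toy model of CQL's name folding, not Cassandra's actual parser: unquoted identifiers are case-insensitive and stored lowercase, while double-quoted identifiers keep their exact case. The bug is that CREATE INDEX kept the CamelCase form instead of applying the first rule.

```python
def normalize_identifier(ident: str) -> str:
    """Fold a CQL identifier: toy model of the expected rules, not Cassandra code."""
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        # Quoted identifier: strip the quotes, preserve case exactly.
        return ident[1:-1]
    # Unquoted identifier: case-insensitive, folded to lowercase.
    return ident.lower()
```

Under these rules, `CREATE INDEX FooBar` and `DROP INDEX FooBar` would both resolve to `foobar`, which is why the reporter's DROP looked for the lowercase name.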
[jira] [Commented] (CASSANDRA-8473) Secondary index support for key-value pairs in CQL3 maps
[ https://issues.apache.org/jira/browse/CASSANDRA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249646#comment-14249646 ]

Sylvain Lebresne commented on CASSANDRA-8473:
---------------------------------------------

bq. My preference is still to stick with a 3.0 target.

I would agree.

Secondary index support for key-value pairs in CQL3 maps
--------------------------------------------------------

        Key: CASSANDRA-8473
        URL: https://issues.apache.org/jira/browse/CASSANDRA-8473
    Project: Cassandra
 Issue Type: Improvement
   Reporter: Samuel Klock
   Assignee: Samuel Klock
    Fix For: 3.0
Attachments: cassandra-2.1-8473-actual-v1.txt, cassandra-2.1-8473-v2.txt, cassandra-2.1-8473.txt, trunk-8473-v2.txt

CASSANDRA-4511 and CASSANDRA-6383 made substantial progress on secondary indexes on CQL3 maps, but support for a natural use case is still missing: queries to find rows with map columns containing some key-value pair. For example (from a comment on CASSANDRA-4511):
{code:sql}
SELECT * FROM main.users WHERE notify['email'] = true;
{code}
Cassandra should add support for this kind of index. One option is to expose a CQL interface like the following:

* Creating an index:
{code:sql}
cqlsh:mykeyspace> CREATE TABLE mytable (key TEXT PRIMARY KEY, value MAP<TEXT, TEXT>);
cqlsh:mykeyspace> CREATE INDEX ON mytable(ENTRIES(value));
{code}
* Querying the index:
{code:sql}
cqlsh:mykeyspace> INSERT INTO mytable (key, value) VALUES ('foo', {'a': '1', 'b': '2', 'c': '3'});
cqlsh:mykeyspace> INSERT INTO mytable (key, value) VALUES ('bar', {'a': '1', 'b': '4'});
cqlsh:mykeyspace> INSERT INTO mytable (key, value) VALUES ('baz', {'b': '4', 'c': '3'});
cqlsh:mykeyspace> SELECT * FROM mytable WHERE value['a'] = '1';

 key | value
-----+--------------------------------
 bar | {'a': '1', 'b': '4'}
 foo | {'a': '1', 'b': '2', 'c': '3'}

(2 rows)

cqlsh:mykeyspace> SELECT * FROM mytable WHERE value['a'] = '1' AND value['b'] = '2' ALLOW FILTERING;

 key | value
-----+--------------------------------
 foo | {'a': '1', 'b': '2', 'c': '3'}

(1 rows)

cqlsh:mykeyspace> SELECT * FROM mytable WHERE value['b'] = '2' ALLOW FILTERING;

 key | value
-----+--------------------------------
 foo | {'a': '1', 'b': '2', 'c': '3'}

(1 rows)

cqlsh:mykeyspace> SELECT * FROM mytable WHERE value['b'] = '4';

 key | value
-----+-----------------------
 bar | {'a': '1', 'b': '4'}
 baz | {'b': '4', 'c': '3'}

(2 rows)
{code}

A patch against the cassandra-2.1 branch that implements this interface will be attached to this issue shortly.
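The proposed ENTRIES index can be pictured as an inverted index from (map key, map value) pairs to partition keys, with multi-pair queries intersecting the per-pair match sets (hence ALLOW FILTERING above). A minimal sketch under that assumption — names are illustrative, not Cassandra's internal structures:

```python
from collections import defaultdict

class EntriesIndex:
    """Toy model of an ENTRIES index on a map column (not Cassandra internals)."""

    def __init__(self):
        self.index = defaultdict(set)  # (map key, map value) -> partition keys
        self.rows = {}                 # partition key -> map value

    def insert(self, key, value_map):
        self.rows[key] = value_map
        for k, v in value_map.items():
            self.index[(k, v)].add(key)

    def query(self, *entries):
        # AND together all requested (key, value) pairs, as in the
        # multi-condition SELECT above.
        result = None
        for entry in entries:
            matches = self.index.get(entry, set())
            result = matches if result is None else result & matches
        return {k: self.rows[k] for k in (result or ())}
```

Populating it with the three rows from the example, `query(('a', '1'))` returns the `bar` and `foo` rows, and adding `('b', '2')` narrows the result to `foo`, matching the cqlsh transcripts above.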
[1/2] cassandra git commit: fix up misapply of CASSANDRA-7964 nits
Repository: cassandra
Updated Branches:
  refs/heads/trunk 8a38ce88d -> e6c5982fa

fix up misapply of CASSANDRA-7964 nits

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f8524011
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f8524011
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f8524011

Branch: refs/heads/trunk
Commit: f852401166cdd1e6b2136b5e70b85a771f6f2a8f
Parents: d5e5f98
Author: Benedict Elliott Smith bened...@apache.org
Authored: Wed Dec 17 09:27:47 2014 +0000
Committer: Benedict Elliott Smith bened...@apache.org
Committed: Wed Dec 17 09:27:47 2014 +0000

----------------------------------------------------------------------
 .../apache/cassandra/stress/generate/PartitionIterator.java   | 7 +++
 .../cassandra/stress/operations/userdefined/SchemaInsert.java | 7 +--
 .../cassandra/stress/operations/userdefined/SchemaQuery.java  | 7 +--
 3 files changed, 5 insertions(+), 16 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f8524011/tools/stress/src/org/apache/cassandra/stress/generate/PartitionIterator.java
----------------------------------------------------------------------
diff --git a/tools/stress/src/org/apache/cassandra/stress/generate/PartitionIterator.java b/tools/stress/src/org/apache/cassandra/stress/generate/PartitionIterator.java
index baab867..0d0cba1 100644
--- a/tools/stress/src/org/apache/cassandra/stress/generate/PartitionIterator.java
+++ b/tools/stress/src/org/apache/cassandra/stress/generate/PartitionIterator.java
@@ -169,7 +169,6 @@ public abstract class PartitionIterator implements Iterator<Row>
         // so that we know with what chance we reached there, and we adjust our roll at that level by that amount
         final double[] chancemodifier = new double[generator.clusteringComponents.size()];
         final double[] rollmodifier = new double[generator.clusteringComponents.size()];
-        final ThreadLocalRandom random = ThreadLocalRandom.current();

         // track where in the partition we are, and where we are limited to
         final int[] position = new int[generator.clusteringComponents.size()];
@@ -240,7 +239,7 @@ public abstract class PartitionIterator implements Iterator<Row>
         }

         // seek to our start position
-        switch (seek(isWrite ? position : null))
+        switch (seek(isWrite ? position : 0))
         {
             case END_OF_PARTITION:
                 return false;
@@ -382,6 +381,7 @@ public abstract class PartitionIterator implements Iterator<Row>

         private boolean advance(int depth, boolean first)
         {
+            ThreadLocalRandom random = ThreadLocalRandom.current();
             // advance the leaf component
             clusteringComponents[depth].poll();
             position[depth]++;
@@ -548,9 +548,9 @@ public abstract class PartitionIterator implements Iterator<Row>

         private State setHasNext(boolean hasNext)
         {
+            this.hasNext = hasNext;
             if (!hasNext)
             {
-                this.hasNext = false;
                 boolean isLast = finishedPartition();
                 if (isWrite)
                 {
@@ -562,7 +562,6 @@ public abstract class PartitionIterator implements Iterator<Row>
                 }
                 return isLast ? State.END_OF_PARTITION : State.AFTER_LIMIT;
             }
-            this.hasNext = hasNext;
             return State.SUCCESS;
         }
     }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f8524011/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaInsert.java
----------------------------------------------------------------------
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaInsert.java b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaInsert.java
index 61237f1..a915d93 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaInsert.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaInsert.java
@@ -44,15 +44,10 @@ public class SchemaInsert extends SchemaStatement

     public SchemaInsert(Timer timer, StressSettings settings, PartitionGenerator generator, SeedManager seedManager, Distribution batchSize, RatioDistribution useRatio, Integer thriftId, PreparedStatement statement, ConsistencyLevel cl, BatchStatement.Type batchType)
     {
-        super(timer, settings, spec(generator, seedManager, batchSize, useRatio), statement, thriftId, cl, ValidationType.NOT_FAIL);
+        super(timer, settings, new DataSpec(generator, seedManager, batchSize, useRatio), statement, thriftId, cl, ValidationType.NOT_FAIL);
         this.batchType = batchType;
     }

-    private static DataSpec
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e6c5982f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e6c5982f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e6c5982f

Branch: refs/heads/trunk
Commit: e6c5982fa3b9d3483319e2fcc7bf3c1a7980a7e9
Parents: 8a38ce8 f852401
Author: Benedict Elliott Smith bened...@apache.org
Authored: Wed Dec 17 09:28:17 2014 +0000
Committer: Benedict Elliott Smith bened...@apache.org
Committed: Wed Dec 17 09:28:17 2014 +0000

----------------------------------------------------------------------
 .../apache/cassandra/stress/generate/PartitionIterator.java   | 7 +++
 .../cassandra/stress/operations/userdefined/SchemaInsert.java | 7 +--
 .../cassandra/stress/operations/userdefined/SchemaQuery.java  | 7 +--
 3 files changed, 5 insertions(+), 16 deletions(-)
----------------------------------------------------------------------
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249655#comment-14249655 ]

Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 9:39 AM:
-------------------------------------------------------------------------

Honestly, I don't like this idea for Spark, for the following reasons:
# It seems to add quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd actually like to split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem that hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets: no significant difference on larger data sets, and only a tiny difference on really small sets (where the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in the page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be *hundreds* of MBs in size, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra takes plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @ 2.4 GHz are 100% busy processing those small CQL cells: merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2 MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
* We need to improve the backpressure mechanism, at least so that the driver or Spark connector knows to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait before it is ok to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other, non-Spark use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.
was (Author: pkolaczk):
Honestly, I don't like this idea, for the following reasons:
# It seems to add quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd actually like to split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem that hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets: no significant difference on larger data sets, and only a tiny difference on really small sets (where the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in the page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249655#comment-14249655 ]

Piotr Kołaczkowski commented on CASSANDRA-7296:
-----------------------------------------------

Honestly, I don't like this idea, for the following reasons:
# It seems to add quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd actually like to split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem that hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets: no significant difference on larger data sets, and only a tiny difference on really small sets (where the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in the page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be *hundreds* of MBs in size, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra takes plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @ 2.4 GHz are 100% busy processing those small CQL cells: merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2 MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
* We need to improve the backpressure mechanism, at least so that the driver or Spark connector knows to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait before it is ok to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other, non-Spark use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.

Add CL.COORDINATOR_ONLY
-----------------------

        Key: CASSANDRA-7296
        URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
    Project: Cassandra
 Issue Type: Improvement
   Reporter: Tupshin Harper

For reasons such as CASSANDRA-6340 and similar, it would be nice to have a read that never gets distributed, and only works if the coordinator you are talking to is an owner of the row.
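The client-side write throttling described above can be sketched as additive-increase/multiplicative-decrease on the submission rate, reacting to write timeouts (since, as noted, latency alone gives no early warning before a timeout). This is an illustrative sketch only; the class name and constants are invented, and this is not an API of any driver or connector:

```python
class AimdThrottle:
    """AIMD write-rate throttle driven by timeout feedback (illustrative sketch)."""

    def __init__(self, rate=1000.0, min_rate=10.0, max_rate=100000.0):
        self.rate = rate          # target writes per second
        self.min_rate = min_rate
        self.max_rate = max_rate

    def on_success(self):
        # Additive increase: probe slowly for spare cluster capacity.
        self.rate = min(self.max_rate, self.rate + 10.0)

    def on_timeout(self):
        # Multiplicative decrease: back off hard on a write timeout, since
        # the cluster gives no graded signal before timeouts appear.
        self.rate = max(self.min_rate, self.rate / 2.0)

    def delay(self):
        # Pause to insert between writes to hold the current target rate.
        return 1.0 / self.rate
```

A richer design would react to an explicit backpressure signal from the cluster (the point of the comment) rather than inferring overload from timeouts after the fact.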
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249655#comment-14249655 ]

Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 9:41 AM:
-------------------------------------------------------------------------

Honestly, I don't think it would benefit Spark integration:
# It seems to add quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd actually like to split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem that hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets: no significant difference on larger data sets, and only a tiny difference on really small sets (where the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in the page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be *hundreds* of MBs in size, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra takes plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @ 2.4 GHz are 100% busy processing those small CQL cells: merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2 MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
* We need to improve the backpressure mechanism, at least so that the driver or Spark connector knows to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait before it is ok to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other, non-Spark use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.

was (Author: pkolaczk):
Honestly, I don't like this idea for Spark, for the following reasons:
# It seems to add quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd actually like to split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem that hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets: no significant difference on larger data sets, and only a tiny difference on really small sets (where the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in the page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those
[jira] [Commented] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249659#comment-14249659 ]

Sylvain Lebresne commented on CASSANDRA-8421:
---------------------------------------------

[~blerer] Can you check if you have more luck than me at reproducing with the test attached?

Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
----------------------------------------------------------------------------

        Key: CASSANDRA-8421
        URL: https://issues.apache.org/jira/browse/CASSANDRA-8421
    Project: Cassandra
 Issue Type: Bug
 Components: API
Environment: single node cassandra
   Reporter: madheswaran
   Assignee: Sylvain Lebresne
    Fix For: 3.0, 2.1.3
Attachments: 8421-unittest.txt, entity_data.csv

I am using a List whose data type is a UDT.

UDT:
{code}
CREATE TYPE fieldmap (
    key text,
    value text
);
{code}
TABLE:
{code}
CREATE TABLE entity (
    entity_id uuid PRIMARY KEY,
    begining int,
    domain text,
    domain_type text,
    entity_template_name text,
    field_values list<fieldmap>,
    global_entity_type text,
    revision_time timeuuid,
    status_key int,
    status_name text,
    uuid timeuuid
)
{code}
INDEX:
{code}
CREATE INDEX entity_domain_idx_1 ON galaxy_dev.entity (domain);
CREATE INDEX entity_field_values_idx_1 ON galaxy_dev.entity (field_values);
CREATE INDEX entity_global_entity_type_idx_1 ON galaxy_dev.entity (gen_type);
{code}
QUERY
{code}
SELECT * FROM entity WHERE status_key 3 and field_values contains {key: 'userName', value: 'Sprint5_22'} and gen_type = 'USER' and domain = 'S4_1017.abc.com' allow filtering;
{code}
The above query returns values for some rows but not for many others, even though those rows and their data exist. Observation: if I execute the query on columns other than field_values, then it returns values. I suspect the problem is with LIST of UDT. I have a single-node Cassandra DB. Please let me know why I see this strange behavior from Cassandra.
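As a reference point for what the reporter's SELECT should return, the CONTAINS-plus-equality filtering can be re-stated in plain Python, with dicts standing in for the fieldmap UDT values. This only computes the expected result set for comparison; it is not Cassandra's read path, and the helper names are invented:

```python
def contains_entry(field_values, key, value):
    """True if any fieldmap in the list has exactly this key and value."""
    return any(fm.get('key') == key and fm.get('value') == value
               for fm in field_values)

def run_query(rows, key, value, **equality_filters):
    """Rows whose field_values CONTAINS {key, value} and match all other filters."""
    return [row for row in rows
            if contains_entry(row['field_values'], key, value)
            and all(row.get(col) == want
                    for col, want in equality_filters.items())]
```

Any row matching the filters here but missing from the cqlsh result would demonstrate the reported bug.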
[jira] [Updated] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-8421:
----------------------------------------
    Assignee: Benjamin Lerer  (was: Sylvain Lebresne)
[jira] [Commented] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249666#comment-14249666 ]

madheswaran commented on CASSANDRA-8421:
----------------------------------------

Hi Benjamin,
Steps to reproduce:
1) Create the table as I mentioned, with secondary indexes on the other fields.
2) Insert just 4 records, similar to the one below:
{quote}
723d4295-1ad2-4aba-8638-5bdd6b3be8b7 | 1 | pcs.com | null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null
{quote}
3) Try a search operation (SELECT with a condition).
[jira] [Comment Edited] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249666#comment-14249666 ]

madheswaran edited comment on CASSANDRA-8421 at 12/17/14 10:09 AM:
-------------------------------------------------------------------

Hi Benjamin,
Steps to reproduce:
1) Create the table as I mentioned, with secondary indexes on the other fields.
2) Insert just 4 records, similar to the one below:
{quote}
723d4295-1ad2-4aba-8638-5bdd6b3be8b7 | 1 | pcs.com | null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null
{quote}
3) Try a search operation (SELECT with a condition).

was (Author: madheswaran):
Hi Benjamin,
Steps to reproduce:
1) Create the table as I mentioned, with secondary indexes on the other fields.
2) Insert just 4 records, similar to the one below:
{quote}
723d4295-1ad2-4aba-8638-5bdd6b3be8b7 | 1 | pcs.com | null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null
{quote}
3) Try a search operation (SELECT with a condition).
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249678#comment-14249678 ] Benedict commented on CASSANDRA-8457: - A couple of comments on the code: # Requiring synchronized makes it _possible_ to negate any benefit, since we could easily get more threads competing. Certainly the cost seems higher than what we save from a lazySet; my typical strategy for this situation is to set the wakeup flag to true, check if any more work needs to be done, and _if so_ atomically re-adopt the state. So the despatch task would be a loop, terminating only when there is definitely no work to do. # If we're worrying about context switching, we should probably switch to CLQ so that producers never conflict. If we're worried about less efficient draining, we can later introduce an MPSC queue with an efficient drainTo(). # droppedUpdater.decrementAndGet() looks like a typo. However I doubt any of these will have a _significant_ impact. The stress test you're running should exercise these pathways in a typical manner. I'm not terribly surprised at the lack of impact, but it is probably worth trying on a much larger cluster to see if more-often-empty queues and a surfeit of threads can elicit a result. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each, incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
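The wakeup-flag pattern Benedict describes above can be sketched as follows. This is a hedged illustration of the comment only, not Cassandra's actual MessagingService code; the names `Dispatcher`, `submit` and `drain` are invented for the sketch:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Producers enqueue onto a CLQ so they never conflict with each other; the
// consumer publishes a "sleeping" flag before parking and then re-checks the
// queue, so a racing producer can atomically re-adopt the draining state.
class Dispatcher {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    // true whenever the dispatch task has (logically) gone to sleep
    private final AtomicBoolean sleeping = new AtomicBoolean(true);
    public volatile int executed = 0;

    public void submit(Runnable task) {
        tasks.add(task);
        // if the dispatcher declared itself asleep, claim the wakeup and drain
        if (sleeping.compareAndSet(true, false))
            drain();
    }

    // the dispatch loop: terminates only when there is definitely no work left
    private void drain() {
        while (true) {
            Runnable t;
            while ((t = tasks.poll()) != null) {
                t.run();
                executed++;
            }
            sleeping.set(true);            // publish intent to sleep
            if (tasks.isEmpty())
                return;                    // definitely no work to do
            if (!sleeping.compareAndSet(true, false))
                return;                    // a producer claimed the wakeup; it drains
        }
    }
}
```

The key property is the re-check after `sleeping.set(true)`: work that raced in between the last poll and the flag publication is either drained by this loop (CAS succeeds) or by the producer that won the CAS, so nothing is stranded.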
[jira] [Comment Edited] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249666#comment-14249666 ] madheswaran edited comment on CASSANDRA-8421 at 12/17/14 10:09 AM: --- Hi Benjamin, Steps to reproduce: 1) Please create the table as I mentioned, with secondary indexes on the other fields. 2) Insert just 4 records, similar to the one below {quote} 723d4295-1ad2-4aba-8638-5bdd6b3be8b7 |1 | pcs.com |null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null {quote} 3) Try a search operation (SELECT with a condition) was (Author: madheswaran): Hi Benjamin, Steps to reproduce: 1) Please create the table as I mentioned, with secondary indexes on the other fields.
2) Insert just 4 records, similar to the one below {quote} 723d4295-1ad2-4aba-8638-5bdd6b3be8b7 |1 | pcs.com |null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null {quote} 3) Try a search operation (SELECT with a condition) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT -- Key: CASSANDRA-8421 URL: https://issues.apache.org/jira/browse/CASSANDRA-8421 Project: Cassandra Issue Type: Bug Components: API Environment: single node cassandra Reporter: madheswaran Assignee: Benjamin Lerer Fix For: 3.0, 2.1.3 Attachments: 8421-unittest.txt, entity_data.csv I am using a list whose element type is a UDT. UDT:
{code}
CREATE TYPE fieldmap (
    key text,
    value text
);
{code}
TABLE:
{code}
CREATE TABLE entity (
    entity_id uuid PRIMARY KEY,
    begining int,
    domain text,
    domain_type text,
    entity_template_name text,
    field_values list<fieldmap>,
    global_entity_type text,
    revision_time timeuuid,
    status_key int,
    status_name text,
    uuid timeuuid
)
{code}
INDEX:
{code}
CREATE INDEX entity_domain_idx_1 ON galaxy_dev.entity (domain);
CREATE INDEX entity_field_values_idx_1 ON galaxy_dev.entity (field_values);
CREATE INDEX entity_global_entity_type_idx_1 ON galaxy_dev.entity (gen_type);
{code}
QUERY
{code}
SELECT * FROM entity
WHERE status_key > 3
  and field_values contains {key: 'userName', value: 'Sprint5_22'}
  and gen_type = 'USER'
  and domain = 'S4_1017.abc.com'
allow filtering;
{code}
The above query returns values for some rows but not for many others, even though those rows and their data exist.
Observation: if I execute the query on columns other than field_values, then it returns values. I suspect the problem is with a LIST of a UDT. I have a single-node Cassandra DB. Please let me know why Cassandra shows this strange behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8191) After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... >= current key DecoratedKey)
[ https://issues.apache.org/jira/browse/CASSANDRA-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249680#comment-14249680 ] Jimmy Mårdell commented on CASSANDRA-8191: -- We got the same exception and stack trace after running repair where some nodes got streamed a lot of new data. The table used LCS and got a lot of data in level 0. The exception kept repeating itself every now and then, and no progress was made on compactions. Restarting the node however seems to have solved the problem. This was on 2.0.11. After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... >= current key DecoratedKey) -- Key: CASSANDRA-8191 URL: https://issues.apache.org/jira/browse/CASSANDRA-8191 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev While recovering the cluster from CASSANDRA-7949 (using the flag from CASSANDRA-6621) I had to use the sstablesplit tool to split large sstables. Nodes were off while using this tool and only one sstablesplit instance was running, of course. After splitting was done I restarted the nodes and they all started compacting the data.
All the nodes are logging the exceptions like this:
{code}
ERROR [CompactionExecutor:4028] 2014-10-26 23:14:52,653 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4028,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) >= current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130525-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
It seems that scrubbing helps but scrubbing blocks the compactions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
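The exception above comes from an ordering invariant: an sstable writer must receive partition keys in strictly increasing order and refuses to append otherwise. A minimal sketch of that check, with illustrative names rather than Cassandra's actual SSTableWriter, and with keys reduced to their long tokens:

```java
// A writer that enforces strictly increasing key order, mirroring the
// beforeAppend() check in the stack trace above. Illustrative only.
class SortedWriter {
    private Long lastWrittenToken = null;   // token of the last appended key

    public void append(long token) {
        if (lastWrittenToken != null && lastWrittenToken >= token)
            throw new RuntimeException("Last written key " + lastWrittenToken
                                       + " >= current key " + token);
        lastWrittenToken = token;
    }
}
```

When two input sstables cover overlapping, unsorted key ranges (as after a bad split), the merged compaction stream violates this check and the compaction aborts with the RuntimeException shown.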
[jira] [Comment Edited] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249666#comment-14249666 ] madheswaran edited comment on CASSANDRA-8421 at 12/17/14 10:10 AM: --- Hi Benjamin, Steps to reproduce: 1) Please create the table as I mentioned, with secondary indexes on the other fields. 2) Insert just 4 records, similar to the one below {code} 723d4295-1ad2-4aba-8638-5bdd6b3be8b7 |1 | pcs.com |null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null {code} 3) Try a search operation (SELECT with a condition) was (Author: madheswaran): Hi Benjamin, Steps to reproduce: 1) Please create the table as I mentioned, with secondary indexes on the other fields.
2) Insert just 4 records, similar to the one below {quote} 723d4295-1ad2-4aba-8638-5bdd6b3be8b7 |1 | pcs.com |null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null {quote} 3) Try a search operation (SELECT with a condition) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT -- Key: CASSANDRA-8421 URL: https://issues.apache.org/jira/browse/CASSANDRA-8421 Project: Cassandra Issue Type: Bug Components: API Environment: single node cassandra Reporter: madheswaran Assignee: Benjamin Lerer Fix For: 3.0, 2.1.3 Attachments: 8421-unittest.txt, entity_data.csv I am using a list whose element type is a UDT. UDT:
{code}
CREATE TYPE fieldmap (
    key text,
    value text
);
{code}
TABLE:
{code}
CREATE TABLE entity (
    entity_id uuid PRIMARY KEY,
    begining int,
    domain text,
    domain_type text,
    entity_template_name text,
    field_values list<fieldmap>,
    global_entity_type text,
    revision_time timeuuid,
    status_key int,
    status_name text,
    uuid timeuuid
)
{code}
INDEX:
{code}
CREATE INDEX entity_domain_idx_1 ON galaxy_dev.entity (domain);
CREATE INDEX entity_field_values_idx_1 ON galaxy_dev.entity (field_values);
CREATE INDEX entity_global_entity_type_idx_1 ON galaxy_dev.entity (gen_type);
{code}
QUERY
{code}
SELECT * FROM entity
WHERE status_key > 3
  and field_values contains {key: 'userName', value: 'Sprint5_22'}
  and gen_type = 'USER'
  and domain = 'S4_1017.abc.com'
allow filtering;
{code}
The above query returns values for some rows but not for many others, even though those rows and their data exist.
Observation: if I execute the query on columns other than field_values, then it returns values. I suspect the problem is with a LIST of a UDT. I have a single-node Cassandra DB. Please let me know why Cassandra shows this strange behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8421) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249666#comment-14249666 ] madheswaran edited comment on CASSANDRA-8421 at 12/17/14 10:10 AM: --- Hi Benjamin, Steps to reproduce: 1) Please create the table as I mentioned, with secondary indexes on the other fields. 2) Insert just 4 records, similar to the one below {code} 723d4295-1ad2-4aba-8638-5bdd6b3be8b7 |1 | pcs.com |null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null {code} 3) Try a search operation (SELECT query with a condition) was (Author: madheswaran): Hi Benjamin, Steps to reproduce: 1) Please create the table as I mentioned, with secondary indexes on the other fields.
2) Insert just 4 records, similar to the one below {code} 723d4295-1ad2-4aba-8638-5bdd6b3be8b7 |1 | pcs.com |null | DefaultPCSTemplate_User | [{key: 'access', value: '[{entityId:c0e30978-9662-4d6e-9503-7fcfd4d7693c},{entityId:9af1e1e2-05bd-4929-a2cc-ff9e6526992c}]'}, {key: 'contactNumber', value: '89007'}, {key: 'firstName', value: 'James'}, {key: 'lastName', value: 'Smith'}, {key: 'primaryEmail', value: 'james9...@pcs.com'}, {key: 'roleName', value: 'admin'}, {key: 'userName', value: 'James9007'}, {key: 'userType', value: 'some'}, {key: 'password', value: 'James9007'}] | USER | 5a0dad10-7edf-11e4-8d47-4b86331ee8c7 | 0 | ACTIVE | null {code} 3) Try a search operation (SELECT with a condition) Cassandra 2.1.1 Cassandra 2.1.2 UDT not returning value for LIST type as UDT -- Key: CASSANDRA-8421 URL: https://issues.apache.org/jira/browse/CASSANDRA-8421 Project: Cassandra Issue Type: Bug Components: API Environment: single node cassandra Reporter: madheswaran Assignee: Benjamin Lerer Fix For: 3.0, 2.1.3 Attachments: 8421-unittest.txt, entity_data.csv I am using a list whose element type is a UDT. UDT:
{code}
CREATE TYPE fieldmap (
    key text,
    value text
);
{code}
TABLE:
{code}
CREATE TABLE entity (
    entity_id uuid PRIMARY KEY,
    begining int,
    domain text,
    domain_type text,
    entity_template_name text,
    field_values list<fieldmap>,
    global_entity_type text,
    revision_time timeuuid,
    status_key int,
    status_name text,
    uuid timeuuid
)
{code}
INDEX:
{code}
CREATE INDEX entity_domain_idx_1 ON galaxy_dev.entity (domain);
CREATE INDEX entity_field_values_idx_1 ON galaxy_dev.entity (field_values);
CREATE INDEX entity_global_entity_type_idx_1 ON galaxy_dev.entity (gen_type);
{code}
QUERY
{code}
SELECT * FROM entity
WHERE status_key > 3
  and field_values contains {key: 'userName', value: 'Sprint5_22'}
  and gen_type = 'USER'
  and domain = 'S4_1017.abc.com'
allow filtering;
{code}
The above query returns values for some rows but not for many others, even though those rows and their data exist.
Observation: if I execute the query on columns other than field_values, then it returns values. I suspect the problem is with a LIST of a UDT. I have a single-node Cassandra DB. Please let me know why Cassandra shows this strange behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7304) Ability to distinguish between NULL and UNSET values in Prepared Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249679#comment-14249679 ] Ben Hood commented on CASSANDRA-7304: - We've had [an issue raised with us at gocql|https://github.com/gocql/gocql/issues/296] that would like to take advantage of these changes, but I'm not 100% sure what the final scope of this patch will be. On first glance it looked like these were just driver-side changes, and hence we were considering to what extent we could replicate this in our driver. There was mention that the wire protocol would be clarified to include the UNSET (-2) flag. But reading the latest patch, it looks like there might be some server-side changes as well. To be fair, it looks like the understanding of what needs to happen has evolved since the issue was first raised. So I was wondering whether the intention of this could be restated for the benefit of external readers. Would it be possible to update the original summary to reflect the current motivation and scope of the change? Ability to distinguish between NULL and UNSET values in Prepared Statements --- Key: CASSANDRA-7304 URL: https://issues.apache.org/jira/browse/CASSANDRA-7304 Project: Cassandra Issue Type: Sub-task Reporter: Drew Kutcharian Assignee: Oded Peer Labels: cql, protocolv4 Fix For: 3.0 Attachments: 7304-03.patch, 7304-04.patch, 7304-2.patch, 7304.patch Currently Cassandra inserts tombstones when a value of a column is bound to NULL in a prepared statement. At higher insert rates, managing all these tombstones becomes an unnecessary overhead. This limits the usefulness of prepared statements, since developers have to either create multiple prepared statements (each with a different combination of column names, which at times is just unfeasible because of the sheer number of possible combinations) or fall back to using regular (non-prepared) statements. This JIRA is here to explore the possibility of either: A.
Have a flag on prepared statements that once set, tells Cassandra to ignore null columns or B. Have an UNSET value which makes Cassandra skip the null columns and not tombstone them Basically, in the context of a prepared statement, a null value means delete, but we don’t have anything that means ignore (besides creating a new prepared statement without the ignored column). Please refer to the original conversation on DataStax Java Driver mailing list for more background: https://groups.google.com/a/lists.datastax.com/d/topic/java-driver-user/cHE3OOSIXBU/discussion -- This message was sent by Atlassian JIRA (v6.3.4#6332)
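The null-versus-UNSET distinction described above can be illustrated with a distinct sentinel object: null still means "delete" (tombstone), while UNSET means "leave the column alone". This is a purely illustrative sketch; the names `UNSET` and `applyUpdate` are invented here and are not the CQL protocol's or any driver's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Demonstrates option B from the ticket: an UNSET sentinel that is skipped
// entirely, versus null, which writes a tombstone (here modeled as remove()).
class UnsetDemo {
    public static final Object UNSET = new Object();

    // Applies bound values to a row: null deletes, UNSET is ignored.
    public static void applyUpdate(Map<String, Object> row, Map<String, Object> bound) {
        for (Map.Entry<String, Object> e : bound.entrySet()) {
            if (e.getValue() == UNSET)
                continue;                 // leave the existing value untouched
            if (e.getValue() == null)
                row.remove(e.getKey());   // "tombstone" the column
            else
                row.put(e.getKey(), e.getValue());
        }
    }

    // Tiny self-check: start with {a=1, b=2}, bind a=null, b=UNSET, c=3.
    public static Map<String, Object> demo() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("b", 2);
        Map<String, Object> bound = new LinkedHashMap<>();
        bound.put("a", null);    // tombstone "a"
        bound.put("b", UNSET);   // leave "b" alone
        bound.put("c", 3);       // write "c"
        applyUpdate(row, bound);
        return row;
    }
}
```

With a single prepared statement covering all columns, a client would bind UNSET for every column it does not wish to touch, avoiding both the tombstones and the combinatorial explosion of per-column-subset statements described in the ticket.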
[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249690#comment-14249690 ] Marcus Eriksson commented on CASSANDRA-8316: I think we are simply timing out the Prepare message when TRACE is enabled (I can't even start an 8-node cluster with TRACE on). One solution could be to increase the timeout, but we use the same timeout for snapshot creation and that would be just as likely to fail on a heavily loaded cluster, wdyt [~yukim]? Also, note that in your test you repair all ranges, meaning, when you repair node5 for example, you actually include node3,4,5,6,7, so you can't repair any of those at the same time Did not get positive replies from all endpoints error on incremental repair -- Key: CASSANDRA-8316 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316 Project: Cassandra Issue Type: Bug Components: Core Environment: cassandra 2.1.2 Reporter: Loic Lambiel Assignee: Marcus Eriksson Fix For: 2.1.3 Attachments: 0001-patch.patch, 8316-v2.patch, CassandraDaemon-2014-11-25-2.snapshot.tar.gz, CassandraDaemon-2014-12-14.snapshot.tar.gz, test.sh Hi, I've got an issue with incremental repairs on our production 15-node 2.1.2 cluster (new cluster, not yet loaded, RF=3). After having successfully performed an incremental repair (-par -inc) on 3 nodes, I started receiving Repair failed with error Did not get positive replies from all endpoints. from nodetool on all remaining nodes : [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges for keyspace (seq=false, full=false) [2014-11-14 09:12:47,919] Repair failed with error Did not get positive replies from all endpoints. All the nodes are up and running and the local system log shows that the repair commands got started and that's it. I've also noticed that soon after the repair, several nodes started having more cpu load indefinitely without any particular reason (no tasks / queries, nothing in the logs).
I then restarted C* on these nodes and retried the repair on several nodes, which succeeded until the issue appeared again. I tried to repro on our 3-node preproduction cluster, without success. It looks like I'm not the only one having this issue: http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html Any idea? Thanks Loic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again
[ https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249688#comment-14249688 ] Benedict commented on CASSANDRA-8449: - We can't hoist that one out to the outer layer because if something went wrong it would be tragic, as the system would keel over by being unable to reclaim memtable space. I agree this may not be worth the investment, although I don't know that mmap is so underutilised - for anyone with spare IO lying around, or datasets small enough to fit in memory, mmap is likely to be the access method of choice. Implementing this change for the netty client is relatively easy: we can simply open/close our OpOrder reference in Message.Despatcher. For thrift this is trickier, as we need to either modify the generated code or invoke a method before and after each RPC completes. I'm not sufficiently familiar with thrift to know how easy this is to achieve. I suspect the internode messages will actually be the trickiest to get right. Allow zero-copy reads again --- Key: CASSANDRA-8449 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Priority: Minor Labels: performance Fix For: 3.0 We disabled zero-copy reads in CASSANDRA-3179 due to in-flight reads accessing a ByteBuffer when the data was unmapped by compaction. Currently this code path is only used for uncompressed reads. The actual bytes are in fact copied to the client output buffers for both netty and thrift before being sent over the wire, so the only issue really is the time it takes to process the read internally. This patch adds a slow network read test and changes the tidy() method to actually delete an sstable once the readTimeout has elapsed, giving plenty of time to serialize the read.
Removing this copy causes significantly less GC on the read path and improves the tail latencies: http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
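The deferred-deletion idea behind the tidy() change described above can be sketched as a simple time gate: an obsoleted sstable's files only become eligible for deletion once the read timeout has elapsed, so any in-flight zero-copy read referencing the mapped buffer has time to finish. This is an illustrative sketch under that assumption, not Cassandra's actual tidy() implementation; the names are invented, and the clock is passed in explicitly to keep the example deterministic:

```java
// Gates file deletion on the read timeout having elapsed since tidy().
class DeferredTidy {
    private final long readTimeoutMillis;
    private long tidiedAtMillis = -1;   // -1: not yet obsoleted

    DeferredTidy(long readTimeoutMillis) {
        this.readTimeoutMillis = readTimeoutMillis;
    }

    // Mark the sstable obsolete at the given clock time (millis).
    void tidy(long nowMillis) {
        tidiedAtMillis = nowMillis;
    }

    // Deletion is only safe once the read timeout has elapsed, giving any
    // in-flight read over the mapped region time to complete serialization.
    boolean mayDelete(long nowMillis) {
        return tidiedAtMillis >= 0 && nowMillis - tidiedAtMillis >= readTimeoutMillis;
    }
}
```

A real implementation would instead schedule the actual file deletion on a background executor after the timeout, but the safety argument is the same: no read started before tidy() can still be running once the read timeout has passed.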
[jira] [Comment Edited] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249690#comment-14249690 ] Marcus Eriksson edited comment on CASSANDRA-8316 at 12/17/14 10:27 AM: --- I think we are simply timing out the Prepare message when TRACE is enabled (I can't even start an 8-node cluster with TRACE on). One solution could be to increase the timeout for this message, but we use the same timeout for snapshot creation and that would be just as likely to fail on a heavily loaded cluster, wdyt [~yukim]? Also, note that in your test you repair all ranges, meaning, when you repair node5 for example, you actually include node3,4,5,6,7, so you can't repair any of those at the same time was (Author: krummas): I think we are simply timing out the Prepare message when TRACE is enabled (I can't even start an 8-node cluster with TRACE on). One solution could be to increase the timeout, but we use the same timeout for snapshot creation and that would be just as likely to fail on a heavily loaded cluster, wdyt [~yukim]?
Also, note that in your test you repair all ranges, meaning, when you repair node5 for example, you actually include node3,4,5,6,7, so you can't repair any of those at the same time Did not get positive replies from all endpoints error on incremental repair -- Key: CASSANDRA-8316 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316 Project: Cassandra Issue Type: Bug Components: Core Environment: cassandra 2.1.2 Reporter: Loic Lambiel Assignee: Marcus Eriksson Fix For: 2.1.3 Attachments: 0001-patch.patch, 8316-v2.patch, CassandraDaemon-2014-11-25-2.snapshot.tar.gz, CassandraDaemon-2014-12-14.snapshot.tar.gz, test.sh Hi, I've got an issue with incremental repairs on our production 15-node 2.1.2 cluster (new cluster, not yet loaded, RF=3). After having successfully performed an incremental repair (-par -inc) on 3 nodes, I started receiving Repair failed with error Did not get positive replies from all endpoints. from nodetool on all remaining nodes : [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges for keyspace (seq=false, full=false) [2014-11-14 09:12:47,919] Repair failed with error Did not get positive replies from all endpoints. All the nodes are up and running and the local system log shows that the repair commands got started and that's it. I've also noticed that soon after the repair, several nodes started having more cpu load indefinitely without any particular reason (no tasks / queries, nothing in the logs). I then restarted C* on these nodes and retried the repair on several nodes, which succeeded until the issue appeared again. I tried to repro on our 3-node preproduction cluster, without success. It looks like I'm not the only one having this issue: http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html Any idea? Thanks Loic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-8191) After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... >= current key DecoratedKey)
[ https://issues.apache.org/jira/browse/CASSANDRA-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson resolved CASSANDRA-8191. Resolution: Duplicate After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... >= current key DecoratedKey) -- Key: CASSANDRA-8191 URL: https://issues.apache.org/jira/browse/CASSANDRA-8191 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev While recovering the cluster from CASSANDRA-7949 (using the flag from CASSANDRA-6621) I had to use the sstablesplit tool to split large sstables. Nodes were off while using this tool and only one sstablesplit instance was running, of course. After splitting was done I restarted the nodes and they all started compacting the data. All the nodes are logging the exceptions like this:
{code}
ERROR [CompactionExecutor:4028] 2014-10-26 23:14:52,653 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4028,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) >= current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130525-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
It seems that scrubbing helps but scrubbing blocks the compactions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8496) Remove MemtablePostFlusher
Benedict created CASSANDRA-8496: --- Summary: Remove MemtablePostFlusher Key: CASSANDRA-8496 URL: https://issues.apache.org/jira/browse/CASSANDRA-8496 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor To improve clearing of the CL and the prompt completion of tasks waiting on flush in the case of transient errors, large flushes or slow disks, in 2.1 we could eliminate the post flusher altogether. Since we now enforce that Memtables track contiguous ranges, a relatively small change would permit Memtables to know the exact minimum as well as the currently known exact maximum. The CL could easily track the total dirty range, knowing that it must be contiguous, by using an AtomicLong instead of an AtomicInteger, and tracking both the min/max seen, not just the max. The only slight complexity will come in for tracking the _clean_ range as this can now be non-contiguous, if there are 3 memtable flushes covering the same CL segment, and one of them completes later. To solve this we can use an interval tree since these operations are infrequent, so the extra overhead is nominal. Once the interval tree completely overlaps the dirty range, we mark the entire dirty range clean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
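The AtomicLong idea described in the ticket above can be sketched as follows: pack the dirty range's min and max (here, 32-bit segment positions) into a single long so that both bounds advance in one CAS. This is an illustrative sketch only, not Cassandra's CommitLog code; `DirtyRange` and its methods are invented names, and positions are assumed non-negative:

```java
import java.util.concurrent.atomic.AtomicLong;

// Tracks a contiguous dirty range [min, max] with a single CAS per update.
// High 32 bits hold the min, low 32 bits the max; the initial state
// (min = Integer.MAX_VALUE, max = 0) has min > max, meaning "empty".
class DirtyRange {
    private final AtomicLong bounds =
        new AtomicLong(((long) Integer.MAX_VALUE << 32) | 0L);

    public void mark(int position) {
        while (true) {
            long cur = bounds.get();
            int min = (int) (cur >>> 32);
            int max = (int) cur;
            long next = ((long) Math.min(min, position) << 32)
                        | (Math.max(max, position) & 0xFFFFFFFFL);
            if (next == cur || bounds.compareAndSet(cur, next))
                return;   // either nothing to widen, or we won the CAS
        }
    }

    public int min() { return (int) (bounds.get() >>> 32); }
    public int max() { return (int) bounds.get(); }
}
```

Readers see both bounds from one atomic load, so min and max are always mutually consistent, which is the advantage over two independent atomics. The interval tree for the non-contiguous clean ranges mentioned in the ticket would sit alongside this, marking the whole dirty range clean once the clean intervals cover it.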
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249718#comment-14249718 ] Benedict commented on CASSANDRA-7275: - I've filed CASSANDRA-8496, which would help with this problem in 2.1 only. It isn't sufficient to ensure the server stays stable, but it would both avoid forward progress being stopped by errors on the post flusher, and ensure that the affected commit log records could be retained indefinitely without resulting in infinite commit log growth. Errors in FlushRunnable may leave threads hung -- Key: CASSANDRA-7275 URL: https://issues.apache.org/jira/browse/CASSANDRA-7275 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Pavel Yaskevich Priority: Minor Fix For: 2.0.12 Attachments: 0001-Move-latch.countDown-into-finally-block.patch, 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch In Memtable.FlushRunnable, the CountDownLatch will never be counted down if there are errors, which results in hanging any threads that are waiting for the flush to complete.
For example, an error like this causes the problem:
{noformat}
ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:474,5,main]
java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Unknown Source)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
    at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
    at org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
    at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
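The fix named in the attached patch ("Move latch.countDown into finally block") can be sketched as follows. This is an illustration of the pattern, not the actual Memtable.FlushRunnable code; `FlushTask` is an invented name:

```java
import java.util.concurrent.CountDownLatch;

// Counts the latch down in a finally block, so threads waiting on the flush
// are released even when the flush body throws.
class FlushTask implements Runnable {
    private final CountDownLatch latch;
    private final Runnable flush;

    FlushTask(CountDownLatch latch, Runnable flush) {
        this.latch = latch;
        this.flush = flush;
    }

    public void run() {
        try {
            flush.run();           // may throw, e.g. IllegalArgumentException
        } finally {
            latch.countDown();     // waiters unblock regardless of errors
        }
    }

    // Self-check: even a throwing flush releases the latch.
    static boolean demo() {
        CountDownLatch latch = new CountDownLatch(1);
        FlushTask t = new FlushTask(latch,
            () -> { throw new IllegalArgumentException("boom"); });
        try {
            t.run();
        } catch (IllegalArgumentException expected) {
            // the error still propagates for logging; only the latch is saved
        }
        return latch.getCount() == 0;
    }
}
```

Without the finally, the exception in the stack trace above skips the countDown and any thread blocked in `latch.await()` hangs forever, which is exactly the failure mode this ticket describes.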
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249719#comment-14249719 ] Sylvain Lebresne commented on CASSANDRA-7275: - The current behavior is that an unexpected flush error blocks any flush thereon. It does seem to me that changing it so that it blocks only flushes for the column family on which there was a problem (which is not exactly what the patch does, and I do agree with Benedict that it does need to do that) is an improvement: if the problem happens for every CF then we're no worse off than currently, but if it's a one-time event it might leave time for operators to take proper action (of course, we should log a scary error; it's not something that should be ignored). So maybe we can start there, since we don't seem to agree on whether crashing the node is an even better improvement? As far as my own opinion goes, I am not in favor of crashing in that case because, again, if you hold enough memtables in memory that your node becomes unresponsive, you're not really worse off than if you had crashed it right away, but if the problem ends up impacting a low-traffic table (for instance a system table), you might be able to fix the problem in a way that is less impactful for your cluster. I'll note however that I would agree that if the error is an IO one, we should respect the disk_failure_policy. And, I don't know, maybe we need another failure policy (best_effort/crash) for unexpected errors (aka bugs) that have the potential of destabilizing a node (I would agree that adding this is pushing the problem to our users, but it appears not everyone has the same idea on what the best strategy is, and there is maybe not a single good answer). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249733#comment-14249733 ] Benedict commented on CASSANDRA-7275: - Just to add to what Sylvain says about the size of the memtable, to hopefully help target a solution (spoken agnostically): in 2.1 we could become almost immediately unusable for writes if the memtable(s) we are retaining after this (or multiple exceptions) exceed a certain proportion of memory, as we will stop even trying to flush. So for 2.1 at least, if we're going to try and stay alive, we need to consider whether we would prefer to drop writes on the floor (aggressively, to avoid build-up in the queue) if the set of memtables in limbo is too large, or to drop memtables until we reclaim enough space to proceed, or to introduce some special logic for flushing in this event. In 2.0, conversely, we may flush millions of tiny sstables in the wrong scenario, but this would not prevent function unless it permitted excess heap growth or a compaction death spiral. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Conflicts: src/java/org/apache/cassandra/tools/StandaloneScrubber.java Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/417563a5 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/417563a5 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/417563a5 Branch: refs/heads/trunk Commit: 417563a59b9106aa1e225ef0e6412351bf4f2c9c Parents: e6c5982 6f98c6c Author: Marcus Eriksson marc...@apache.org Authored: Wed Dec 17 12:34:30 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Wed Dec 17 12:34:30 2014 +0100 -- CHANGES.txt | 1 + .../cassandra/tools/StandaloneScrubber.java | 54 ++-- 2 files changed, 39 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/417563a5/CHANGES.txt -- diff --cc CHANGES.txt index 6330e4b,9ec3585..9cea513 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,46 -1,5 +1,47 @@@ +3.0 + * Modernize schema tables (CASSANDRA-8261) + * Support for user-defined aggregation functions (CASSANDRA-8053) + * Fix NPE in SelectStatement with empty IN values (CASSANDRA-8419) + * Refactor SelectStatement, return IN results in natural order instead + of IN value list order (CASSANDRA-7981) + * Support UDTs, tuples, and collections in user-defined + functions (CASSANDRA-7563) + * Fix aggregate fn results on empty selection, result column name, + and cqlsh parsing (CASSANDRA-8229) + * Mark sstables as repaired after full repair (CASSANDRA-7586) + * Extend Descriptor to include a format value and refactor reader/writer apis (CASSANDRA-7443) + * Integrate JMH for microbenchmarks (CASSANDRA-8151) + * Keep sstable levels when bootstrapping (CASSANDRA-7460) + * Add Sigar library and perform basic OS settings check on startup (CASSANDRA-7838) + * Support for aggregation functions (CASSANDRA-4914) + * Remove cassandra-cli (CASSANDRA-7920) + * Accept dollar quoted strings in CQL (CASSANDRA-7769) + * Make 
assassinate a first class command (CASSANDRA-7935) + * Support IN clause on any clustering column (CASSANDRA-4762) + * Improve compaction logging (CASSANDRA-7818) + * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917) + * Do anticompaction in groups (CASSANDRA-6851) + * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929, + 7924, 7812, 8063, 7813) + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416) + * Move sstable RandomAccessReader to nio2, which allows using the + FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050) + * Remove CQL2 (CASSANDRA-5918) + * Add Thrift get_multi_slice call (CASSANDRA-6757) + * Optimize fetching multiple cells by name (CASSANDRA-6933) + * Allow compilation in java 8 (CASSANDRA-7028) + * Make incremental repair default (CASSANDRA-7250) + * Enable code coverage thru JaCoCo (CASSANDRA-7226) + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) + * Shorten SSTable path (CASSANDRA-6962) + * Use unsafe mutations for most unit tests (CASSANDRA-6969) + * Fix race condition during calculation of pending ranges (CASSANDRA-7390) + * Fail on very large batch sizes (CASSANDRA-8011) + * Improve concurrency of repair (CASSANDRA-6455, 8208) + + 2.1.3 + * Make sstablescrub check leveled manifest again (CASSANDRA-8432) * Check first/last keys in sstable when giving out positions (CASSANDRA-8458) * Disable mmap on Windows (CASSANDRA-6993) * Add missing ConsistencyLevels to cassandra-stress (CASSANDRA-8253) http://git-wip-us.apache.org/repos/asf/cassandra/blob/417563a5/src/java/org/apache/cassandra/tools/StandaloneScrubber.java -- diff --cc src/java/org/apache/cassandra/tools/StandaloneScrubber.java index b6e2bf8,2a9763b..80640d0 --- a/src/java/org/apache/cassandra/tools/StandaloneScrubber.java +++ b/src/java/org/apache/cassandra/tools/StandaloneScrubber.java @@@ -21,9 -21,13 +21,12 @@@ package org.apache.cassandra.tools import java.io.File; import java.util.*; - import 
org.apache.cassandra.io.sstable.format.SSTableReader; + import com.google.common.base.Predicate; + import com.google.common.base.Predicates; + import com.google.common.collect.Iterables; + import com.google.common.collect.Lists; import org.apache.commons.cli.*; -import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.config.Schema; import org.apache.cassandra.db.ColumnFamilyStore; import org.apache.cassandra.db.Directories; @@@ -31,6 -36,7 +35,8 @@@ import org.apache.cassandra.db.compacti import
[1/2] cassandra git commit: Make sstablescrub check both the repaired and unrepaired leveled manifests
Repository: cassandra Updated Branches: refs/heads/trunk e6c5982fa - 417563a59 Make sstablescrub check both the repaired and unrepaired leveled manifests Patch by marcuse; reviewed by cyeksigian for CASSANDRA-8432 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f98c6c4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f98c6c4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f98c6c4 Branch: refs/heads/trunk Commit: 6f98c6c4ed2e2b3d427cd10c2b2ada9c60b35acd Parents: d5e5f98 Author: Marcus Eriksson marc...@apache.org Authored: Mon Dec 8 11:45:59 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Wed Dec 17 12:31:40 2014 +0100 -- CHANGES.txt | 1 + .../cassandra/tools/StandaloneScrubber.java | 52 ++-- 2 files changed, 38 insertions(+), 15 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f98c6c4/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 410d49a..9ec3585 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.3 + * Make sstablescrub check leveled manifest again (CASSANDRA-8432) * Check first/last keys in sstable when giving out positions (CASSANDRA-8458) * Disable mmap on Windows (CASSANDRA-6993) * Add missing ConsistencyLevels to cassandra-stress (CASSANDRA-8253) http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f98c6c4/src/java/org/apache/cassandra/tools/StandaloneScrubber.java -- diff --git a/src/java/org/apache/cassandra/tools/StandaloneScrubber.java b/src/java/org/apache/cassandra/tools/StandaloneScrubber.java index 42799a5..2a9763b 100644 --- a/src/java/org/apache/cassandra/tools/StandaloneScrubber.java +++ b/src/java/org/apache/cassandra/tools/StandaloneScrubber.java @@ -21,6 +21,10 @@ package org.apache.cassandra.tools; import java.io.File; import java.util.*; +import com.google.common.base.Predicate; +import com.google.common.base.Predicates; +import com.google.common.collect.Iterables; +import 
com.google.common.collect.Lists; import org.apache.commons.cli.*; import org.apache.cassandra.config.DatabaseDescriptor; @@ -28,9 +32,11 @@ import org.apache.cassandra.config.Schema; import org.apache.cassandra.db.ColumnFamilyStore; import org.apache.cassandra.db.Directories; import org.apache.cassandra.db.Keyspace; +import org.apache.cassandra.db.compaction.AbstractCompactionStrategy; import org.apache.cassandra.db.compaction.LeveledCompactionStrategy; import org.apache.cassandra.db.compaction.LeveledManifest; import org.apache.cassandra.db.compaction.Scrubber; +import org.apache.cassandra.db.compaction.WrappingCompactionStrategy; import org.apache.cassandra.io.sstable.*; import org.apache.cassandra.utils.JVMStabilityInspector; import org.apache.cassandra.utils.OutputHandler; @@ -95,14 +101,6 @@ public class StandaloneScrubber } System.out.println(String.format("Pre-scrub sstables snapshotted into snapshot %s", snapshotName)); -LeveledManifest manifest = null; -// If leveled, load the manifest -if (cfs.getCompactionStrategy() instanceof LeveledCompactionStrategy) -{ -int maxSizeInMB = (int)((cfs.getCompactionStrategy().getMaxSSTableBytes()) / (1024L * 1024L)); -manifest = LeveledManifest.create(cfs, maxSizeInMB, sstables); -} - if (!options.manifestCheckOnly) { for (SSTableReader sstable : sstables) @@ -131,9 +129,8 @@ public class StandaloneScrubber } } -// Check (and repair) manifest -if (manifest != null) -checkManifest(manifest); +// Check (and repair) manifests +checkManifest(cfs.getCompactionStrategy(), cfs, sstables); SSTableDeletingTask.waitForDeletions(); System.exit(0); // We need that to stop non daemonized threads @@ -147,11 +144,36 @@ public class StandaloneScrubber } } -private static void checkManifest(LeveledManifest manifest) +private static void checkManifest(AbstractCompactionStrategy strategy, ColumnFamilyStore cfs, Collection<SSTableReader> sstables)
{ -System.out.println(String.format("Checking leveled manifest")); -for (int i = 1; i <= manifest.getLevelCount(); ++i) -manifest.repairOverlappingSSTables(i); +WrappingCompactionStrategy wrappingStrategy = (WrappingCompactionStrategy)strategy; +int maxSizeInMB = (int)((cfs.getCompactionStrategy().getMaxSSTableBytes()) / (1024L * 1024L)); +if
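Conceptually, the patch above replaces the single leveled-manifest check with one check per manifest: since 2.1 tracks repaired and unrepaired sstables separately, the scrubber presumably first partitions the sstables by their repaired flag before checking each manifest (the real code works through WrappingCompactionStrategy and Guava predicates). A stub-typed sketch of just that partitioning step, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ManifestCheckSketch {
    // Minimal stand-in for an sstable: only the repaired flag matters here.
    public static class SSTable {
        public final String name;
        public final boolean repaired;
        public SSTable(String name, boolean repaired) { this.name = name; this.repaired = repaired; }
    }

    // Split sstables into the repaired and unrepaired sets, so each set's
    // leveled manifest can be checked (and repaired) independently.
    public static Map<Boolean, List<SSTable>> partitionByRepaired(Collection<SSTable> sstables) {
        Map<Boolean, List<SSTable>> parts = new HashMap<>();
        parts.put(true, new ArrayList<>());
        parts.put(false, new ArrayList<>());
        for (SSTable s : sstables)
            parts.get(s.repaired).add(s);
        return parts;
    }

    public static void main(String[] args) {
        List<SSTable> all = Arrays.asList(
                new SSTable("a", true), new SSTable("b", false), new SSTable("c", true));
        Map<Boolean, List<SSTable>> parts = partitionByRepaired(all);
        System.out.println(parts.get(true).size() + " repaired, " + parts.get(false).size() + " unrepaired");
    }
}
```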
cassandra git commit: Make sstablescrub check both the repaired and unrepaired leveled manifests
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 d5e5f9800 -> 6f98c6c4e Make sstablescrub check both the repaired and unrepaired leveled manifests Patch by marcuse; reviewed by cyeksigian for CASSANDRA-8432 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f98c6c4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f98c6c4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f98c6c4 Branch: refs/heads/cassandra-2.1 Commit: 6f98c6c4ed2e2b3d427cd10c2b2ada9c60b35acd Parents: d5e5f98 Author: Marcus Eriksson marc...@apache.org Authored: Mon Dec 8 11:45:59 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Wed Dec 17 12:31:40 2014 +0100 -- CHANGES.txt | 1 + .../cassandra/tools/StandaloneScrubber.java | 52 ++-- 2 files changed, 38 insertions(+), 15 deletions(-)
[jira] [Comment Edited] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249718#comment-14249718 ] Benedict edited comment on CASSANDRA-7275 at 12/17/14 11:47 AM: I've filed CASSANDRA-8496, which would help with this problem in 2.1 only. It isn't sufficient to ensure the server stays stable, but it would both avoid forward progress being stopped by errors on the post flusher and ensure that the affected commit log records could be retained indefinitely without resulting in infinite commit log growth. I've also filed CASSANDRA-8497 and CASSANDRA-8498, which should help avoid data corruption in the cluster. was (Author: benedict): I've filed CASSANDRA-8496, which would help with this problem in 2.1 only. It isn't sufficient to ensure the server stays stable, but it would both avoid forward progress being stopped by errors on the post flusher and ensure that the affected commit log records could be retained indefinitely without resulting in infinite commit log growth. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8463) Constant compaction under LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8463: --- Reviewer: Yuki Morishita could you review [~yukim]? (unless you already checked the code as well [~rbranson]?) Constant compaction under LCS - Key: CASSANDRA-8463 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463 Project: Cassandra Issue Type: Bug Components: Core Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded), 144G RAM, solid-state storage. Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65. Heap is 32G total, 4G newsize. 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5 memtable_cleanup_threshold concurrent_compactors: 20 Reporter: Rick Branson Assignee: Marcus Eriksson Fix For: 2.1.3 Attachments: 0001-better-logging.patch, 0001-make-sure-we-set-lastCompactedKey-properly.patch, log-for-8463.txt It appears that tables configured with LCS will completely re-compact themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11 - 2.1.2, specifically). It starts out with 10 pending tasks for an hour or so, then starts building up, now with 50-100 tasks pending across the cluster after 12 hours. These nodes are under heavy write load, but were easily able to keep up in 2.0 (they rarely had 5 pending compaction tasks), so I don't think it's LCS in 2.1 actually being worse, just perhaps some different LCS behavior that causes the layout of tables from 2.0 to prompt the compactor to reorganize them? The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB SSTables due to the improved memtable setup in 2.1. Before I upgraded the entire cluster to 2.1, I noticed the problem and tried several variations on the flush size, thinking perhaps the larger tables in L0 were causing some kind of cascading compactions. Even if they're sized roughly like the 2.0 flushes were, same behavior occurs. 
I also tried both enabling and disabling STCS in L0 with no real change other than L0 began to back up faster, so I left the STCS in L0 enabled. Tables are configured with 32MB sstable_size_in_mb, which was found to be an improvement on the 160MB table size for compaction performance. Maybe this is wrong now? Otherwise, the tables are configured with defaults. Compaction has been unthrottled to help them catch up. The compaction threads stay very busy, with the cluster-wide CPU at 45% nice time. No nodes have completely caught up yet. I'll update JIRA with status about their progress if anything interesting happens. From a node around 12 hours ago, around an hour after the upgrade, with 19 pending compaction tasks: SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0] SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0] SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0] SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0] SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0] SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0] SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0] SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0] Recently, with 41 pending compaction tasks: SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0] SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0] SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0] SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0] SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0] SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0] SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0] SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0] More information about the use case: writes are roughly uniform across these tables. The data is sharded across these 8 tables by key to improve compaction parallelism. 
Each node receives up to 75,000 writes/sec sustained at peak, and a small number of reads. This is a pre-production cluster that's being warmed up with new data, so the low volume of reads (~100/sec per node) is just from automatic sampled data checks, otherwise we'd just use STCS :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
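A note on reading the level listings above: each bracketed number is the actual sstable count for a level, with a /limit suffix shown when the level exceeds its target. Under LCS with a fixed sstable size, each level L (for L >= 1) targets roughly 10^L sstables, which matches the 10 and 100 limits shown for L1 and L2; the 4 shown for L0 appears to be the separate L0 compaction-trigger threshold. A minimal sketch of the per-level limit (illustrative only, not Cassandra's LeveledManifest code):

```java
public class LcsLevels {
    // Target sstable count for leveled compaction: each level is 10x the
    // previous, so with a fixed sstable size level L holds up to 10^L tables.
    public static long maxSSTables(int level) {
        long n = 1;
        for (int i = 0; i < level; i++)
            n *= 10;
        return n;
    }

    public static void main(String[] args) {
        System.out.println("L1 limit: " + maxSSTables(1)); // 10
        System.out.println("L2 limit: " + maxSSTables(2)); // 100
        System.out.println("L3 limit: " + maxSSTables(3)); // 1000
    }
}
```

So a listing like [6/4, 10, 105/100, 268, ...] reads: L0 holds 6 against a trigger of 4, L1 is exactly at its limit of 10, L2 is 5 over its limit of 100, and L3's 268 is still well under its 1000 target.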
[jira] [Created] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
Benedict created CASSANDRA-8498: --- Summary: Replaying commit log records that are older than gc_grace is dangerous Key: CASSANDRA-8498 URL: https://issues.apache.org/jira/browse/CASSANDRA-8498 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict If we replay commit log records that are older than gc_grace we could introduce data corruption to the cluster. We should either (1) fail and suggest a repair, or (2) log an exception. I prefer (1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
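Option (1) amounts to a simple age check at replay time: refuse to replay a commit log record older than gc_grace, since applying it could resurrect data whose tombstone has already been collected. A minimal sketch with hypothetical names and seconds-based timestamps (not the actual CommitLogReplayer code):

```java
public class ReplayGuard {
    // Return true only if the record is young enough to replay safely,
    // i.e. its age does not exceed the table's gc_grace window.
    public static boolean shouldReplay(long recordTimeSec, long nowSec, long gcGraceSec) {
        return nowSec - recordTimeSec <= gcGraceSec;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long gcGrace = 864_000L; // 10 days, the common default
        System.out.println(shouldReplay(now - 100, now, gcGrace));       // recent record: true
        System.out.println(shouldReplay(now - 2_000_000, now, gcGrace)); // older than gc_grace: false
    }
}
```

Per the ticket, a false result here would fail startup and suggest a repair rather than silently skipping the record.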
[jira] [Created] (CASSANDRA-8497) Do not replay commit log records for tables that have been repaired since
Benedict created CASSANDRA-8497: --- Summary: Do not replay commit log records for tables that have been repaired since Key: CASSANDRA-8497 URL: https://issues.apache.org/jira/browse/CASSANDRA-8497 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor If we somehow still have commit log records from before a repair completed, and we have repair-aware tombstone collection, then we could potentially reintroduce deleted data. Since we consider repaired data to be completely up-to-date, unless there has been a cross-cluster failure where data only ended up in the commit log, the commit log records will by now be superfluous. Some grace period may be necessary, as with CASSANDRA-6434. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8496) Remove MemtablePostFlusher
[ https://issues.apache.org/jira/browse/CASSANDRA-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-8496: Description: To improve clearing of the CL, prevent infinite growth, and ensure the prompt completion of tasks waiting on flush in the case of transient errors, large flushes or slow disks, in 2.1 we could eliminate the post flusher altogether. Since we now enforce that Memtables track contiguous ranges, a relatively small change would permit Memtables to know the exact minimum as well as the currently known exact maximum. The CL could easily track the total dirty range, knowing that it must be contiguous, by using an AtomicLong instead of an AtomicInteger, and tracking both the min/max seen, not just the max. The only slight complexity will come in for tracking the _clean_ range as this can now be non-contiguous, if there are 3 memtable flushes covering the same CL segment, and one of them completes later. To solve this we can use an interval tree since these operations are infrequent, so the extra overhead is nominal. Once the interval tree completely overlaps the dirty range, we mark the entire dirty range clean. was: To improve clearing of the CL and the prompt completion of tasks waiting on flush in the case of transient errors, large flushes or slow disks, in 2.1 we could eliminate the post flusher altogether. Since we now enforce that Memtables track contiguous ranges, a relatively small change would permit Memtables to know the exact minimum as well as the currently known exact maximum. The CL could easily track the total dirty range, knowing that it must be contiguous, by using an AtomicLong instead of an AtomicInteger, and tracking both the min/max seen, not just the max. The only slight complexity will come in for tracking the _clean_ range as this can now be non-contiguous, if there are 3 memtable flushes covering the same CL segment, and one of them completes later. 
To solve this we can use an interval tree since these operations are infrequent, so the extra overhead is nominal. Once the interval tree completely overlaps the dirty range, we mark the entire dirty range clean. Remove MemtablePostFlusher -- Key: CASSANDRA-8496 URL: https://issues.apache.org/jira/browse/CASSANDRA-8496 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
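The AtomicLong idea in the description — tracking both the min and max dirty positions, not just the max — can be sketched by packing two ints into one long and CAS-updating both bounds together. This is a hypothetical illustration of the technique, not the actual CommitLog code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class DirtyRange {
    // High 32 bits hold the min position, low 32 bits the max, so both
    // bounds of the contiguous dirty range update in a single CAS.
    private final AtomicLong range = new AtomicLong(pack(Integer.MAX_VALUE, Integer.MIN_VALUE));

    private static long pack(int min, int max) { return ((long) min << 32) | (max & 0xFFFFFFFFL); }
    private static int min(long packed) { return (int) (packed >> 32); }
    private static int max(long packed) { return (int) packed; }

    public void markDirty(int position) {
        long prev, next;
        do {
            prev = range.get();
            next = pack(Math.min(min(prev), position), Math.max(max(prev), position));
        } while (!range.compareAndSet(prev, next)); // retry on concurrent update
    }

    public int[] bounds() {
        long p = range.get();
        return new int[] { min(p), max(p) };
    }

    public static void main(String[] args) {
        DirtyRange r = new DirtyRange();
        r.markDirty(42);
        r.markDirty(7);
        r.markDirty(100);
        System.out.println(r.bounds()[0] + ".." + r.bounds()[1]); // 7..100
    }
}
```

The non-contiguous clean-range tracking the ticket mentions would sit on top of this, in a separate interval-tree structure consulted only on flush completion.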
[jira] [Updated] (CASSANDRA-8485) Move 2.0 metered flusher to its own thread
[ https://issues.apache.org/jira/browse/CASSANDRA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-8485: - Attachment: 8485.txt Move 2.0 metered flusher to its own thread -- Key: CASSANDRA-8485 URL: https://issues.apache.org/jira/browse/CASSANDRA-8485 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 2.0.12 Attachments: 8485.txt We are using SS.optionalTasks for the MF right now - something we most definitely should not be doing, given just how important running MF regularly is to the stability of a node. Currently a bunch of other tasks are also using SS.optionalTasks (like serializing caches). See also: CASSANDRA-8285. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8485) Move 2.0 metered flusher to its own thread
[ https://issues.apache.org/jira/browse/CASSANDRA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-8485: - Reviewer: Sam Tunnicliffe Move 2.0 metered flusher to its own thread -- Key: CASSANDRA-8485 URL: https://issues.apache.org/jira/browse/CASSANDRA-8485 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 2.0.12 Attachments: 8485.txt We are using SS.optionalTasks for the MF right now - something we most definitely should not be doing, given just how important running MF regularly is to the stability of a node. Currently a bunch of other tasks are also using SS.optionalTasks (like serializing caches). See also: CASSANDRA-8285. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8499) On SSTableWriter.abort() we do not free the bloom filter
Benedict created CASSANDRA-8499: --- Summary: On SSTableWriter.abort() we do not free the bloom filter Key: CASSANDRA-8499 URL: https://issues.apache.org/jira/browse/CASSANDRA-8499 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Fix For: 2.0.12 Although we do try to sync it to disk, which is also probably not a good idea. This affects 2.0 and earlier only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249854#comment-14249854 ] sankalp kohli commented on CASSANDRA-8457: -- I agree with [~iamaleksey]. We should try this with many nodes. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249884#comment-14249884 ] T Jake Luciani commented on CASSANDRA-8457: --- This test doesn't deal with the signaling cost within the node. We have a thread blocking on the readExecutors completing. This does tie into CASSANDRA-5239 but part of the goal here is to push the callbacks to the edge without relying on explicit block/signalling calls. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249930#comment-14249930 ] Ariel Weisberg commented on CASSANDRA-8457: ---
#1 The wakeup is protected by a CAS so in the common case there shouldn't be multiple threads contending to dispatch. The synchronized block is there for the case where the thread that is finishing up dispatching signals that it is going to sleep and a dispatch task will be necessary for the next submission. At that point it has to check the queue one more time to avoid lost wakeups, and it is possible a new dispatch task will be created while that is happening. The synchronized forces the new task to wait while the last check and drain completes. How often this race occurs and blocks a thread I have no idea. I could add a counter and check. The only way to avoid it is to lock while checking the queue-empty condition and updating the needs-wakeup field, or to have a 1:1 mapping between sockets and dispatch threads (AKA not SEP). This would force producers to lock on task submission as well. I don't see how the dispatch task can atomically check that there is no work to do and set the needs-wakeup flag at the same time. At that point is there a reason to use a lock-free queue?
#2 I didn't replace the queue because I needed to maintain size for the dropped-message functionality and I didn't want to reason about maintaining size non-atomically with queue operations like offer/poll/drainTo. I could give it a whirl. I am also not sure how well iterator.remove in CLQ works, but I can check.
#3 Indeed, this is a typo. Jake, it definitely doesn't address several sources of signaling, but it should reduce the total # of threads signaled per request. I will profile the two versions today and then add more nodes. For benchmark purposes I could disable the message-dropping functionality and use MPSCLinkedQueue from Netty.
nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249940#comment-14249940 ] Benedict commented on CASSANDRA-8457: -
bq. The only way to avoid it is to lock while checking the queue empty condition and updating the needs wakeup field, or to have a 1:1 mapping between sockets and dispatch threads (AKA not SEP). This would force producers to lock on task submission as well. I don't see how the dispatch task can atomically check that there is no work to do and set the needs wakeup flag at the same time. At that point is there a reason to use a lock free queue?
{code}
private Runnable dispatchTask = new Runnable()
{
    @Override
    public void run()
    {
        while (true)
        {
            while (!backlog.isEmpty())
                dispatchQueue();
            needsWakeup = true;
            if (backlog.isEmpty() || !needsWakeupUpdater.compareAndSet(OutboundTcpConnection.this, TRUE, FALSE))
                return;
        }
    }
};
{code}
nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249943#comment-14249943 ] Ariel Weisberg commented on CASSANDRA-8457: --- Well there you have it. Thanks Benedict. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8500) Improve cassandra-stress help pages
Benedict created CASSANDRA-8500: --- Summary: Improve cassandra-stress help pages Key: CASSANDRA-8500 URL: https://issues.apache.org/jira/browse/CASSANDRA-8500 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Benedict cassandra-stress is flummoxing a lot of people. As well as rewriting its README, we should improve the help pages so that they're more legible. We should offer an all option that prints every sub-page, so it can be scanned like a README (and perhaps form the basis of said file), and we should at least stop printing all of the distribution parameter options every time they appear, as they're very common now. Offering some help about how to make the best use of the help might itself be a good idea, as well as perhaps printing what all of the options within each subgroup are in the summary page, so there is no hunting around for them. There should be a dedicated distribution help page that can explain all of the parameters that are currently just given names we hope are sufficiently descriptive. Finally, we should make sure all of the descriptions of each option are clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249979#comment-14249979 ] T Jake Luciani commented on CASSANDRA-7019: --- Just a note that this should also work on repaired sstables. As mentioned in CASSANDRA-7272 we repair the entire partition so we will end up with N copies of a partition in the repaired sstables. Major tombstone compaction -- Key: CASSANDRA-7019 URL: https://issues.apache.org/jira/browse/CASSANDRA-7019 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Assignee: Marcus Eriksson Labels: compaction Fix For: 3.0 It should be possible to do a major tombstone compaction by including all sstables, but writing them out 1:1, meaning that if you have 10 sstables before, you will have 10 sstables after the compaction with the same data, minus all the expired tombstones. We could do this in two ways: # a nodetool command that includes _all_ sstables # once we detect that an sstable has more than x% (20%?) expired tombstones, we start one of these compactions, and include all overlapping sstables that contain older data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7272) Add Major Compaction to LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249984#comment-14249984 ] Marcus Eriksson commented on CASSANDRA-7272: patch is here, needs rebase so not setting patch available yet https://github.com/krummas/cassandra/commits/marcuse/7019-2 Add Major Compaction to LCS -- Key: CASSANDRA-7272 URL: https://issues.apache.org/jira/browse/CASSANDRA-7272 Project: Cassandra Issue Type: Improvement Components: Core Reporter: T Jake Luciani Priority: Minor Labels: compaction Fix For: 3.0 LCS has a number of minor issues (maybe major depending on your perspective). LCS is primarily used for wide rows so for instance when you repair data in LCS you end up with a copy of an entire repaired row in L0. Over time if you repair you end up with multiple copies of a row in L0 - L5. This can make predicting disk usage confusing. Another issue is cleaning up tombstoned data. If a tombstone lives in level 1 and data for the cell lives in level 5 the data will not be reclaimed from disk until the tombstone reaches level 5. I propose we add a major compaction for LCS that forces consolidation of data to level 5 to address these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8464) Support direct buffer decompression for reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250014#comment-14250014 ] Branimir Lambov commented on CASSANDRA-8464: Very impressive performance gain. My comments:
- The patch is in effect introducing a new {{CompressedRandomAccessReader}} implementation that uses memory mapping, in addition to the existing one that uses on-heap buffers. Rather than separating the two using multiple if/else checks inside the code, wouldn't this be structurally clearer if we created separate subclasses for the two? The latter is also more efficient as it replaces ifs with virtual calls which the JIT is better equipped to handle; the JRE will never see the irrelevant code, and space will not be wasted for irrelevant fields.
- Will the CRAR be used with >=2G files very often? If the <2G case is predominant it would make sense to further separate the code in a 2G-optimized class with a single mmap buffer in addition to the current {{TreeMap}}-based catch-all.
- I'd prefer not to refer to {{sun.nio.ch.DirectBuffer}} throughout the code; its references should be confined to the {{FileUtils}} class, for example by changing {{FileUtils.clean()}} to take a {{ByteBuffer}} argument and do the conversion or skip cleaning for non-direct ones. Maybe we should also move the {{isCleanerAvailable()}} check into it (it should be easily optimized away by the JIT) and make cleaning a single call rather than the isDirect, isCleanerAvailable, cast sequence it is now.
- Nit on {{FileUtils.clean()}} uses: since {{isCleanerAvailable()}} does not change value, it should be the first thing tested in all ifs with more than one condition. This makes the JIT's job easier.
- {{CRAR.allocateBuffer}}: for Snappy {{uncompress(ByteBuffer)}} both buffers need to be direct. You could perhaps revive parts of CASSANDRA-6762 to deal with this (and support {{DeflateCompressor}}).
- CRAR static init: {{FBUtilities.supportsDirectChecksum()}} could return true initially, but revert to false at the first invoke attempt. The choice for useMmap is made before that happens. Perhaps we could do a test invoke in the static init instead of letting the value change later?
- {{FBUtilities.directCheckSum}}: Does the checksum-on-ByteBuffer trick work with non-Oracle JVMs? Have we tested what happens on one?
- {{FBUtilities.directCheckSum}}: In the fallback case, are we sure we never change buffers' byte order? ({{getInt()}} might return swapped bytes if not) A chunked loop as done in CASSANDRA-6762 is safer and could be faster. (If you worry about the allocation, we could probably provide a thread-local scratch array in {{FBUtilities}} or similar.)
- {{LZ4Compressor::uncompress}}: No need for {{hasArray()}} checks, the decompressor will do this if it helps (it doesn't normally). You shouldn't need to duplicate any of the buffers or copy into a byte array, just use the buffer's {{get(position + ...)}}.
- The new compressor and checksum functionality should be unit-tested, as well as all fallbacks; CASSANDRA-6762 has some tests you could reuse.
Support direct buffer decompression for reads - Key: CASSANDRA-8464 URL: https://issues.apache.org/jira/browse/CASSANDRA-8464 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: performance Fix For: 3.0 Currently when we read a compressed sstable we copy the data on heap then send it to be de-compressed to another on-heap buffer (albeit pooled). But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte buffers. This lets us mmap the data and decompress completely off heap (and avoids moving bytes over JNI). One issue is performing the checksum offheap, but Adler32 does support it in Java 8 (it's also in Java 7 but marked private?!) This change yields a 10% boost in read performance on cstar. Locally I see up to 30% improvement.
http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688fmetric=op_rateoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=200.09ymin=0ymax=135908.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
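Branimir's suggestion to confine the {{sun.nio.ch.DirectBuffer}} reference to a single helper could look roughly like the following. This is a hedged sketch only: the class name and the reflective approach are mine, not the patch's; reflection is used here purely so the snippet stays self-contained and degrades gracefully on JVMs where the cleaner is inaccessible.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Hypothetical sketch: a single clean(ByteBuffer) entry point that silently
// skips non-direct buffers, so callers never need the
// isDirect / isCleanerAvailable / cast sequence themselves.
public final class BufferCleaner
{
    // Returns true if the buffer's native memory was explicitly freed.
    public static boolean clean(ByteBuffer buffer)
    {
        if (buffer == null || !buffer.isDirect())
            return false; // heap buffers are reclaimed by GC; nothing to do
        try
        {
            // Equivalent to ((sun.nio.ch.DirectBuffer) buffer).cleaner().clean(),
            // done reflectively so this compiles without internal-API imports.
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            Method clean = cleaner.getClass().getMethod("clean");
            clean.setAccessible(true);
            clean.invoke(cleaner);
            return true;
        }
        catch (Exception e)
        {
            return false; // cleaner unavailable on this JVM; let GC handle it
        }
    }
}
```

With a helper like this, the `isCleanerAvailable()` short-circuit the review mentions collapses into the helper as well, and call sites become a single unconditional call.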
[jira] [Commented] (CASSANDRA-7281) SELECT on tuple relations are broken for mixed ASC/DESC clustering order
[ https://issues.apache.org/jira/browse/CASSANDRA-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250021#comment-14250021 ] Philip Thompson commented on CASSANDRA-7281: Any chance this could get reviewed before the holidays? SELECT on tuple relations are broken for mixed ASC/DESC clustering order Key: CASSANDRA-7281 URL: https://issues.apache.org/jira/browse/CASSANDRA-7281 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Marcin Szymaniuk Fix For: 2.0.12 Attachments: 0001-CASSANDRA-7281-SELECT-on-tuple-relations-are-broken-.patch, 0001-CASSANDRA-7281-SELECT-on-tuple-relations-are-broken-v2.patch, 0001-CASSANDRA-7281-SELECT-on-tuple-relations-are-broken-v3.patch As noted on [CASSANDRA-6875|https://issues.apache.org/jira/browse/CASSANDRA-6875?focusedCommentId=13992153page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13992153], the tuple notation is broken when the clustering order mixes ASC and DESC directives because the range of data they describe doesn't correspond to a single continuous slice internally. To copy the example from CASSANDRA-6875:
{noformat}
cqlsh:ks> create table foo (a int, b int, c int, PRIMARY KEY (a, b, c)) WITH CLUSTERING ORDER BY (b DESC, c ASC);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 2, 0);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 1, 0);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 1, 1);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 0, 0);
cqlsh:ks> SELECT * FROM foo WHERE a=0;

 a | b | c
---+---+---
 0 | 2 | 0
 0 | 1 | 0
 0 | 1 | 1
 0 | 0 | 0

(4 rows)
cqlsh:ks> SELECT * FROM foo WHERE a=0 AND (b, c) > (1, 0);

 a | b | c
---+---+---
 0 | 2 | 0

(1 rows)
{noformat}
The last query should really return {{(0, 2, 0)}} and {{(0, 1, 1)}}. For that specific example we should generate 2 internal slices, but I believe that with more clustering columns we may have more slices. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
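To see why two internal slices are needed: with clustering order (b DESC, c ASC), the rows satisfying the tuple relation (b, c) > (1, 0) are not stored contiguously, but the condition itself decomposes lexicographically into two slices, b > 1 and (b = 1 AND c > 0). A small illustrative check (a hypothetical helper, not Cassandra code) over the four rows above:

```java
// Illustrative only: the tuple relation (b, c) > (1, 0) is a lexicographic
// comparison, i.e. the union of two slices: b > 1, or (b = 1 and c > 0).
public class TupleSliceDemo
{
    public static boolean matches(int b, int c)
    {
        return b > 1 || (b == 1 && c > 0);
    }
}
```

Applied to the rows (2,0), (1,0), (1,1), (0,0), this selects exactly (2,0) and (1,1), matching the expected result set of the last query.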
[jira] [Updated] (CASSANDRA-8462) Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8462: Fix Version/s: 2.1.3 Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes - Key: CASSANDRA-8462 URL: https://issues.apache.org/jira/browse/CASSANDRA-8462 Project: Cassandra Issue Type: Bug Components: Core Reporter: Rick Branson Assignee: Aleksey Yeschenko Fix For: 2.1.3 Added a 2.1.2 node to a cluster running 2.0.11. Didn't make any schema changes. When I tried to reboot one of the 2.0 nodes, it failed to boot with this exception. Besides an obvious fix, any workarounds for this?
{noformat}
java.lang.IllegalArgumentException: No enum constant org.apache.cassandra.config.CFMetaData.Caching.{keys:ALL, rows_per_partition:NONE}
    at java.lang.Enum.valueOf(Enum.java:236)
    at org.apache.cassandra.config.CFMetaData$Caching.valueOf(CFMetaData.java:286)
    at org.apache.cassandra.config.CFMetaData.fromSchemaNoColumnsNoTriggers(CFMetaData.java:1713)
    at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1793)
    at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:307)
    at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:288)
    at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:131)
    at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:529)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:270)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
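The failure comes from feeding the 2.1-style caching string straight into {{Enum.valueOf}} against 2.0's {{Caching}} enum (whose values are ALL, KEYS_ONLY, ROWS_ONLY, NONE). One conceivable shape for a guard, purely as a hedged sketch (the helper name and the string matching are hypothetical, not the actual fix):

```java
// Hypothetical sketch: translate a 2.1-style caching value such as
// "{keys:ALL, rows_per_partition:NONE}" back to a 2.0 Caching enum name
// before calling Enum.valueOf, instead of failing outright.
public class CachingCompat
{
    public static String toLegacyCaching(String value)
    {
        if (!value.startsWith("{"))
            return value; // already a 2.0-style enum name
        boolean keys = value.contains("keys:ALL");
        boolean rows = !value.contains("rows_per_partition:NONE");
        if (keys && rows) return "ALL";
        if (keys) return "KEYS_ONLY";
        if (rows) return "ROWS_ONLY";
        return "NONE";
    }
}
```

For the string in the stack trace above, this would map to KEYS_ONLY rather than throwing IllegalArgumentException.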
[jira] [Created] (CASSANDRA-8501) CompressionMetaData.Writer has no abort method
Benedict created CASSANDRA-8501: --- Summary: CompressionMetaData.Writer has no abort method Key: CASSANDRA-8501 URL: https://issues.apache.org/jira/browse/CASSANDRA-8501 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor Affects all versions, and results in a small amount of memory leakage for each compressed sstable we fail to write -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8502) Static columns returning null when paging is used
Flavien Charlon created CASSANDRA-8502: -- Summary: Static columns returning null when paging is used Key: CASSANDRA-8502 URL: https://issues.apache.org/jira/browse/CASSANDRA-8502 Project: Cassandra Issue Type: Bug Components: Core Reporter: Flavien Charlon Attachments: null-static-column.txt When paging is used for a query containing a static column, the first page contains the right value for the static column, but subsequent pages have null for the static column instead of the expected value. Repro steps: - Create a table with a static column - Create a partition with 500 cells - Using cqlsh, query that partition Actual result: - You will see that at first, the static column appears as expected, but if you press a key after ---MORE---, the static columns will appear as null. See the attached file for a repro of the output. I am using a single node cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8464) Support direct buffer decompression for reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250074#comment-14250074 ] T Jake Luciani commented on CASSANDRA-8464: --- bq. wouldn't this be structurally clearer if we created separate subclasses for the two? I started that way but it would be a much more invasive change since the CompressedThrottledReader extends CRAR. And there would be multiple places in the code that call open() that would need a switch for mmapped throttled vs non-mmapped unthrottled, etc. So this felt cleaner. If that answer is good for you I'll rebase and address your other comments, which all look good, thanks. Support direct buffer decompression for reads - Key: CASSANDRA-8464 URL: https://issues.apache.org/jira/browse/CASSANDRA-8464 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: performance Fix For: 3.0 Currently when we read a compressed sstable we copy the data on heap then send it to be de-compressed to another on-heap buffer (albeit pooled). But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte buffers. This lets us mmap the data and decompress completely off heap (and avoids moving bytes over JNI). One issue is performing the checksum offheap, but Adler32 does support it in Java 8 (it's also in Java 7 but marked private?!) This change yields a 10% boost in read performance on cstar. Locally I see up to 30% improvement. http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688fmetric=op_rateoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=200.09ymin=0ymax=135908.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8502) Static columns returning null for pages after first
[ https://issues.apache.org/jira/browse/CASSANDRA-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavien Charlon updated CASSANDRA-8502: --- Summary: Static columns returning null for pages after first (was: Static columns returning null when paging is used) Static columns returning null for pages after first --- Key: CASSANDRA-8502 URL: https://issues.apache.org/jira/browse/CASSANDRA-8502 Project: Cassandra Issue Type: Bug Components: Core Reporter: Flavien Charlon Attachments: null-static-column.txt When paging is used for a query containing a static column, the first page contains the right value for the static column, but subsequent pages have null for the static column instead of the expected value. Repro steps: - Create a table with a static column - Create a partition with 500 cells - Using cqlsh, query that partition Actual result: - You will see that at first, the static column appears as expected, but if you press a key after ---MORE---, the static columns will appear as null. See the attached file for a repro of the output. I am using a single node cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
Ryan McGuire created CASSANDRA-8503: --- Summary: Collect important stress profiles for regression analysis done by jenkins Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250091#comment-14250091 ] Ryan McGuire commented on CASSANDRA-8503: - [~slebresne] [~thobbs] [~benedict] [~tjake] [~brandon.williams] please comment, do you have stress profiles, or ideas to contribute? Collect important stress profiles for regression analysis done by jenkins - Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250097#comment-14250097 ] Ryan McGuire commented on CASSANDRA-8503: - [~jasobrown] I have this stress profile that [~iamaleksey] forwarded me from you, think it's worthwhile to run? http://enigmacurry.com/tmp/abtests.yaml Collect important stress profiles for regression analysis done by jenkins - Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire updated CASSANDRA-8503: Description: We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... was: We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an exacmple: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with a the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... Collect important stress profiles for regression analysis done by jenkins - Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile.
We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8502) Static columns returning null for pages after first
[ https://issues.apache.org/jira/browse/CASSANDRA-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8502: Assignee: Tyler Hobbs Static columns returning null for pages after first --- Key: CASSANDRA-8502 URL: https://issues.apache.org/jira/browse/CASSANDRA-8502 Project: Cassandra Issue Type: Bug Components: Core Reporter: Flavien Charlon Assignee: Tyler Hobbs Attachments: null-static-column.txt When paging is used for a query containing a static column, the first page contains the right value for the static column, but subsequent pages have null for the static column instead of the expected value. Repro steps: - Create a table with a static column - Create a partition with 500 cells - Using cqlsh, query that partition Actual result: - You will see that at first, the static column appears as expected, but if you press a key after ---MORE---, the static columns will appear as null. See the attached file for a repro of the output. I am using a single node cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8502) Static columns returning null for pages after first
[ https://issues.apache.org/jira/browse/CASSANDRA-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250100#comment-14250100 ] Sylvain Lebresne commented on CASSANDRA-8502: - It's actually pretty easy to explain: if we page in the middle of a partition, we'll start reading in the middle of said partition and thus miss the static data. I guess we'll have to detect if there are static columns to be fetched and include back a slice for the static block when we restart paging. A tad hairy but doable. Static columns returning null for pages after first --- Key: CASSANDRA-8502 URL: https://issues.apache.org/jira/browse/CASSANDRA-8502 Project: Cassandra Issue Type: Bug Components: Core Reporter: Flavien Charlon Attachments: null-static-column.txt When paging is used for a query containing a static column, the first page contains the right value for the static column, but subsequent pages have null for the static column instead of the expected value. Repro steps: - Create a table with a static column - Create a partition with 500 cells - Using cqlsh, query that partition Actual result: - You will see that first, the static column appears as expected, but if you press a key after ---MORE---, the static columns will appear as null. See the attached file for a repro of the output. I am using a single node cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
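The behaviour Sylvain describes can be sketched with a toy model (hypothetical names; this is not Cassandra's storage layout): the static block sits at the front of the partition, so a page whose slice starts mid-partition never reads it unless a slice for the static block is added back when paging restarts.

```python
# Toy model of the paging bug: the static cell lives at the front of the
# partition, so a raw slice that starts mid-partition never sees it.

STATIC = ("email-opt-in", True)                   # the partition's static cell
ROWS = [("row%03d" % i, i) for i in range(500)]   # 500 clustering rows

def page_naive(start, size):
    """Read a raw slice of the partition; the static block is only
    encountered when the slice begins at the very start."""
    static = STATIC if start == 0 else None
    return static, ROWS[start:start + size]

def page_fixed(start, size):
    """The suggested fix: when restarting paging, include back a slice
    covering the static block as well."""
    return STATIC, ROWS[start:start + size]
```

With a page size of 100, the naive pager returns the static value only on the first page, which matches the reported symptom.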
[jira] [Commented] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250108#comment-14250108 ] Jonathan Shook commented on CASSANDRA-8503: --- Currently, stress doesn't have the functionality necessary to test time-series beyond an in-memory size. It needs to support monotonically increasing times with no sorting requirements before earnest time-series tests can be run. Collect important stress profiles for regression analysis done by jenkins - Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250111#comment-14250111 ] Jonathan Ellis commented on CASSANDRA-8498: --- What does failing buy us, other than making it really obvious that there's a problem? I'd prefer to log an error and skip the records involved but otherwise start up normally. Replaying commit log records that are older than gc_grace is dangerous -- Key: CASSANDRA-8498 URL: https://issues.apache.org/jira/browse/CASSANDRA-8498 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict If we replay commit log records that are older than gc_grace we could introduce data corruption to the cluster. We should either (1) fail and suggest a repair, or (2) log an exception. I prefer (1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
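Jonathan's log-and-skip option could look like the following sketch (assumed record shape and function names, not the actual commit log replayer): mutations whose write time predates now - gc_grace are logged and dropped, because the tombstones that would have shadowed them may already be purged.

```python
# Sketch of "log an error and skip the records involved": replaying a
# mutation older than gc_grace may resurrect data whose tombstone has
# already been purged, so such records are skipped, not applied.

import logging

GC_GRACE_SECONDS = 864000  # default gc_grace_seconds (10 days)

def replay(records, now, applied, gc_grace=GC_GRACE_SECONDS):
    """records: (write_time, mutation) pairs; appends safe mutations to
    `applied` and returns the number skipped."""
    skipped = 0
    for write_time, mutation in records:
        if now - write_time > gc_grace:
            logging.error("skipping commit log record older than gc_grace: %r",
                          mutation)
            skipped += 1
        else:
            applied.append(mutation)
    return skipped
```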
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250118#comment-14250118 ] Jonathan Ellis commented on CASSANDRA-7275: --- bq. if you hold enough memtables in memory that your node become unresponsive, you're not really worse off than if you had crashed it right away I disagree: we have a ton of evidence to date that a node that slowly falls over as it OOMs is much worse than a node that dies and gets marked down quickly by the FD. Errors in FlushRunnable may leave threads hung -- Key: CASSANDRA-7275 URL: https://issues.apache.org/jira/browse/CASSANDRA-7275 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Pavel Yaskevich Priority: Minor Fix For: 2.0.12 Attachments: 0001-Move-latch.countDown-into-finally-block.patch, 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch In Memtable.FlushRunnable, the CountDownLatch will never be counted down if there are errors, which results in hanging any threads that are waiting for the flush to complete. 
For example, an error like this causes the problem: {noformat} ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:474,5,main] java.lang.IllegalArgumentException at java.nio.Buffer.position(Unknown Source) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138) at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103) at org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194) at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397) at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
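The attached patch name ("Move-latch.countDown-into-finally-block") indicates the shape of the fix; a minimal Python analogue of the pattern (the Latch class is a stand-in for Java's CountDownLatch):

```python
# The latch count-down moves into a finally block so that threads
# waiting on the flush are released even when the flush throws.

import threading

class Latch:
    """Simple stand-in for java.util.concurrent.CountDownLatch."""
    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()
    def count_down(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()
    def await_(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout)

def flush(latch, write_sorted_contents):
    try:
        write_sorted_contents()   # may raise, as in the stack trace above
    finally:
        latch.count_down()        # without the finally, waiters hang forever
```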
[jira] [Commented] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250121#comment-14250121 ] Benedict commented on CASSANDRA-8498: - In this situation a repair is a _really_ good idea, is all. Perhaps we should only fail if there hasn't been a sufficiently new repair to have likely fixed any issue? Of course, I know I have the strongest penchant for C* autodeath, but mostly because we know that users do not read their log files as diligently as they should, and we get blamed if there are data corruption or loss problems. Replaying commit log records that are older than gc_grace is dangerous -- Key: CASSANDRA-8498 URL: https://issues.apache.org/jira/browse/CASSANDRA-8498 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict If we replay commit log records that are older than gc_grace we could introduce data corruption to the cluster. We should either (1) fail and suggest a repair, or (2) log an exception. I prefer (1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250126#comment-14250126 ] Jonathan Ellis commented on CASSANDRA-8503: --- /cc [~aweisberg] Collect important stress profiles for regression analysis done by jenkins - Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas: * Timeseries (Can this be done with stress? not sure) * compact storage * compression off * ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250130#comment-14250130 ] jonathan lacefield commented on CASSANDRA-8447: --- applied patch from CASSANDRA-8485. prelim results look good. previous behavior of full GC isn't evident after 3 hours of testing. Will conduct another round of testing overnight. If results are still good, then it is recommended to close this one as solved. Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC. Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up. 
Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed Data load thread count and results: * 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node) * 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node) * 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range * 50 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 10k writes per second per node) * 100 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 20k writes per second per node) * 200 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 25k writes per second per node) Note - the observed behavior was the same for all tests except for the single threaded test. The single threaded test does not appear to show this behavior. Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads. 
JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
# 20 G Max | 10 G New
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
# 20 G Max | 1 G New
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655 ] Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 5:41 PM: - Honestly, I don't think it would benefit Spark integration: # Seems like adding quite a lot of complexity to handle the following cases: ** What do we do if RF > 1 to avoid duplicates? ** If we decide on primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from? ** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more spark tasks? This is important for bigger jobs to protect from sudden failures and not having to recompute too much in case of a lost spark partition. ** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately. # It is trying to solve a theoretical problem which hasn't been proven in practice yet. ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (constant cost of the query is higher than the cost of fetching the data). ** There are no customers reporting vnodes to be a problem for them. ** Theoretical reason: If data is large enough to not fit in page cache (hundreds of GBs on a single node), 256 additional random seeks is not going to cause a huge penalty because: *** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel *** each token range will be of size *hundreds* of MBs, which is large enough to hide one or two seeks Some *real* performance problems we (and users) observed: * Cassandra is taking plenty of CPU when doing sequential scans. 
It is not possible to saturate bandwidth of a single laptop spinning HDD, because all cores of i7 CPU @2.4 GHz are 100% busy processing those small CQL cells, merging rows from different SSTables, ordering cells, filtering out tombstones, serializing etc. The problem doesn't go away after doing full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2MB blobs) gives 3-30x performance boost (depending on who did the benchmark). We need to close this gap. * We need to improve the backpressure mechanism at least in such a way that the driver or Spark connector would know to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once it happens, the driver has no clue how long to wait until it is ok to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying at all. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other non-Spark related use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937. was (Author: pkolaczk): Honestly, I don't think it would benefit Spark integration: # Seems like adding quite a lot of complexity to handle the following cases: ** What do we do if RF 1 to avoid duplicates? ** If we decide on primary token range only, what do we do if one of the nodes fail and some primary token ranges have no node to query from? ** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more spark tasks? This is important for bigger jobs to protect from sudden failures and not having to recompute too much in case of a lost spark partition. ** How do we fetch data from the same node in parallel? 
Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately? # It is trying to solve a theoretical problem which hasn't proved in practice yet. ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (constant cost of the query is higher than the cost of fetching the data). ** There are no customers reporting vnodes to be a problem for them. ** Theoretical reason: If data is large enough to not fit in page cache (hundreds of GBs on a single node), 256 additional random seeks is not going to cause a huge penalty because: *** some of them can be hidden by splitting those queries between separate Spark threads, so they would be
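The backpressure point raised above (slow down before the timeout, instead of discovering overload only through it) is essentially client-side rate limiting; a minimal token-bucket sketch, purely illustrative (neither the driver nor the Spark connector exposes such an API):

```python
# Token-bucket rate limiter: the writer asks before submitting, and
# backs off when no token is available instead of waiting for a timeout.

class TokenBucket:
    def __init__(self, rate, capacity, now=0.0):
        self.rate, self.capacity = rate, capacity   # tokens/sec, burst size
        self.tokens, self.last = capacity, now
    def try_acquire(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off before resubmitting
```

A cluster-aware version would adjust `rate` from server feedback rather than a fixed constant.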
[jira] [Commented] (CASSANDRA-8464) Support direct buffer decompression for reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250202#comment-14250202 ] Branimir Lambov commented on CASSANDRA-8464: It is fine with me if you proceed with the current structure. On the other hand, we probably do need to change the way throttling is done: it appears that it would be doing the wrong thing in the mmapped case (and perhaps generally in the compressed case as it's counting uncompressed data); I wonder if it is safe to use memory-mapped readers for throttled compaction at all. As to the changes to {{open()}} users, there should not be any as this factory method is the natural place to choose the type of reader to create. Support direct buffer decompression for reads - Key: CASSANDRA-8464 URL: https://issues.apache.org/jira/browse/CASSANDRA-8464 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: performance Fix For: 3.0 Currently when we read a compressed sstable we copy the data on heap then send it to be de-compressed to another on heap buffer (albeit pooled). But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte buffers. This lets us mmap the data and decompress completely off heap (and avoids moving bytes over JNI). One issue is performing the checksum offheap, but Adler32 does support ByteBuffer updates in Java 8 (it's also in Java 7 but marked private?!) This change yields a 10% boost in read performance on cstar. Locally I see up to 30% improvement. http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688fmetric=op_rateoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=200.09ymin=0ymax=135908.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
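The read path being discussed (verify the chunk checksum, then decompress without an intermediate on-heap copy) can be approximated in Python, with zlib standing in for LZ4/Snappy and a bytes object standing in for the mmapped direct buffer:

```python
# Checksum-then-decompress, in the order described in the ticket: the
# Adler32 checksum is computed over the compressed chunk before any
# decompression happens.

import zlib

def read_chunk(compressed, expected_adler32):
    if zlib.adler32(compressed) & 0xFFFFFFFF != expected_adler32:
        raise IOError("chunk checksum mismatch")
    return zlib.decompress(compressed)
```

In the actual Java change the interesting part is that both steps can operate on a direct ByteBuffer, so the chunk never has to be copied onto the heap.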
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250222#comment-14250222 ] Jon Haddad commented on CASSANDRA-7296: --- Good points. I think this issue would result in other, perhaps more serious problems, making an appearance. I am not convinced, however, that NUM_TOKENS = NUM_QUERIES is the right solution on the spark side either, under the case of (data disk disk_type == spinning_rust). I think we can move any future discussion to the driver JIRA and reference this from there. Add CL.COORDINATOR_ONLY --- Key: CASSANDRA-7296 URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 Project: Cassandra Issue Type: Improvement Reporter: Tupshin Harper For reasons such as CASSANDRA-6340 and similar, it would be nice to have a read that never gets distributed, and only works if the coordinator you are talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250222#comment-14250222 ] Jon Haddad edited comment on CASSANDRA-7296 at 12/17/14 6:05 PM: - Good points. I think this issue would result in other, perhaps more serious problems, making an appearance. I am not convinced, however, that NUM_TOKENS = NUM_QUERIES is the right solution on the spark side either, under the case of (data size disk size disk_type == spinning_rust). I think we can move any future discussion to the driver JIRA and reference this from there. was (Author: rustyrazorblade): Good points. I think this issue would result in other, perhaps more serious problems, making an appearance. I am not convinced, however, that NUM_TOKENS = NUM_QUERIES is the right solution on the spark side either, under the case of (data disk disk_type == spinning_rust). I think we can move any future discussion to the driver JIRA and reference this from there. Add CL.COORDINATOR_ONLY --- Key: CASSANDRA-7296 URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 Project: Cassandra Issue Type: Improvement Reporter: Tupshin Harper For reasons such as CASSANDRA-6340 and similar, it would be nice to have a read that never gets distributed, and only works if the coordinator you are talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250242#comment-14250242 ] Ariel Weisberg commented on CASSANDRA-8503: --- I think there are two general classes of benchmarks you would run in CI. Representative user workloads, and targeted microbenchmark workloads. Targeted workloads are a huge help during ongoing development because they magnify the impact of regressions from code changes that are harder to notice in representative workloads. They also point to the specific subsystem being benchmarked. I will just cover the microbenchmarks. The full matrix is large so there is an element of wanting ponies, but the reality is that they are all interesting from a preventing performance regressions and understanding the impact of ongoing changes perspective. Benchmark the stress client, so excess server capacity and a single client testing lots of small messages. Lots of large messages. Stuff the servers can answer as fast as possible. The flip side of this workload is the same thing but for the server where you measure how many trivially answerable tiny queries you can shove through a cluster given excess client capacity. Benchmark performance of non-prepared statements. Benchmark performance of preparing statements? A full test matrix for data intensive workloads would test read, write, and 50/50, and for a bonus 90/10. Single cell partitions with a small value and a large value, and a range of wide rows (small, medium, large). All 3 compaction strategies with compression on/off. Data intensive workloads also need to run against spinning rust and SSDs. CQL specific microbenchmarks against specific CQL datatypes. If there are interactions that are important we should capture those. Counters Lightweight transactions The matrix also needs to include different permutations of replication strategies and consistency levels. 
Maybe we can constrain those variations to parts of the matrix that would best reflect the impact of replication strategies and CL. Probably a subset of the data intensive workloads. Also want a workload targeting the row cache and key cache when everything is cached and when there is a realistic long tail of data not in the cache. For every workload to really get the value you would like a graph for throughput and a graph for latency at some percentile with a data point per revision tested going back to the beginning as well as a 90 day graph. A trend line also helps. Then someone has to be it for monitoring the graphs and poking people when there is an issue. The workflow usually goes something like the monitor tags the author of the suspected bad revision who triages it and either fixes it or hands it off to the correct person. Timeliness is really important because once regressions start stacking it's a pain to know whether you have done what you should to fix it. Collect important stress profiles for regression analysis done by jenkins - Key: CASSANDRA-8503 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503 Project: Cassandra Issue Type: Task Reporter: Ryan McGuire Assignee: Ryan McGuire We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic, it's running on three nodes, with the default stress profile. 
[jira] [Comment Edited] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250242#comment-14250242 ] Ariel Weisberg edited comment on CASSANDRA-8503 at 12/17/14 6:17 PM: - I think there are two general classes of benchmarks you would run in CI. Representative user workloads, and targeted microbenchmark workloads. Targeted workloads are a huge help during ongoing development because they magnify the impact of regressions from code changes that are harder to notice in representative workloads. They also point to the specific subsystem being benchmarked. I will just cover the microbenchmarks. The full matrix is large so there is an element of wanting ponies, but the reality is that they are all interesting from a preventing performance regressions and understanding the impact of ongoing changes perspective. Benchmark the stress client, so excess server capacity and a single client testing lots of small messages. Lots of large messages. Stuff the servers can answer as fast as possible. The flip side of this workload is the same thing but for the server where you measure how many trivially answerable tiny queries you can shove through a cluster given excess client capacity. When testing the server this might also be when you test the matrix of replication and consistency levels. Benchmark performance of non-prepared statements. Benchmark performance of preparing statements? A full test matrix for data intensive workloads would test read, write, and 50/50, and for a bonus 90/10. Single cell partitions with a small value and a large value, and a range of wide rows (small, medium, large). All 3 compaction strategies with compression on/off. Data intensive workloads also need to run against spinning rust and SSDs. CQL specific microbenchmarks against specific CQL datatypes. If there are interactions that are important we should capture those. 
Counters Lightweight transactions The matrix also needs to include different permutations of replication strategies and consistency levels. Maybe we can constrain those variations to parts of the matrix that would best reflect the impact of replication strategies and CL. Probably a subset of the data intensive workloads. Also want a workload targeting the row cache and key cache when everything is cached and when there is a realistic long tail of data not in the cache. For every workload to really get the value you would like a graph for throughput and a graph for latency at some percentile with a data point per revision tested going back to the beginning as well as a 90 day graph. A trend line also helps. Then someone has to be it for monitoring the graphs and poking people when there is an issue. The workflow usually goes something like the monitor tags the author of the suspected bad revision who triages it and either fixes it or hands it off to the correct person. Timeliness is really important because once regressions start stacking it's a pain to know whether you have done what you should to fix it. was (Author: aweisberg): I think there are two general classes of benchmarks you would run in CI. Representative user workloads, and targeted microbenchmark workloads. Targeted workloads are a huge help during ongoing development because they magnify the impact of regressions from code changes that are harder to notice in representative workloads. They also point to the specific subsystem being benchmarked. I will just cover the microbenchmarks. The full matrix is large so there is an element of wanting ponies, but the reality is that they are all interesting from a preventing performance regressions and understanding the impact of ongoing changes perspective. Benchmark the stress client, so excess server capacity and a single client testing lots of small messages. Lots of large messages. Stuff the servers can answer as fast as possible. 
The flip side of this workload is the same thing but for the server where you measure how many trivially answerable tiny queries you can shove through a cluster given excess client capacity. Benchmark perfomance of non-prepared statements. Benchmark performance of preparing statements? A full test matrix for data intensive workloads would test read, write, and 50/50, and for a bonus 90/10. Single cell partitions with a small value and a large value, and a range of wide rows (small, medium, large). All 3 compaction strategies with compression on/off. Data intensive workloads also need to run against a spinning rust and SSDs. CQL specific microbenchmarks against specific CQL datatypes. If there are interactions that are important we should capture those. Counters Lightweight transactions The matrix also needs to include different permutations of replication strategies and consistency levels. Maybe we can constrain those
[jira] [Commented] (CASSANDRA-7281) SELECT on tuple relations are broken for mixed ASC/DESC clustering order
[ https://issues.apache.org/jira/browse/CASSANDRA-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250245#comment-14250245 ] Benjamin Lerer commented on CASSANDRA-7281: --- I will try but the chances are low. SELECT on tuple relations are broken for mixed ASC/DESC clustering order Key: CASSANDRA-7281 URL: https://issues.apache.org/jira/browse/CASSANDRA-7281 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Marcin Szymaniuk Fix For: 2.0.12 Attachments: 0001-CASSANDRA-7281-SELECT-on-tuple-relations-are-broken-.patch, 0001-CASSANDRA-7281-SELECT-on-tuple-relations-are-broken-v2.patch, 0001-CASSANDRA-7281-SELECT-on-tuple-relations-are-broken-v3.patch As noted on [CASSANDRA-6875|https://issues.apache.org/jira/browse/CASSANDRA-6875?focusedCommentId=13992153&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13992153], the tuple notation is broken when the clustering order mixes ASC and DESC directives because the range of data they describe doesn't correspond to a single continuous slice internally. To copy the example from CASSANDRA-6875:
{noformat}
cqlsh:ks> create table foo (a int, b int, c int, PRIMARY KEY (a, b, c)) WITH CLUSTERING ORDER BY (b DESC, c ASC);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 2, 0);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 1, 0);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 1, 1);
cqlsh:ks> INSERT INTO foo (a, b, c) VALUES (0, 0, 0);
cqlsh:ks> SELECT * FROM foo WHERE a=0;

 a | b | c
---+---+---
 0 | 2 | 0
 0 | 1 | 0
 0 | 1 | 1
 0 | 0 | 0

(4 rows)
cqlsh:ks> SELECT * FROM foo WHERE a=0 AND (b, c) > (1, 0);

 a | b | c
---+---+---
 0 | 2 | 0

(1 rows)
{noformat}
The last query should really return {{(0, 2, 0)}} and {{(0, 1, 1)}}. For that specific example we should generate 2 internal slices, but I believe that with more clustering columns we may have more slices. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
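Why no single slice can work is easy to see by sorting the example rows in the table's clustering order (b DESC, c ASC) and marking the rows that satisfy (b, c) > (1, 0); a Python sketch:

```python
# Rows from the ticket's example, sorted in clustering order
# (b DESC, c ASC). The matches for (b, c) > (1, 0) sit at
# non-adjacent positions, so one contiguous slice cannot return them.

rows = [(0, 2, 0), (0, 1, 0), (0, 1, 1), (0, 0, 0)]
clustering_order = sorted(rows, key=lambda r: (-r[1], r[2]))  # b DESC, c ASC

matches = [r for r in clustering_order if (r[1], r[2]) > (1, 0)]
positions = [i for i, r in enumerate(clustering_order)
             if (r[1], r[2]) > (1, 0)]
```

The matches are (0, 2, 0) and (0, 1, 1) at positions 0 and 2, with the non-matching (0, 1, 0) between them, which is why at least two internal slices are needed.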
[jira] [Updated] (CASSANDRA-7281) SELECT on tuple relations are broken for mixed ASC/DESC clustering order
[ https://issues.apache.org/jira/browse/CASSANDRA-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-7281: --- Fix Version/s: 2.1.3
[jira] [Commented] (CASSANDRA-7281) SELECT on tuple relations are broken for mixed ASC/DESC clustering order
[ https://issues.apache.org/jira/browse/CASSANDRA-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250263#comment-14250263 ] Philip Thompson commented on CASSANDRA-7281: --- Okay, that is fine. Sylvain mentioned a possible 2.1.3 release by the end of the calendar year. If this patch makes it into that release, that is soon enough.
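The non-contiguity behind the 2-slice fix is easy to see by replaying the example's rows in their stored clustering order. A small Python sketch (illustrative only, not Cassandra code):

```python
# Why "(b, c) > (1, 0)" is not one contiguous slice when the clustering
# order is (b DESC, c ASC).

# Rows of partition a=0 in their stored clustering order (b DESC, c ASC):
rows = [(2, 0), (1, 0), (1, 1), (0, 0)]

# The tuple relation is plain lexicographic comparison on (b, c):
matches = [r for r in rows if r > (1, 0)]
print(matches)  # [(2, 0), (1, 1)] -- the two rows the query should return

# Their positions within the stored order are not adjacent, so a single
# contiguous internal slice cannot select exactly these rows:
positions = [i for i, r in enumerate(rows) if r > (1, 0)]
print(positions)  # [0, 2] -- hence at least two internal slices are needed
```

With only ASC (or only DESC) clustering, the matching rows would form one contiguous run; mixing directions is what splits them apart.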
[jira] [Commented] (CASSANDRA-8464) Support direct buffer decompression for reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250361#comment-14250361 ] T Jake Luciani commented on CASSANDRA-8464: --- Why would there be an issue using mmaped data for compaction? Throttling of compressed data is treated that way because it's in the path of streaming or compaction, which work on uncompressed data.

Support direct buffer decompression for reads

Key: CASSANDRA-8464
URL: https://issues.apache.org/jira/browse/CASSANDRA-8464
Project: Cassandra
Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Labels: performance
Fix For: 3.0

Currently when we read a compressed sstable we copy the data on heap, then send it to be decompressed into another on-heap buffer (albeit pooled). But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte buffers. This lets us mmap the data and decompress completely off heap (and avoids moving bytes over JNI). One issue is performing the checksum off heap, but Adler32 does support it in Java 8 (it's also in Java 7 but marked private?!). This change yields a 10% boost in read performance on cstar. Locally I see up to 30% improvement. http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=200.09&ymin=0&ymax=135908.3

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
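A rough Python analogue of the checksum-without-copy idea the ticket relies on (the file name and contents are made up; Python's {{zlib.adler32}} stands in for Java 8's {{Adler32.update(ByteBuffer)}}): checksum an mmap'd region through a memoryview instead of first copying it into an intermediate buffer.

```python
import mmap
import os
import tempfile
import zlib

# Write a sample "compressed chunk" to a file so we have something to map.
data = b"compressed-chunk-bytes" * 100
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

# Checksum the mmap'd region directly; the memoryview exposes the mapped
# bytes without materialising a separate heap copy first.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    with memoryview(mm) as view:
        checksum = zlib.adler32(view)

os.unlink(path)
assert checksum == zlib.adler32(data)  # same result as the copying path
```

In the Java case the win is the same shape: the checksum (and the decompressor) consume the mapped direct buffer in place, so no bytes cross JNI or land in an intermediate on-heap array.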
[jira] [Updated] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-8503: -- Attachment: ycsb.yaml inmemory.yaml Here are a couple of profiles. One matches the YCSB setup; the other is a tiny in-memory dataset.

Collect important stress profiles for regression analysis done by jenkins

Key: CASSANDRA-8503
URL: https://issues.apache.org/jira/browse/CASSANDRA-8503
Project: Cassandra
Issue Type: Task
Reporter: Ryan McGuire
Assignee: Ryan McGuire
Attachments: inmemory.yaml, ycsb.yaml

We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases. Here's an example: http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f This test is currently pretty basic: it's running on three nodes, with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email. Ideas:
* Timeseries (Can this be done with stress? not sure)
* compact storage
* compression off
* ...

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250433#comment-14250433 ] T Jake Luciani commented on CASSANDRA-8503: --- We should also run with all these @ RF=3 and QUORUM read/writes
[jira] [Created] (CASSANDRA-8504) Stack trace is erroneously logged twice
Philip Thompson created CASSANDRA-8504: -- Summary: Stack trace is erroneously logged twice

Key: CASSANDRA-8504
URL: https://issues.apache.org/jira/browse/CASSANDRA-8504
Project: Cassandra
Issue Type: Bug
Environment: OSX and Ubuntu
Reporter: Philip Thompson
Assignee: Brandon Williams
Priority: Minor
Fix For: 3.0
Attachments: node4.log

The dtest {{replace_address_test.TestReplaceAddress.replace_active_node_test}} is failing on 3.0. The following can be seen in the log:
{code}
ERROR [main] 2014-12-17 15:12:33,871 CassandraDaemon.java:496 - Exception encountered during startup
java.lang.UnsupportedOperationException: Cannot replace a live node...
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:773) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:593) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:483) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:356) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:479) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:571) [main/:na]
ERROR [main] 2014-12-17 15:12:33,872 CassandraDaemon.java:584 - Exception encountered during startup
java.lang.UnsupportedOperationException: Cannot replace a live node...
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:773) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:593) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:483) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:356) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:479) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:571) [main/:na]
INFO [StorageServiceShutdownHook] 2014-12-17 15:12:33,873 Gossiper.java:1349 - Announcing shutdown
INFO [StorageServiceShutdownHook] 2014-12-17 15:12:35,876 MessagingService.java:708 - Waiting for messaging service to quiesce
{code}
The test starts up a three node cluster, loads some data, then attempts to start a fourth node with replace_address against the IP of a live node. This is expected to fail, with one ERROR message in the log. In 3.0, we are seeing two messages. 2.1-HEAD is working as expected. Attached is the full log of the fourth node.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250519#comment-14250519 ] Pavel Yaskevich commented on CASSANDRA-7275: --- Just to re-iterate, I still don't understand why we would prefer to crash the process if an error happens on a system CF flush, e.g. at the end of a compaction, which is not even essential for the operation (like compactions_in_progress). And there is still no clear answer to how we distinguish between an FS{Read,Write}Error generated in response to an FS or system failure and one generated in response to an incorrect call that Cassandra made, e.g. a duplicate hard-link. I would prefer that if the failure was in a system CF we log the message, leave the commitlog and let everything carry on instead of just crashing, because crashing could essentially result in dropping incoming data. The story is different for actual user memtables though: as I mentioned a couple of times in my previous comments, I'm totally fine with crashing if a normal memtable flush fails and disk_failure_policy says so.

Errors in FlushRunnable may leave threads hung

Key: CASSANDRA-7275
URL: https://issues.apache.org/jira/browse/CASSANDRA-7275
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Tyler Hobbs
Assignee: Pavel Yaskevich
Priority: Minor
Fix For: 2.0.12
Attachments: 0001-Move-latch.countDown-into-finally-block.patch, 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch

In Memtable.FlushRunnable, the CountDownLatch will never be counted down if there are errors, which results in hanging any threads that are waiting for the flush to complete.
For example, an error like this causes the problem:
{noformat}
ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:474,5,main]
java.lang.IllegalArgumentException
	at java.nio.Buffer.position(Unknown Source)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
	at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
	at org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
	at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
	at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
	at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
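One of the attached patches moves {{latch.countDown()}} into a finally block so waiters are released even when the flush fails. A minimal Python sketch of that pattern (the latch class and all names here are illustrative stand-ins for Java's CountDownLatch, not Cassandra code):

```python
import threading

class FlushLatch:
    """Illustrative stand-in for Java's CountDownLatch."""
    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            self._count = max(0, self._count - 1)
            if self._count == 0:
                self._cond.notify_all()

    def await_(self, timeout=None):
        # Returns True once the count reaches zero, False on timeout
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)

def flush(latch, fail=True):
    try:
        try:
            if fail:
                raise ValueError("simulated flush error")
            # ... write sorted contents to the sstable here ...
        finally:
            latch.count_down()  # the fix: runs even when the flush fails
    except ValueError:
        pass  # error is handled/logged elsewhere; waiters already released

latch = FlushLatch(1)
t = threading.Thread(target=flush, args=(latch,))
t.start()
t.join()
released = latch.await_(timeout=1)  # True: no waiter hangs despite the error
```

Without the finally block, the simulated error would skip {{count_down()}} and {{await_}} would block forever, which is exactly the hang the ticket describes.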
[jira] [Comment Edited] (CASSANDRA-8504) Stack trace is erroneously logged twice
[ https://issues.apache.org/jira/browse/CASSANDRA-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250548#comment-14250548 ] Philip Thompson edited comment on CASSANDRA-8504 at 12/17/14 9:06 PM: --
{code}
04:03 PM:~/cstar/cassandra[(no branch, bisect started on trunk)*]$ git bisect good
027006dcb0931e5b93f5378494831aadc3baa809 is the first bad commit
commit 027006dcb0931e5b93f5378494831aadc3baa809
Author: Brandon Williams brandonwilli...@apache.org
Date: Wed Oct 15 15:15:24 2014 -0500

    Allow CassandraDaemon to be run as a managed service

    Patch by Heiko Braun, reviewed by brandonwilliams for CASSANDRA-7997

:04 04 6bfd6c7a41efff412d2a0b90ebad49fd2bc62942 0a9ae3dcb67fb97eb40512b2343d2d70079ecc09 M src
{code}
[jira] [Commented] (CASSANDRA-8504) Stack trace is erroneously logged twice
[ https://issues.apache.org/jira/browse/CASSANDRA-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250548#comment-14250548 ] Philip Thompson commented on CASSANDRA-8504: 04:03 PM:~/cstar/cassandra[(no branch, bisect started on trunk)*]$ git bisect good 027006dcb0931e5b93f5378494831aadc3baa809 is the first bad commit commit 027006dcb0931e5b93f5378494831aadc3baa809 Author: Brandon Williams brandonwilli...@apache.org Date: Wed Oct 15 15:15:24 2014 -0500 Allow CassandraDaemon to be run as a managed service Patch by Heiko Braun, reviewed by brandonwilliams for CASSANDRA-7997 :04 04 6bfd6c7a41efff412d2a0b90ebad49fd2bc62942 0a9ae3dcb67fb97eb40512b2343d2d70079ecc09 M src
[jira] [Created] (CASSANDRA-8505) Invalid results are returned while secondary index are being build
Benjamin Lerer created CASSANDRA-8505: - Summary: Invalid results are returned while secondary index are being build

Key: CASSANDRA-8505
URL: https://issues.apache.org/jira/browse/CASSANDRA-8505
Project: Cassandra
Issue Type: Bug
Reporter: Benjamin Lerer

If you request an index creation and then execute a query that uses the index, the results returned might be invalid until the index is fully built. This is caused by the fact that the table column will be marked as indexed before the index is ready. The following unit tests can be used to reproduce the problem:
{code}
@Test
public void testIndexCreatedAfterInsert() throws Throwable
{
    createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, b)))");

    execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
    execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
    execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
    execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
    execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");

    createIndex("CREATE INDEX ON %s(b)");

    assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1), row(0, 1, 1), row(1, 1, 4));
}

@Test
public void testIndexCreatedBeforeInsert() throws Throwable
{
    createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, b)))");

    createIndex("CREATE INDEX ON %s(b)");

    execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
    execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
    execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
    execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
    execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");

    assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1), row(0, 1, 1), row(1, 1, 4));
}
{code}
The first test will fail while the second will work. In my opinion the first test should reject the request as invalid (as if the index did not exist) until the index is fully built.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
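The suggested behaviour can be sketched as a readiness flag that gates queries until the index build finishes. The class and method names below are illustrative, not Cassandra's actual secondary-index classes:

```python
# Sketch: a secondary index should reject queries (or be invisible to the
# query planner) until its build completes, instead of serving partial data.

class SecondaryIndex:
    def __init__(self):
        self.built = False     # flipped only once the build is complete
        self._entries = {}

    def rebuild(self, rows, column):
        # Index existing rows; only mark the index usable at the very end.
        for row in rows:
            self._entries.setdefault(row[column], []).append(row)
        self.built = True

    def query(self, value):
        if not self.built:
            # Matches the ticket's expectation: reject as invalid rather
            # than return incomplete results mid-build.
            raise RuntimeError("index is still building; query is invalid")
        return self._entries.get(value, [])

idx = SecondaryIndex()
try:
    idx.query(1)          # index declared but not yet built
except RuntimeError:
    pass                  # rejected, as the reporter argues it should be

idx.rebuild([{"a": 0, "b": 1, "c": 1}, {"a": 1, "b": 1, "c": 4}], "b")
hits = idx.query(1)       # now returns both matching rows
```

The bug amounts to flipping the equivalent of {{built}} at index declaration time rather than at build-completion time.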
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250554#comment-14250554 ] Tupshin Harper commented on CASSANDRA-7275: --- Strongly in favor of the opt-in, policy-based approach that [~jbellis] mentioned. There isn't a one-size-fits-all approach to deal with this.
[jira] [Assigned] (CASSANDRA-8158) network_topology_test is failing with inconsistent results
[ https://issues.apache.org/jira/browse/CASSANDRA-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson reassigned CASSANDRA-8158: -- Assignee: Philip Thompson network_topology_test is failing with inconsistent results -- Key: CASSANDRA-8158 URL: https://issues.apache.org/jira/browse/CASSANDRA-8158 Project: Cassandra Issue Type: Test Reporter: Philip Thompson Assignee: Philip Thompson replication_test.py:ReplicationTest.network_topology_test is a no vnode test that has been failing in 2.0 and 2.1 for quite a while. Sample cassci output here: http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/lastCompletedBuild/testReport/replication_test/ReplicationTest/network_topology_test/ The missing replicas marked in the failure output are very inconsistent. Due to the fact that it is failing on practically every version, and no bugs have been filed relating to a failure of the feature this is testing, there is most likely an issue with the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8506) Use RefCounting to manage list of compacting sstables
Benedict created CASSANDRA-8506: --- Summary: Use RefCounting to manage list of compacting sstables Key: CASSANDRA-8506 URL: https://issues.apache.org/jira/browse/CASSANDRA-8506 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Fix For: 3.0 Building on CASSANDRA-7705, we can use debuggable ref counting to manage the marking and unmarking of compaction state, so that we can quickly track down errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-7016: -- Fix Version/s: (was: 2.1.3) 3.0 Labels: cql docs (was: cql)

can't map/reduce over subset of rows with cql

Key: CASSANDRA-7016
URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
Project: Cassandra
Issue Type: Bug
Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
Labels: cql, docs
Fix For: 3.0
Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016.txt

select ... where token(k) > x and token(k) <= y and k in (a,b) allow filtering; This fails on 2.0.6: can't restrict k by more than one relation. In the context of map/reduce (hence the token range) I want to map over only a subset of the keys (hence the 'in'). Pushing the 'in' filter down to cql is substantially cheaper than pulling all rows to the client and then discarding most of them. Currently this is possible only if the hadoop integration code is altered to apply the AND on the client side and use cql that contains only the resulting filtered 'in' set. The problem is not hadoop specific though, so IMO it should really be solved in cql not the hadoop integration code. Most restrictions on cql syntax seem to exist to prevent unduly expensive queries. This one seems to be doing the opposite. Edit: on further thought and with reference to the code in SelectStatement$RawStatement, it seems to me that token(k) and k should be considered distinct entities for the purposes of processing restrictions. That is, no restriction on the token should conflict with a restriction on the raw key. That way any monolithic query in terms of k can be decomposed into parallel chunks over the token range for the purposes of map/reduce processing simply by appending a 'and where token(k)...' clause to the existing 'where k ...'.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8507) Improved Consistency Level Feedback in cqlsh
Adam Holmberg created CASSANDRA-8507: Summary: Improved Consistency Level Feedback in cqlsh Key: CASSANDRA-8507 URL: https://issues.apache.org/jira/browse/CASSANDRA-8507 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Adam Holmberg Assignee: Adam Holmberg Priority: Minor Fix For: 2.1.3 cqlsh currently accepts names for CONSISTENCY, but reports back the enum. There was some confusion caused by this [on the users mailing list|http://mail-archives.apache.org/mod_mbox/cassandra-user/201412.mbox/%3CCADxGeP4q0Hc3seeCrJ_xuYvLyj2wf0RHcag0BvCwLQQi5BwOWw%40mail.gmail.com%3E]. It would probably be good to report the name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
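The proposed change amounts to echoing the consistency level's name instead of its numeric enum value. A hedged sketch (the numeric values below follow the native-protocol consistency codes, mirroring the mapping the DataStax Python driver exposes as {{ConsistencyLevel.value_to_name}}; the plain dict and function name are illustrative):

```python
# Map native-protocol consistency codes back to their names so cqlsh can
# report "QUORUM" instead of "4".
CONSISTENCY_NAMES = {
    0: "ANY", 1: "ONE", 2: "TWO", 3: "THREE",
    4: "QUORUM", 5: "ALL", 6: "LOCAL_QUORUM", 7: "EACH_QUORUM",
    8: "SERIAL", 9: "LOCAL_SERIAL", 10: "LOCAL_ONE",
}

def describe_consistency(level):
    # Fall back to the raw value for codes we don't recognise
    return "Consistency level set to %s." % CONSISTENCY_NAMES.get(level, level)

print(describe_consistency(4))  # Consistency level set to QUORUM.
```

Reporting the name keeps the feedback symmetric with what the user typed, which is the confusion the mailing-list thread describes.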
[jira] [Updated] (CASSANDRA-8507) Improved Consistency Level Feedback in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-8507: - Attachment: cqlsh_consistency.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-7016: -- Attachment: CASSANDRA-7016-V4-trunk.txt The patch introduce a new {{PrimaryKeyRestrictions}} called {{TokenFilter}} that allow to merge token and non token restrictions for the partition key. The patch has been made for trunk as CASSANDRA-7981was a prerequisite to make the patch work for all the possible queries. can't map/reduce over subset of rows with cql - Key: CASSANDRA-7016 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016 Project: Cassandra Issue Type: Bug Components: Core, Hadoop Reporter: Jonathan Halliday Assignee: Benjamin Lerer Priority: Minor Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016.txt select ... where token(k) x and token(k) = y and k in (a,b) allow filtering; This fails on 2.0.6: can't restrict k by more than one relation. In the context of map/reduce (hence the token range) I want to map over only a subset of the keys (hence the 'in'). Pushing the 'in' filter down to cql is substantially cheaper than pulling all rows to the client and then discarding most of them. Currently this is possible only if the hadoop integration code is altered to apply the AND on the client side and use cql that contains only the resulting filtered 'in' set. The problem is not hadoop specific though, so IMO it should really be solved in cql not the hadoop integration code. Most restrictions on cql syntax seem to exist to prevent unduly expensive queries. This one seems to be doing the opposite. Edit: on further thought and with reference to the code in SelectStatement$RawStatement, it seems to me that token(k) and k should be considered distinct entities for the purposes of processing restrictions. That is, no restriction on the token should conflict with a restriction on the raw key. 
That way any monolithic query in terms of k can be decomposed into parallel chunks over the token range for the purposes of map/reduce processing, simply by appending an 'and token(k) ...' clause to the existing 'where k ...'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
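The decomposition described in the ticket can be sketched as follows. This is a minimal illustration, not Cassandra's Hadoop integration code: the keyspace/table names and the `chunk_queries` helper are hypothetical, and the token bounds assume the Murmur3 partitioner's signed 64-bit ring.

```python
# Hypothetical sketch: split the full Murmur3 token ring into chunks and
# append a token() range to an existing "k IN (...)" restriction, producing
# one independently runnable CQL query per map/reduce task.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_splits(n):
    """Return n (start, end] token ranges covering the full ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // n
    bounds = [MIN_TOKEN + i * span for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

def chunk_queries(keys, n_chunks):
    """Build one CQL string per token-range chunk, keeping the IN filter."""
    in_list = ", ".join(str(k) for k in keys)
    return [
        (f"SELECT * FROM ks.tbl WHERE token(k) > {lo} "
         f"AND token(k) <= {hi} AND k IN ({in_list}) ALLOW FILTERING")
        for lo, hi in token_splits(n_chunks)
    ]

queries = chunk_queries([1, 2, 3], 4)
```

Each generated query restricts both `token(k)` and `k`, which is exactly the combination the ticket asks CQL to accept server-side.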
[jira] [Updated] (CASSANDRA-8506) Improve management of DataTracker, esp. compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-8506: Summary: Improve management of DataTracker, esp. compacting (was: Use RefCounting to manage list of compacting sstables) Improve management of DataTracker, esp. compacting -- Key: CASSANDRA-8506 URL: https://issues.apache.org/jira/browse/CASSANDRA-8506 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Fix For: 3.0 Building on CASSANDRA-7705, we can use debuggable ref counting to manage the marking and unmarking of compaction state, so that we can quickly track down errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8506) Improve management of DataTracker, esp. compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-8506: Description: Building on CASSANDRA-7705, we can use debuggable ref counting to manage the marking and unmarking of compaction state, so that we can quickly track down errors. We should also simplify the logic wrt rewriters, by ignoring the descriptor type, perhaps for all sets. was:Building on CASSANDRA-7705, we can use debuggable ref counting to manage the marking and unmarking of compaction state, so that we can quickly track down errors. Improve management of DataTracker, esp. compacting -- Key: CASSANDRA-8506 URL: https://issues.apache.org/jira/browse/CASSANDRA-8506 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Fix For: 3.0 Building on CASSANDRA-7705, we can use debuggable ref counting to manage the marking and unmarking of compaction state, so that we can quickly track down errors. We should also simplify the logic wrt rewriters, by ignoring the descriptor type, perhaps for all sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
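The "debuggable ref counting" idea from CASSANDRA-8506 can be sketched in miniature. This is an assumed illustration of the concept, not Cassandra's actual `Ref`/`DataTracker` classes: each acquire records the stack trace where it happened, so a compacting sstable that is never unmarked can be traced back to the code that marked it.

```python
# Minimal sketch (hypothetical API) of debuggable reference counting:
# every acquire stores the call stack, so leaks identify their origin.
import traceback

class DebuggableRef:
    def __init__(self, resource):
        self.resource = resource
        self.holders = {}      # ref id -> stack trace of the acquire
        self._next_id = 0

    def acquire(self):
        ref_id = self._next_id
        self._next_id += 1
        self.holders[ref_id] = "".join(traceback.format_stack(limit=5))
        return ref_id

    def release(self, ref_id):
        del self.holders[ref_id]

    def leaks(self):
        """Stack traces of every acquire that was never released."""
        return list(self.holders.values())

ref = DebuggableRef("sstable-42")
a = ref.acquire()   # e.g. marked compacting by a compaction task
b = ref.acquire()   # e.g. marked compacting by an sstable rewriter
ref.release(a)
# One holder remains; its recorded stack says who forgot to unmark.
leaked = ref.leaks()
```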
[jira] [Assigned] (CASSANDRA-8373) MOVED_NODE Topology Change event is never emitted
[ https://issues.apache.org/jira/browse/CASSANDRA-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg reassigned CASSANDRA-8373: Assignee: (was: Adam Holmberg) A quick check shows that a NEW_NODE event is emitted for a move in single-node clusters in 2.1 as well, independent of this patch. Interesting anomaly, but seems unrelated to the change I was proposing. With my current priorities, I'm not sure when I'll be able to dig into the secondary issue. I'm un-assigning for now. MOVED_NODE Topology Change event is never emitted - Key: CASSANDRA-8373 URL: https://issues.apache.org/jira/browse/CASSANDRA-8373 Project: Cassandra Issue Type: Bug Components: Core Reporter: Adam Holmberg Priority: Minor Fix For: 2.0.12, 2.1.3 Attachments: 8373.txt lifeCycleSubscribers.onMove never gets called because [this tokenMetadata.updateNormalTokens|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L1585] call [changes the endpoint moving status|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L190], making the later isMoving conditional always false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
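The ordering bug described in CASSANDRA-8373 is easy to model. The sketch below is hypothetical (the real code lives in `StorageService`/`TokenMetadata` in Java): mutating the moving state before consulting it makes the `isMoving` branch unreachable.

```python
# Illustrative model (not the actual Cassandra code) of the bug:
# update_normal_tokens clears the node's "moving" status as a side effect,
# so a later is_moving check can never fire the MOVED_NODE event.

class TokenMetadata:
    def __init__(self):
        self.moving = set()

    def update_normal_tokens(self, endpoint):
        # Side effect: a node with normal tokens is no longer "moving".
        self.moving.discard(endpoint)

    def is_moving(self, endpoint):
        return endpoint in self.moving

events = []

# Buggy order: state is updated first, then the flag is consulted.
tm = TokenMetadata()
tm.moving.add("10.0.0.1")
tm.update_normal_tokens("10.0.0.1")
if tm.is_moving("10.0.0.1"):       # always False by this point
    events.append("MOVED_NODE")

# Fixed order: capture the flag before mutating state.
tm2 = TokenMetadata()
tm2.moving.add("10.0.0.1")
was_moving = tm2.is_moving("10.0.0.1")
tm2.update_normal_tokens("10.0.0.1")
if was_moving:
    events.append("MOVED_NODE")
```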
[jira] [Updated] (CASSANDRA-8507) Improved Consistency Level Feedback in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8507: -- Reviewer: Tyler Hobbs [~thobbs] to review Improved Consistency Level Feedback in cqlsh Key: CASSANDRA-8507 URL: https://issues.apache.org/jira/browse/CASSANDRA-8507 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Adam Holmberg Assignee: Adam Holmberg Priority: Minor Fix For: 2.1.3 Attachments: cqlsh_consistency.txt cqlsh currently accepts names for CONSISTENCY, but reports back the enum. There was some confusion caused by this [on the users mailing list|http://mail-archives.apache.org/mod_mbox/cassandra-user/201412.mbox/%3CCADxGeP4q0Hc3seeCrJ_xuYvLyj2wf0RHcag0BvCwLQQi5BwOWw%40mail.gmail.com%3E]. It would probably be good to report the name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
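The fix requested in CASSANDRA-8507 amounts to echoing the level's name rather than its numeric enum value. A rough sketch, with the caveat that `feedback` and its message text are hypothetical (the numeric values match the native-protocol consistency codes, but cqlsh's real implementation differs):

```python
# Sketch: accept a consistency level by name or protocol enum value,
# and always report back the human-readable name.
CONSISTENCY_LEVELS = {0: "ANY", 1: "ONE", 2: "TWO", 3: "THREE",
                      4: "QUORUM", 5: "ALL", 6: "LOCAL_QUORUM",
                      7: "EACH_QUORUM"}
NAME_TO_LEVEL = {v: k for k, v in CONSISTENCY_LEVELS.items()}

def feedback(level):
    """Return the confirmation string, naming the level instead of its enum."""
    if isinstance(level, str):
        level = NAME_TO_LEVEL[level.upper()]
    return f"Consistency level set to {CONSISTENCY_LEVELS[level]}."
```

Reporting `QUORUM` instead of `4` avoids exactly the mailing-list confusion the ticket cites.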
[jira] [Updated] (CASSANDRA-8490) DISTINCT queries with LIMITs or paging are incorrect when partitions are deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8490: --- Summary: DISTINCT queries with LIMITs or paging are incorrect when partitions are deleted (was: Tombstone stops paging through resultset when distinct keyword is used.) DISTINCT queries with LIMITs or paging are incorrect when partitions are deleted Key: CASSANDRA-8490 URL: https://issues.apache.org/jira/browse/CASSANDRA-8490 Project: Cassandra Issue Type: Bug Environment: Driver version: 2.1.3. Cassandra version: 2.0.11/2.1.2. Reporter: Frank Limstrand Assignee: Tyler Hobbs Fix For: 2.0.12, 2.1.3 Using paging demo code from https://github.com/PatrickCallaghan/datastax-paging-demo The code creates and populates a table with 1000 entries and pages through them with setFetchSize set to 100. If we then delete one entry with 'cqlsh': {noformat} cqlsh:datastax_paging_demo delete from datastax_paging_demo.products where productId = 'P142'; (The specified productid is number 6 in the resultset.) {noformat} and run the same query (Select * from) again we get: {noformat} [com.datastax.paging.Main.main()] INFO com.datastax.paging.Main - Paging demo took 0 secs. Total Products : 999 {noformat} which is what we would expect. If we then change the select statement in dao/ProductDao.java (line 70) from Select * from to Select DISTINCT productid from we get this result: {noformat} [com.datastax.paging.Main.main()] INFO com.datastax.paging.Main - Paging demo took 0 secs. Total Products : 99 {noformat} So it looks like the tombstone stops the paging behaviour. Is this a bug? 
{noformat} DEBUG [Native-Transport-Requests:788] 2014-12-16 10:09:13,431 Message.java (line 319) Received: QUERY Select DISTINCT productid from datastax_paging_demo.products, v=2 DEBUG [Native-Transport-Requests:788] 2014-12-16 10:09:13,434 AbstractQueryPager.java (line 98) Fetched 99 live rows DEBUG [Native-Transport-Requests:788] 2014-12-16 10:09:13,434 AbstractQueryPager.java (line 115) Got result (99) smaller than page size (100), considering pager exhausted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
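The debug log above shows the heuristic at fault: a page smaller than the fetch size is treated as the last page. A simplified, hypothetical model of that pager logic (not the actual `AbstractQueryPager` code) reproduces the 99-row result from the report:

```python
# Sketch of the faulty exhaustion check: with DISTINCT, a deleted partition
# is dropped *after* it counted toward the page, so the page comes back one
# short, the pager "considers itself exhausted", and the remaining 900 rows
# are silently skipped.

def page_distinct(partitions, page_size):
    seen, i = [], 0
    while i < len(partitions):
        page = partitions[i:i + page_size]
        i += page_size
        live = [p for p in page if not p.get("deleted")]
        seen.extend(p["id"] for p in live)
        if len(live) < page_size:   # buggy: tombstones shrink the page
            break                   # "considering pager exhausted"
    return seen

# 1000 partitions, one deleted (mirroring the P142 delete in the report).
rows = [{"id": n, "deleted": n == 5} for n in range(1000)]
result = page_distinct(rows, 100)   # stops after the first page: 99 ids
```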
[jira] [Comment Edited] (CASSANDRA-8288) cqlsh describe needs to show 'sstable_compression': ''
[ https://issues.apache.org/jira/browse/CASSANDRA-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250745#comment-14250745 ] Adam Holmberg edited comment on CASSANDRA-8288 at 12/17/14 10:56 PM: - The required change is merged in python-driver was (Author: aholmber): This change has been merged. cqlsh describe needs to show 'sstable_compression': '' -- Key: CASSANDRA-8288 URL: https://issues.apache.org/jira/browse/CASSANDRA-8288 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jeremiah Jordan Assignee: Tyler Hobbs Labels: cqlsh Fix For: 2.1.3 Time Spent: 4m Remaining Estimate: 0h For uncompressed tables cqlsh describe schema should show AND compression = {'sstable_compression': ''} otherwise when you replay the schema you get the default of LZ4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8288) cqlsh describe needs to show 'sstable_compression': ''
[ https://issues.apache.org/jira/browse/CASSANDRA-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250745#comment-14250745 ] Adam Holmberg commented on CASSANDRA-8288: -- This change has been merged. cqlsh describe needs to show 'sstable_compression': '' -- Key: CASSANDRA-8288 URL: https://issues.apache.org/jira/browse/CASSANDRA-8288 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jeremiah Jordan Assignee: Tyler Hobbs Labels: cqlsh Fix For: 2.1.3 Time Spent: 4m Remaining Estimate: 0h For uncompressed tables cqlsh describe schema should show AND compression = {'sstable_compression': ''} otherwise when you replay the schema you get the default of LZ4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8043) Native Protocol V4
[ https://issues.apache.org/jira/browse/CASSANDRA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-8043: - Labels: client-impacting protocolv4 (was: protocolv4) Native Protocol V4 -- Key: CASSANDRA-8043 URL: https://issues.apache.org/jira/browse/CASSANDRA-8043 Project: Cassandra Issue Type: Task Reporter: Sylvain Lebresne Labels: client-impacting, protocolv4 Fix For: 3.0 We have a bunch of issues that will require a protocol v4, this ticket is just a meta ticket to group them all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6038) Make the inter-major schema migration barrier more targeted, and, ideally, non-existent
[ https://issues.apache.org/jira/browse/CASSANDRA-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-6038: - Labels: client-impacting (was: ) Make the inter-major schema migration barrier more targeted, and, ideally, non-existent --- Key: CASSANDRA-6038 URL: https://issues.apache.org/jira/browse/CASSANDRA-6038 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Labels: client-impacting Fix For: 3.0 CASSANDRA-5845 made it so that schema changes in a major-mixed cluster are not propagated to the minorer-major nodes. This lets us perform backwards-incompatible schema changes in major releases safely - like adding the schema_triggers table, moving all the aliases to schema_columns, getting rid of the deprecated schema columns, etc. Even this limitation might be too strict, however, and should be avoided if possible (starting with at least versioning schema separately from messaging service versioning, and resorting to major-minor schema block only when truly necessary and not for every x-y pair). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6717) Modernize schema tables
[ https://issues.apache.org/jira/browse/CASSANDRA-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-6717: - Labels: client-impacting (was: ) Modernize schema tables --- Key: CASSANDRA-6717 URL: https://issues.apache.org/jira/browse/CASSANDRA-6717 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Aleksey Yeschenko Priority: Minor Labels: client-impacting Fix For: 3.0 There are a few problems/improvements that can be done with the way we store schema: # CASSANDRA-4988: as explained on the ticket, storing the comparator is now redundant (or almost, we'd need to store whether the table is COMPACT or not too, which we don't currently do, but that is easy and probably a good idea anyway), it can be entirely reconstructed from the infos in schema_columns (the same is true of key_validator and subcomparator, and replacing default_validator by a COMPACT_VALUE column in all cases is relatively simple). And storing the comparator as an opaque string broke concurrent updates of sub-parts of said comparator (concurrent collection addition or altering 2 separate clustering columns typically), so it's really worth removing it. # CASSANDRA-4603: it's time to get rid of those ugly json maps. I'll note that schema_keyspaces is a problem due to its use of COMPACT STORAGE, but I think we should fix it once and for all nonetheless (see below). # For CASSANDRA-6382 and to allow indexing both map keys and values at the same time, we'd need to be able to have more than one index definition for a given column. # There are a few mismatches in table options between the ones stored in the schema and the ones used when declaring/altering a table which would be nice to fix. 
The compaction, compression and replication maps are ones already mentioned in CASSANDRA-4603, but also for some reason 'dclocal_read_repair_chance' in CQL is called just 'local_read_repair_chance' in the schema table, and 'min/max_compaction_threshold' are column family options in the schema but just compaction options in CQL (which makes more sense). None of those issues are major, and we could probably deal with them independently, but it might be simpler to just fix them all in one shot, so I wanted to sum them all up here. In particular, the fact that 'schema_keyspaces' uses COMPACT STORAGE is annoying (for the replication map, but it may limit future stuff too), which suggests we should migrate it to a new, non-COMPACT table. And while that's arguably a detail, it wouldn't hurt to rename schema_columnfamilies to schema_tables for the years to come, since that's the preferred vernacular for CQL. Overall, what I would suggest is to move all schema tables to a new keyspace, named 'schema' for instance (or 'system_schema', but I prefer the shorter version), and fix all the issues above at once. Since we currently don't exchange schema between nodes of different versions, all we'd need to do that is a one-shot startup migration, and overall, I think it could be simpler for clients to deal with one clear migration than to have to handle minor individual changes all over the place. I also think it's somewhat cleaner conceptually to have schema tables in their own keyspace since they are replicated through a different mechanism than other system tables. 
If we do that, we could, for instance, migrate to the following schema tables (details up for discussion of course): {noformat}
CREATE TYPE user_type (
    name text,
    column_names list<text>,
    column_types list<text>
)

CREATE TABLE keyspaces (
    name text PRIMARY KEY,
    durable_writes boolean,
    replication map<string, string>,
    user_types map<string, user_type>
)

CREATE TYPE trigger_definition (
    name text,
    options map<text, text>
)

CREATE TABLE tables (
    keyspace text,
    name text,
    id uuid,
    table_type text,                 // COMPACT, CQL or SUPER
    dropped_columns map<text, bigint>,
    triggers map<text, trigger_definition>,
    // options
    comment text,
    compaction map<text, text>,
    compression map<text, text>,
    read_repair_chance double,
    dclocal_read_repair_chance double,
    gc_grace_seconds int,
    caching text,
    rows_per_partition_to_cache text,
    default_time_to_live int,
    min_index_interval int,
    max_index_interval int,
    speculative_retry text,
    populate_io_cache_on_flush boolean,
    bloom_filter_fp_chance double,
    memtable_flush_period_in_ms int,
    PRIMARY KEY (keyspace, name)
)

CREATE TYPE index_definition (
    name text,
    index_type text,
    options map<text, text>
)

CREATE TABLE columns (
    keyspace text,
    table text,
    name text,
    kind text,                       // PARTITION_KEY, CLUSTERING_COLUMN, REGULAR or COMPACT_VALUE
    component_index int,
    type text,
[jira] [Updated] (CASSANDRA-8495) Add data type serialization formats to native protocol specs
[ https://issues.apache.org/jira/browse/CASSANDRA-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-8495: - Labels: client-impacting (was: ) Add data type serialization formats to native protocol specs Key: CASSANDRA-8495 URL: https://issues.apache.org/jira/browse/CASSANDRA-8495 Project: Cassandra Issue Type: Task Components: Documentation website Reporter: Tyler Hobbs Assignee: Tyler Hobbs Priority: Minor Labels: client-impacting We currently describe the serialization format for collections, UDTs, and tuples in the native protocol spec. We should expand that to include all data types supported by Cassandra. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7536) High resolution types for timestamp and time
[ https://issues.apache.org/jira/browse/CASSANDRA-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-7536: - Labels: client-impacting (was: ) High resolution types for timestamp and time Key: CASSANDRA-7536 URL: https://issues.apache.org/jira/browse/CASSANDRA-7536 Project: Cassandra Issue Type: New Feature Reporter: Robert Stupp Priority: Minor Labels: client-impacting CASSANDRA-7523 adds support for _date_ and _time_ data types using the same precision as the current _timestamp_ type. This ticket is about adding high-resolution (nanosecond precision) types for timestamp and time. They should be easy to use with the Joda API and the Java 8 {{java.time}} API. Additionally, support for time zone / offset (which might need to be handled differently) could be introduced using the new time and timestamp types. Idea for binary serialization format (from the java.time API): {{(int)year (short)month (short)day (byte)hour (byte)minute (byte)second (int)nano (int)offsetInSeconds}} Additional thinking is required to make even the time zone / offset types comparable (even for different TZ offsets) - so these might not be covered by this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
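The proposed layout can be packed in 19 bytes. The sketch below takes the field list straight from the ticket text and assumes big-endian encoding, as the native protocol uses elsewhere; the exact format is an open proposal, not a finalized serialization.

```python
# Sketch of the proposed nanosecond-timestamp layout, big-endian, unpadded:
# (int)year (short)month (short)day (byte)hour (byte)minute (byte)second
# (int)nano (int)offsetInSeconds  -> 4+2+2+1+1+1+4+4 = 19 bytes
import struct

NANOTS = struct.Struct(">ihhbbbii")

def pack_nanots(year, month, day, hour, minute, second, nano, offset_s):
    """Serialize one high-resolution timestamp to its 19-byte form."""
    return NANOTS.pack(year, month, day, hour, minute, second, nano, offset_s)

blob = pack_nanots(2014, 12, 17, 22, 56, 3, 123456789, 3600)
```

One nice property of this field ordering is that, for a fixed offset, the big-endian bytes sort in chronological order; making values with *different* offsets comparable is the harder problem the ticket defers.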
[jira] [Updated] (CASSANDRA-7523) add date and time types
[ https://issues.apache.org/jira/browse/CASSANDRA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg updated CASSANDRA-7523: - Labels: client-impacting docs (was: docs) add date and time types --- Key: CASSANDRA-7523 URL: https://issues.apache.org/jira/browse/CASSANDRA-7523 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Joshua McKenzie Priority: Minor Labels: client-impacting, docs Fix For: 2.1.3 http://www.postgresql.org/docs/9.1/static/datatype-datetime.html (we already have timestamp; interval is out of scope for now, and see CASSANDRA-6350 for discussion on timestamp-with-time-zone. but date/time should be pretty easy to add.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)