[jira] [Created] (CASSANDRA-8405) Is there a way to override the current MAX_TTL value from 20 yrs to a value > 20 yrs.
Parth Setya created CASSANDRA-8405: -- Summary: Is there a way to override the current MAX_TTL value from 20 yrs to a value > 20 yrs. Key: CASSANDRA-8405 URL: https://issues.apache.org/jira/browse/CASSANDRA-8405 Project: Cassandra Issue Type: Wish Components: Core Environment: Linux(RH) Reporter: Parth Setya Priority: Blocker We are migrating data from Oracle to C*. The expiration date for a certain column was set to 90 years in Oracle. Here we are not able to make that value go beyond 20 years. Could you recommend a way to override this value? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
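For context, the 20-year ceiling is not arbitrary: Cassandra stores a cell's TTL as a signed 32-bit count of seconds, and 90 years of seconds does not fit in an int. A minimal sketch of the arithmetic (class and method names here are illustrative, not Cassandra's API):

```java
// Illustrative sketch: why a TTL much beyond 20 years cannot be stored
// in Cassandra's signed 32-bit seconds field (names are hypothetical).
public class TtlCap {
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60; // 31,536,000

    // Cassandra's enforced maximum: 20 years, expressed in seconds.
    public static final int MAX_TTL = (int) (20 * SECONDS_PER_YEAR); // 630,720,000

    // Would a TTL of this many years fit in a signed 32-bit int of seconds?
    public static boolean fitsInInt(long years) {
        return years * SECONDS_PER_YEAR <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("MAX_TTL (20y) = " + MAX_TTL);
        System.out.println("90y fits in int: " + fitsInInt(90)); // 2,838,240,000 > 2^31-1
    }
}
```

So raising the cap materially past 20 years would require widening the on-disk TTL representation, not just changing a constant.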
[jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231151#comment-14231151 ] Jason Brown commented on CASSANDRA-7203: Hmm, interesting idea. However, I suspect any install base large enough is going to throw a shitte tonne of cache (of the memcache/redis ilk) in front of any database, and thus probably skew the (read) distribution of traffic that actually makes it to the database. I don't have much solid evidence to back up this assertion, but I also feel in my gut that there is not a one-to-one correspondence between who knocks on the front door and who actually goes to a database (or to one of many databases). As regards the memtable flush, you'd have to have some partitioning scheme, and flush on those bounds - hypothetically, semi-reasonable. However, I think the gnarly work will be in the compaction code. We now have three officially supported compaction strategies, and I wonder how much complication would be added there. Remembering what happened with incremental repair (and the special casing of LCS vs STCS), I'd be a bit concerned about the complexity creep. Perhaps this wouldn't apply to all the strategies (I would have to think more about DateTiered), but even that can be seen as a special casing. At the end of the day, if you want to go down this path, sure, we can see where it leads, and we can evaluate the results vs. the costs involved. TBH, though, this doesn't appear to be a huge win (I think we can all agree incremental, at best), and I think we have bigger fish to fry.
Flush (and Compact) High Traffic Partitions Separately -- Key: CASSANDRA-7203 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: compaction, performance An idea possibly worth exploring is the use of streaming count-min sketches to collect data over the up-time of a server to estimate the velocity of different partitions, so that high-volume partitions can be flushed separately on the assumption that they will be much smaller in number, thus reducing write amplification by permitting compaction independently of any low-velocity data. Whilst the idea is reasonably straightforward, it seems that the biggest problem here will be defining any success metric. Obviously any workload following an exponential/zipf/extreme distribution is likely to benefit from such an approach, but whether or not that would translate in real terms is another matter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
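The count-min sketch the ticket proposes is a small, well-known structure; the sketch below is a generic illustration of tracking per-partition write counts (depth/width parameters and hashing scheme are assumptions, not Cassandra code):

```java
import java.util.Random;

// Minimal count-min sketch: a depth x width grid of counters, one hash
// function per row. Estimates never under-count; collisions only inflate.
public class CountMinSketch {
    private final long[][] counts;
    private final int depth, width;
    private final int[] seeds;

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        Random r = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < depth; i++)
            seeds[i] = r.nextInt();
    }

    private int bucket(int row, Object key) {
        int h = key.hashCode() ^ seeds[row];
        h ^= (h >>> 16); // mix the high bits down
        return Math.abs(h % width);
    }

    // Record one write against this partition key.
    public void add(Object partitionKey) {
        for (int i = 0; i < depth; i++)
            counts[i][bucket(i, partitionKey)]++;
    }

    // Estimated write count: the minimum over all rows bounds the overcount.
    public long estimate(Object partitionKey) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++)
            min = Math.min(min, counts[i][bucket(i, partitionKey)]);
        return min;
    }
}
```

A flush path could then consult `estimate(...)` to route high-velocity partitions into their own sstables, which is the separation the ticket is after.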
[jira] [Assigned] (CASSANDRA-8399) Reference Counter exception when dropping user type
[ https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson reassigned CASSANDRA-8399: -- Assignee: Joshua McKenzie (was: Marcus Eriksson) could you have a look [~JoshuaMcKenzie]? Reference Counter exception when dropping user type --- Key: CASSANDRA-8399 URL: https://issues.apache.org/jira/browse/CASSANDRA-8399 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Joshua McKenzie Fix For: 2.1.3 Attachments: node2.log When running the dtest {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}} with the current 2.1-HEAD code, very frequently, but not always, when dropping a type, the following exception is seen:{code} ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - Exception in thread Thread[MigrationStage:1,5,main] java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb78gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-14-Data.db at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) ~[main/:na] at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963) ~[main/:na] at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744) ~[main/:na] at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731) ~[main/:na] at
org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) ~[main/:na] at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) ~[main/:na] at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) ~[main/:na] at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[main/:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]{code} Log of the node with the error is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-8397) Support UPDATE with IN requirement for clustering key
[ https://issues.apache.org/jira/browse/CASSANDRA-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer reassigned CASSANDRA-8397: - Assignee: Benjamin Lerer Support UPDATE with IN requirement for clustering key - Key: CASSANDRA-8397 URL: https://issues.apache.org/jira/browse/CASSANDRA-8397 Project: Cassandra Issue Type: Wish Reporter: Jens Rantil Assignee: Benjamin Lerer Priority: Minor {noformat} CREATE TABLE events ( userid uuid, id timeuuid, content text, type text, PRIMARY KEY (userid, id) ) # Add data cqlsh:mykeyspace> UPDATE events SET content='Hello' WHERE userid=57b47f85-56c4-4968-83cf-4c4e533944e9 AND id IN (046e9da0-7945-11e4-a76f-770773bbbf7e, 046e0160-7945-11e4-a76f-770773bbbf7e); code=2200 [Invalid query] message="Invalid operator IN for PRIMARY KEY part id" {noformat} I was surprised this doesn't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
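Until the server supports IN on clustering keys in UPDATE, a common client-side workaround is to expand the IN list into one UPDATE per clustering value (optionally wrapped in an unlogged batch). A minimal sketch that builds the statement strings (the helper is hypothetical, and real code should prefer prepared/bound statements over string concatenation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side expansion of a rejected "id IN (...)" UPDATE
// into one UPDATE per clustering value. Values are inserted verbatim,
// which is only safe for non-string literals such as timeuuids.
public class InClauseExpander {
    public static List<String> expand(String table, String setClause,
                                      String partitionClause, List<String> clusteringIds) {
        List<String> statements = new ArrayList<>();
        for (String id : clusteringIds)
            statements.add("UPDATE " + table + " SET " + setClause
                    + " WHERE " + partitionClause + " AND id = " + id + ";");
        return statements;
    }
}
```

Each generated statement targets the same partition, so executing them in one unlogged batch costs a single coordinator round trip.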
[jira] [Assigned] (CASSANDRA-8403) limit disregarded when paging with IN clause under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer reassigned CASSANDRA-8403: - Assignee: Benjamin Lerer limit disregarded when paging with IN clause under certain conditions - Key: CASSANDRA-8403 URL: https://issues.apache.org/jira/browse/CASSANDRA-8403 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Benjamin Lerer This issue was originally reported on the python-driver userlist and confirmed by [~aholmber] When page_size < limit < data size, the limit value is disregarded and all rows are paged back. To repro: create a table and populate it with two partitions CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) Add data: in one partition create 10 rows, and in a second partition create 20 rows. Perform a query with page_size of 10 and a LIMIT of 20, like so: SELECT * FROM paging_test where id in (1,2) LIMIT 20; The limit is disregarded and three pages of 10 records each will be returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
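The expected semantics are that paging stops as soon as LIMIT rows have been returned, no matter how many matching rows remain. A small standalone model of that invariant (not driver or server code) shows the reported scenario should yield two pages totalling 20 rows, not three pages of 10:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of LIMIT-aware paging: slice `rows` into pages of
// `pageSize`, but never return more than `limit` rows in total.
public class LimitedPager {
    public static List<List<Integer>> pages(List<Integer> rows, int pageSize, int limit) {
        List<List<Integer>> result = new ArrayList<>();
        int returned = 0;
        for (int i = 0; i < rows.size() && returned < limit; i += pageSize) {
            // A page ends at the page boundary, the data end, or the limit,
            // whichever comes first.
            int end = Math.min(Math.min(i + pageSize, rows.size()), limit);
            result.add(new ArrayList<>(rows.subList(i, end)));
            returned = end;
        }
        return result;
    }
}
```

With 30 matching rows, page size 10 and limit 20, this stops after the second page; the bug is that the server kept paging past the limit.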
[jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231203#comment-14231203 ] Benedict commented on CASSANDRA-7203: - I was _mostly_ hoping to get your and [~kohlisankalp]'s views on _if those workload skews occur_. Then we could at some point later get into the nitty gritty of if it would be worth it :-) The idea wouldn't really be to special case anything except flush, and to depend on (and implement after) improvements we have either envisaged or could later envisage to avoid compacting sstables with low predicted overlap of partitions. I.e. it would have the potential to improve the benefit of such schemes, by increasing the number of sstable pairings they can rule out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6952) Cannot bind variables to USE statements
[ https://issues.apache.org/jira/browse/CASSANDRA-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-6952: -- Attachment: CASSANDRA-6952-2.1.txt CASSANDRA-6952-2.0.txt The patch changes the error message to "Bind variables cannot be used for keyspace or table names". The 2.1 patch also adds a unit test for the behavior. Cannot bind variables to USE statements --- Key: CASSANDRA-6952 URL: https://issues.apache.org/jira/browse/CASSANDRA-6952 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Matt Stump Assignee: Benjamin Lerer Priority: Minor Labels: cql3 Attachments: CASSANDRA-6952-2.0.txt, CASSANDRA-6952-2.1.txt Attempting to bind a variable for a USE query results in a syntax error. Example Invocation: {code} ResultSet result = session.execute("USE ?", "system"); {code} Error: {code} ERROR SYNTAX_ERROR: line 1:4 no viable alternative at input '?', v=2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: Make sure we release sstable references after anticompaction
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 9c0f5753f - d15c9187a Make sure we release sstable references after anticompaction Patch by marcuse; reviewed by yukim for CASSANDRA-8386 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d15c9187 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d15c9187 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d15c9187 Branch: refs/heads/cassandra-2.1 Commit: d15c9187a4b66645bf0575a7c3bfdbb9b10a263d Parents: 9c0f575 Author: Marcus Eriksson marc...@apache.org Authored: Thu Nov 27 18:12:24 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Tue Dec 2 10:10:33 2014 +0100 -- CHANGES.txt | 1 + .../db/compaction/CompactionManager.java| 66 +++- .../cassandra/io/sstable/SSTableReader.java | 2 +- .../db/compaction/AntiCompactionTest.java | 55 4 files changed, 82 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d15c9187/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index d454ba2..7df396d 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.3 + * Release sstable references after anticompaction (CASSANDRA-8386) * Handle abort() in SSTableRewriter properly (CASSANDRA-8320) * Fix high size calculations for prepared statements (CASSANDRA-8231) * Centralize shared executors (CASSANDRA-8055) http://git-wip-us.apache.org/repos/asf/cassandra/blob/d15c9187/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index 61628ff..d85ffd7 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -391,6 +391,8 @@ public class CompactionManager implements CompactionManagerMBean /** * Make sure the 
{validatedForRepair} are marked for compaction before calling this. * + * Caller must reference the validatedForRepair sstables (via ParentRepairSession.getAndReferenceSSTables(..)). + * * @param cfs * @param ranges Ranges that the repair was carried out on * @param validatedForRepair SSTables containing the repaired ranges. Should be referenced before passing them. @@ -407,40 +409,48 @@ public class CompactionManager implements CompactionManagerMBean Set<SSTableReader> mutatedRepairStatuses = new HashSet<>(); Set<SSTableReader> nonAnticompacting = new HashSet<>(); Iterator<SSTableReader> sstableIterator = sstables.iterator(); -while (sstableIterator.hasNext()) +try { -SSTableReader sstable = sstableIterator.next(); -for (Range<Token> r : Range.normalize(ranges)) +while (sstableIterator.hasNext()) { -Range<Token> sstableRange = new Range<>(sstable.first.getToken(), sstable.last.getToken(), sstable.partitioner); -if (r.contains(sstableRange)) +SSTableReader sstable = sstableIterator.next(); +for (Range<Token> r : Range.normalize(ranges)) { -logger.info("SSTable {} fully contained in range {}, mutating repairedAt instead of anticompacting", sstable, r); - sstable.descriptor.getMetadataSerializer().mutateRepairedAt(sstable.descriptor, repairedAt); -sstable.reloadSSTableMetadata(); -mutatedRepairStatuses.add(sstable); -sstableIterator.remove(); -break; -} -else if (!sstableRange.intersects(r)) -{ -logger.info("SSTable {} ({}) does not intersect repaired range {}, not touching repairedAt.", sstable, sstableRange, r); -nonAnticompacting.add(sstable); -sstableIterator.remove(); -break; -} -else -{ -logger.info("SSTable {} ({}) will be anticompacted on range {}", sstable, sstableRange, r); +Range<Token> sstableRange = new Range<>(sstable.first.getToken(), sstable.last.getToken(), sstable.partitioner); +if (r.contains(sstableRange)) +{ +logger.info("SSTable {} fully contained in range {}, mutating repairedAt instead of anticompacting", sstable, r); +
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Conflicts: src/java/org/apache/cassandra/db/compaction/CompactionManager.java test/unit/org/apache/cassandra/db/compaction/AntiCompactionTest.java Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/06f626ac Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/06f626ac Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/06f626ac Branch: refs/heads/trunk Commit: 06f626acd27b051222616c0c91f7dd8d556b8d45 Parents: 25314c2 d15c918 Author: Marcus Eriksson marc...@apache.org Authored: Tue Dec 2 10:49:59 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Tue Dec 2 10:50:18 2014 +0100 -- CHANGES.txt | 1 + .../db/compaction/CompactionManager.java| 66 +++-- .../db/compaction/AntiCompactionTest.java | 99 ++-- 3 files changed, 108 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/06f626ac/CHANGES.txt -- diff --cc CHANGES.txt index 22cc598,7df396d..141c3a8 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,42 -1,7 +1,43 @@@ +3.0 + * Support UDTs, tuples, and collections in user-defined + functions (CASSANDRA-7563) + * Fix aggregate fn results on empty selection, result column name, + and cqlsh parsing (CASSANDRA-8229) + * Mark sstables as repaired after full repair (CASSANDRA-7586) + * Extend Descriptor to include a format value and refactor reader/writer apis (CASSANDRA-7443) + * Integrate JMH for microbenchmarks (CASSANDRA-8151) + * Keep sstable levels when bootstrapping (CASSANDRA-7460) + * Add Sigar library and perform basic OS settings check on startup (CASSANDRA-7838) + * Support for aggregation functions (CASSANDRA-4914) + * Remove cassandra-cli (CASSANDRA-7920) + * Accept dollar quoted strings in CQL (CASSANDRA-7769) + * Make assassinate a first class command (CASSANDRA-7935) + * Support IN clause on any clustering column (CASSANDRA-4762) + * Improve compaction logging (CASSANDRA-7818) + * Remove 
YamlFileNetworkTopologySnitch (CASSANDRA-7917) + * Do anticompaction in groups (CASSANDRA-6851) + * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929, + 7924, 7812, 8063, 7813) + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416) + * Move sstable RandomAccessReader to nio2, which allows using the + FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050) + * Remove CQL2 (CASSANDRA-5918) + * Add Thrift get_multi_slice call (CASSANDRA-6757) + * Optimize fetching multiple cells by name (CASSANDRA-6933) + * Allow compilation in java 8 (CASSANDRA-7028) + * Make incremental repair default (CASSANDRA-7250) + * Enable code coverage thru JaCoCo (CASSANDRA-7226) + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) + * Shorten SSTable path (CASSANDRA-6962) + * Use unsafe mutations for most unit tests (CASSANDRA-6969) + * Fix race condition during calculation of pending ranges (CASSANDRA-7390) + * Fail on very large batch sizes (CASSANDRA-8011) + * Improve concurrency of repair (CASSANDRA-6455, 8208) + + 2.1.3 + * Release sstable references after anticompaction (CASSANDRA-8386) * Handle abort() in SSTableRewriter properly (CASSANDRA-8320) - * Fix high size calculations for prepared statements (CASSANDRA-8231) * Centralize shared executors (CASSANDRA-8055) * Fix filtering for CONTAINS (KEY) relations on frozen collection clustering columns when the query is restricted to a single http://git-wip-us.apache.org/repos/asf/cassandra/blob/06f626ac/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --cc src/java/org/apache/cassandra/db/compaction/CompactionManager.java index a9a4773,d85ffd7..ed875b8 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@@ -407,40 -409,48 +409,48 @@@ public class CompactionManager implemen Set<SSTableReader> mutatedRepairStatuses = new HashSet<>(); Set<SSTableReader>
nonAnticompacting = new HashSet<>(); Iterator<SSTableReader> sstableIterator = sstables.iterator(); - while (sstableIterator.hasNext()) + try { - SSTableReader sstable = sstableIterator.next(); - for (Range<Token> r : Range.normalize(ranges)) + while (sstableIterator.hasNext()) { - Range<Token> sstableRange = new Range<>(sstable.first.getToken(), sstable.last.getToken()); - if (r.contains(sstableRange)) - { -
[1/2] cassandra git commit: Make sure we release sstable references after anticompaction
Repository: cassandra Updated Branches: refs/heads/trunk 25314c204 - 06f626acd Make sure we release sstable references after anticompaction Patch by marcuse; reviewed by yukim for CASSANDRA-8386 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d15c9187 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d15c9187 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d15c9187 Branch: refs/heads/trunk Commit: d15c9187a4b66645bf0575a7c3bfdbb9b10a263d Parents: 9c0f575 Author: Marcus Eriksson marc...@apache.org Authored: Thu Nov 27 18:12:24 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Tue Dec 2 10:10:33 2014 +0100 -- CHANGES.txt | 1 + .../db/compaction/CompactionManager.java| 66 +++- .../cassandra/io/sstable/SSTableReader.java | 2 +- .../db/compaction/AntiCompactionTest.java | 55 4 files changed, 82 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d15c9187/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index d454ba2..7df396d 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.3 + * Release sstable references after anticompaction (CASSANDRA-8386) * Handle abort() in SSTableRewriter properly (CASSANDRA-8320) * Fix high size calculations for prepared statements (CASSANDRA-8231) * Centralize shared executors (CASSANDRA-8055) http://git-wip-us.apache.org/repos/asf/cassandra/blob/d15c9187/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index 61628ff..d85ffd7 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -391,6 +391,8 @@ public class CompactionManager implements CompactionManagerMBean /** * Make sure the {validatedForRepair} are 
marked for compaction before calling this. * + * Caller must reference the validatedForRepair sstables (via ParentRepairSession.getAndReferenceSSTables(..)). + * * @param cfs * @param ranges Ranges that the repair was carried out on * @param validatedForRepair SSTables containing the repaired ranges. Should be referenced before passing them. @@ -407,40 +409,48 @@ public class CompactionManager implements CompactionManagerMBean Set<SSTableReader> mutatedRepairStatuses = new HashSet<>(); Set<SSTableReader> nonAnticompacting = new HashSet<>(); Iterator<SSTableReader> sstableIterator = sstables.iterator(); -while (sstableIterator.hasNext()) +try { -SSTableReader sstable = sstableIterator.next(); -for (Range<Token> r : Range.normalize(ranges)) +while (sstableIterator.hasNext()) { -Range<Token> sstableRange = new Range<>(sstable.first.getToken(), sstable.last.getToken(), sstable.partitioner); -if (r.contains(sstableRange)) +SSTableReader sstable = sstableIterator.next(); +for (Range<Token> r : Range.normalize(ranges)) { -logger.info("SSTable {} fully contained in range {}, mutating repairedAt instead of anticompacting", sstable, r); - sstable.descriptor.getMetadataSerializer().mutateRepairedAt(sstable.descriptor, repairedAt); -sstable.reloadSSTableMetadata(); -mutatedRepairStatuses.add(sstable); -sstableIterator.remove(); -break; -} -else if (!sstableRange.intersects(r)) -{ -logger.info("SSTable {} ({}) does not intersect repaired range {}, not touching repairedAt.", sstable, sstableRange, r); -nonAnticompacting.add(sstable); -sstableIterator.remove(); -break; -} -else -{ -logger.info("SSTable {} ({}) will be anticompacted on range {}", sstable, sstableRange, r); +Range<Token> sstableRange = new Range<>(sstable.first.getToken(), sstable.last.getToken(), sstable.partitioner); +if (r.contains(sstableRange)) +{ +logger.info("SSTable {} fully contained in range {}, mutating repairedAt instead of anticompacting", sstable, r); +
[jira] [Updated] (CASSANDRA-8267) Only stream from unrepaired sstables during incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8267: --- Reviewer: Yuki Morishita [~yukim] could you review? https://github.com/krummas/cassandra/commits/marcuse/8267 Only stream from unrepaired sstables during incremental repair -- Key: CASSANDRA-8267 URL: https://issues.apache.org/jira/browse/CASSANDRA-8267 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.1.3 It seems we stream from all sstables even when doing incremental repair; we should limit this to streaming only from the unrepaired sstables -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
[ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oded Peer updated CASSANDRA-4476: - Attachment: 4476-3.patch Added new patch 4476-3.patch # I switched my development env to avoid these types of errors in the future. # Right, I fixed it. Instead of comparing to {{EQ}} I added a method to {{Operator}} that identifies a relational operator with the notion of order. # The simple index selection algorithm is described in Sylvain’s comment from 4/Dec/13. I improved the algorithm in this patch to estimate the amount of rows the slice operator returns. # I see it as a trade-off between code complexity and query performance. As Sylvain explained in his earlier comment ??more than one indexed column means ALLOW FILTERING, for which all bets are off in terms of performance anyway??. While it is good to strive to deliver the optimal performance altogether, I think the use case you are describing is rare. Jonathan Ellis described “When Not to Use Secondary Indexes” in a blog post: ??Do not use secondary indexes to query a huge volume of records for a small number of results?? so for the proper use of indexed queries this shouldn't have a significant effect, but it would make the code more complex. # I added comments. # I thought the purpose of the test was obvious from the class name. The bug description in the method name was meant to put it into context. I guess it wasn't obvious. I changed the names. # I chose to use an instance variable to be consistent with {{usePrepared}} usage. I understand your point and I have made this an explicit parameter of a new {{execute}} method. In addition, as part of committing the patch, the wiki page describing Secondary Indexes needs to be updated (http://wiki.apache.org/cassandra/SecondaryIndexes). How are wiki changes usually handled?
Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE) Key: CASSANDRA-4476 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Sylvain Lebresne Assignee: Oded Peer Priority: Minor Labels: cql Fix For: 3.0 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use LocalPartitioner that order the row by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided. As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(). I.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date; for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
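The selectivity question in the ticket boils down to picking the predicate with the smallest estimated result set and filtering the rest. A toy version of that choice (the row-count estimates are assumed inputs, not Cassandra internals):

```java
import java.util.Map;

// Toy index selection: given candidate predicates and an estimated row
// count for each, start the query from the most selective one (smallest
// estimate). Mirrors the idea behind highestSelectivityPredicate().
public class IndexChooser {
    public static String mostSelective(Map<String, Long> estimatedRowsPerPredicate) {
        String best = null;
        long bestCount = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : estimatedRowsPerPredicate.entrySet()) {
            if (e.getValue() < bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For the country/birthdate example in the description, a narrow birthdate range beats the country EQ clause despite not being an equality.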
[jira] [Commented] (CASSANDRA-8018) Cassandra seems to insert twice in custom PerColumnSecondaryIndex
[ https://issues.apache.org/jira/browse/CASSANDRA-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231249#comment-14231249 ] Benedict commented on CASSANDRA-8018: - Good catch. A few nits on the patch, but I'll make them and commit: * iff is never a typo, it means if and only if * we should remove the call from inside addNewKey, rather than outside it, as that is the call that was originally meant to be removed. this way all of the calls happen in the same logical unit of code Cassandra seems to insert twice in custom PerColumnSecondaryIndex - Key: CASSANDRA-8018 URL: https://issues.apache.org/jira/browse/CASSANDRA-8018 Project: Cassandra Issue Type: Bug Components: Core Reporter: Pavel Chlupacek Assignee: Benjamin Lerer Fix For: 2.1.3 Attachments: CASSANDRA-8018.txt When inserting data into Cassandra 2.1.0 into table with custom secondary index, the Cell is inserted twice, if inserting new entry into row with same rowId, but different cluster index columns. 
CREATE KEYSPACE fulltext WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1}; CREATE TABLE fulltext.test ( id uuid, name text, name2 text, json varchar, lucene text, primary key ( id , name)); CREATE CUSTOM INDEX lucene_idx on fulltext.test(lucene) using 'com.spinoco.fulltext.cassandra.TestIndex'; // this causes only one insert insertInto("fulltext", "test") .value("id", id1.uuid) .value("name", "goosh1") .value("json", TestContent.message1.asJson) // this causes 2 inserts to be done insertInto("fulltext", "test") .value("id", id1.uuid) .value("name", "goosh2") .value("json", TestContent.message2.asJson) /// stacktraces for inserts (always same, for 1st and 2nd insert) custom indexer stacktraces and then at org.apache.cassandra.db.index.SecondaryIndexManager$StandardUpdater.insert(SecondaryIndexManager.java:707) at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:344) at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:319) at org.apache.cassandra.utils.btree.NodeBuilder.addNewKey(NodeBuilder.java:323) at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:191) at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:189) at org.apache.cassandra.db.Memtable.put(Memtable.java:194) at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1142) at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:394) at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:351) at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) at org.apache.cassandra.service.StorageProxy$7.runMayThrow(StorageProxy.java:970) at org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2080) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:163) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:103) at java.lang.Thread.run(Thread.java:744) Note that the cell, rowKey and Group in public abstract void insert(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup); have the same identity for both successive calls -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8406) Add option to set max_sstable_age in seconds in DTCS
Marcus Eriksson created CASSANDRA-8406: -- Summary: Add option to set max_sstable_age in seconds in DTCS Key: CASSANDRA-8406 URL: https://issues.apache.org/jira/browse/CASSANDRA-8406 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Using days as the unit for max_sstable_age in DTCS might be too coarse-grained; add an option to set it in seconds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8406) Add option to set max_sstable_age in seconds in DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8406: --- Attachment: 0001-patch.patch _days is still prioritized above _seconds, you get a warning if you set both Add option to set max_sstable_age in seconds in DTCS Key: CASSANDRA-8406 URL: https://issues.apache.org/jira/browse/CASSANDRA-8406 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.0.12 Attachments: 0001-patch.patch Using days as the unit for max_sstable_age in DTCS might be too much, add option to set it in seconds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
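The patch note above says _days is still prioritized above _seconds and that a warning is issued when both are set. A minimal sketch of that resolution logic (option names, the default, and the warning text here are illustrative assumptions, not the patch's actual code):

```python
import warnings

# Assumed default; DTCS's real default for max_sstable_age_days is a config detail.
DEFAULT_MAX_SSTABLE_AGE_DAYS = 365

def resolve_max_sstable_age_seconds(options):
    """Resolve max_sstable_age to seconds; _days wins over _seconds if both are set."""
    days = options.get("max_sstable_age_days")
    seconds = options.get("max_sstable_age_seconds")
    if days is not None and seconds is not None:
        warnings.warn("both max_sstable_age_days and max_sstable_age_seconds "
                      "are set; using max_sstable_age_days")
    if days is not None:
        return days * 24 * 60 * 60
    if seconds is not None:
        return seconds
    return DEFAULT_MAX_SSTABLE_AGE_DAYS * 24 * 60 * 60
```

The point of the seconds-granularity option is simply that the existing unit conversion (days to seconds) is too coarse for short-lived time-series data.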
[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231297#comment-14231297 ] Marcus Eriksson commented on CASSANDRA-8371: Created CASSANDRA-8406 for the max_sstable_age config which is mostly unrelated to this issue DateTieredCompactionStrategy is always compacting -- Key: CASSANDRA-8371 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371 Project: Cassandra Issue Type: Bug Components: Core Reporter: mck Assignee: Björn Hegerfors Labels: compaction, performance Attachments: java_gc_counts_rate-month.png, read-latency-recommenders-adview.png, read-latency.png, sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png Running 2.0.11 and having switched a table to [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that disk IO and gc count increase, along with the number of reads happening in the compaction hump of cfhistograms. Data, and generally performance, looks good, but compactions are always happening, and pending compactions are building up. The schema for this is {code}CREATE TABLE search ( loginid text, searchid timeuuid, description text, searchkey text, searchurl text, PRIMARY KEY ((loginid), searchid) );{code} We're sitting on about 82G (per replica) across 6 nodes in 4 DCs. CQL executed against this keyspace, and traffic patterns, can be seen in slides 7+8 of https://prezi.com/b9-aj6p2esft/ Attached are sstables-per-read and read-latency graphs from cfhistograms, and screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), to DTCS (week ~46). These screenshots are also found in the prezi on slides 9-11. [~pmcfadin], [~Bj0rn], Can this be a consequence of occasional deleted rows, as is described under (3) in the description of CASSANDRA-6602 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231327#comment-14231327 ] Vijay commented on CASSANDRA-7438:
--
[~snazy] I was trying to compare the OHC and found a few major bugs.
1) You have individual method synchronization on the Map, which doesn't ensure that your get is locked before a put is performed (same with clean, hot(N), remove, etc.); look at the SynchronizedMap source code to do it right, else it will crash soon.
2) Even after I fix it, there is a correctness issue in the hashing algorithm, I think. Get returns a lot of errors, and it looks like there are some memory leaks too.

Serializing Row cache alternative (Fully off heap)
--
Key: CASSANDRA-7438
URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: Linux
Reporter: Vijay
Assignee: Vijay
Labels: performance
Fix For: 3.0
Attachments: 0001-CASSANDRA-7438.patch, tests.zip

Currently SerializingCache is partially off heap; keys are still stored in the JVM heap as ByteBuffers:
* There are higher GC costs for a reasonably big cache.
* Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
* Memory overhead for the cache entries is relatively high.

So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead and fewer memcpys (as much as possible). We might also want to make this cache configurable.
[jira] [Resolved] (CASSANDRA-8405) Is there a way to override the current MAX_TTL value from 20 yrs to a value > 20 yrs.
[ https://issues.apache.org/jira/browse/CASSANDRA-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-8405.
-
Resolution: Won't Fix

There is no way to override that MAX_TTL (or to add such an override) since that's the maximum resolution allowed by the internal representation of TTLs. We could internally switch to a bigger resolution, but that would be quite a bit of work and could at best be done for 3.0. Honestly, for a TTL as long as 90 years, I'd just set no TTL. I'm confident that in 90 years the price of a TB will be low enough that you won't mind having your data from today not actually expiring.

Is there a way to override the current MAX_TTL value from 20 yrs to a value > 20 yrs.
-
Key: CASSANDRA-8405
URL: https://issues.apache.org/jira/browse/CASSANDRA-8405
Project: Cassandra
Issue Type: Wish
Components: Core
Environment: Linux(RH)
Reporter: Parth Setya
Priority: Blocker
Labels: MAX_TTL, date, expiration, ttl

We are migrating data from Oracle to C*. The expiration date for a certain column was set to 90 years in Oracle. Here we are not able to make that value go beyond 20 years. Could you recommend a way to override this value?
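One way to see the "maximum resolution of the internal representation" point: Cassandra stores TTLs as a signed 32-bit count of seconds, and its MAX_TTL constant is 20 years = 630,720,000 seconds. A 90-year TTL does not even fit in a 32-bit int. A quick check of the arithmetic (constant names here are for illustration):

```python
# TTLs are stored internally as a signed 32-bit int count of seconds.
INT32_MAX = 2**31 - 1                    # 2,147,483,647

SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # ignoring leap days, as the cap does
MAX_TTL = 20 * SECONDS_PER_YEAR          # Cassandra's cap: 630,720,000 s

assert MAX_TTL == 630720000
assert MAX_TTL <= INT32_MAX              # 20 years fits comfortably

ninety_years = 90 * SECONDS_PER_YEAR     # 2,838,240,000 s
assert ninety_years > INT32_MAX          # 90 years overflows a 32-bit int
```

This is why lifting the cap would mean changing the on-disk representation rather than just a config value.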
[jira] [Updated] (CASSANDRA-8403) limit disregarded when paging with IN clause under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8403:
Priority: Minor (was: Major)

limit disregarded when paging with IN clause under certain conditions
-
Key: CASSANDRA-8403
URL: https://issues.apache.org/jira/browse/CASSANDRA-8403
Project: Cassandra
Issue Type: Bug
Reporter: Russ Hatch
Assignee: Benjamin Lerer
Priority: Minor

This issue was originally reported on the python-driver userlist and confirmed by [~aholmber]. When page_size < limit < data size, the limit value is disregarded and all rows are paged back.

To repro, create a table and populate it with two partitions:
{code}
CREATE TABLE paging_test (
    id int,
    value text,
    PRIMARY KEY (id, value)
)
{code}
Add data: in one partition create 10 rows, and in a second partition create 20 rows. Perform a query with a page_size of 10 and a LIMIT of 20, like so:
{code}
SELECT * FROM paging_test where id in (1,2) LIMIT 20;
{code}
The limit is disregarded and three pages of 10 records each will be returned.
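The repro can be modeled in a few lines (this is a toy model of the behavior, not driver or server code): correct paging stops once LIMIT rows have been returned across page boundaries, while the reported buggy path keeps fetching pages until the data is exhausted.

```python
def page_rows(rows, page_size, limit, enforce_limit_across_pages=True):
    """Yield pages of rows, modeling LIMIT enforcement across page boundaries."""
    returned = 0
    for start in range(0, len(rows), page_size):
        page = rows[start:start + page_size]
        if enforce_limit_across_pages:
            page = page[:limit - returned]  # trim the page to the remaining budget
        if not page:
            break
        returned += len(page)
        yield page
        if enforce_limit_across_pages and returned >= limit:
            break

# Two partitions: 10 rows in partition 1, 20 rows in partition 2.
rows = [(1, i) for i in range(10)] + [(2, i) for i in range(20)]

correct = [r for page in page_rows(rows, page_size=10, limit=20) for r in page]
buggy = [r for page in page_rows(rows, 10, 20, enforce_limit_across_pages=False)
         for r in page]

assert len(correct) == 20   # LIMIT 20 respected: two pages of 10
assert len(buggy) == 30     # all 30 rows paged back, as in the bug report
```

Philip Thompson's follow-up question (is LIMIT being applied per partition?) would distinguish this model from a per-partition-limit model, which would also return more than 20 rows.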
[jira] [Comment Edited] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231327#comment-14231327 ] Vijay edited comment on CASSANDRA-7438 at 12/2/14 11:33 AM: [~snazy] I was trying to compare the OHC and found few major bugs. There is correctness in the hashing algorithm i think. Get returns a lot of error and looks like there is some memory leaks too. was (Author: vijay2...@yahoo.com): [~snazy] I was trying to compare the OHC and found few major bugs. 1) You have individual method synchronization on the Map, which doesn't ensure that your get is locked before a put is performed (same with clean, hot(N), remove etc), look at SynchronizedMap source code to do it right else will crash soon. 2) Even after i fix it, there is correctness in the hashing algorithm i think. Get returns a lot of error and looks like there is some memory leaks too. Serializing Row cache alternative (Fully off heap) -- Key: CASSANDRA-7438 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 Project: Cassandra Issue Type: Improvement Components: Core Environment: Linux Reporter: Vijay Assignee: Vijay Labels: performance Fix For: 3.0 Attachments: 0001-CASSANDRA-7438.patch, tests.zip Currently SerializingCache is partially off heap, keys are still stored in JVM heap as BB, * There is a higher GC costs for a reasonably big cache. * Some users have used the row cache efficiently in production for better results, but this requires careful tunning. * Overhead in Memory for the cache entries are relatively high. So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with cache. We might want to ensure that the new implementation match the existing API's (ICache), and the implementation needs to have safe memory access, low overhead in memory and less memcpy's (As much as possible). We might also want to make this cache configurable. 
[jira] [Commented] (CASSANDRA-7882) Memtable slab allocation should scale logarithmically to improve occupancy rate
[ https://issues.apache.org/jira/browse/CASSANDRA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231374#comment-14231374 ] Benedict commented on CASSANDRA-7882: - Hi Jay, I've been away for the past two months, so sorry this got left by the wayside in the meantime. I'll get around to reviewing it shortly. Memtable slab allocation should scale logarithmically to improve occupancy rate --- Key: CASSANDRA-7882 URL: https://issues.apache.org/jira/browse/CASSANDRA-7882 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jay Patel Assignee: Jay Patel Labels: performance Fix For: 2.1.3 Attachments: trunk-7882.txt CASSANDRA-5935 allows option to disable region-based allocation for on-heap memtables but there is no option to disable it for off-heap memtables (memtable_allocation_type: offheap_objects). Disabling region-based allocation will allow us to pack more tables in the schema since minimum of 1MB region won't be allocated per table. Downside can be more fragmentation which should be controllable by using better allocator like JEMalloc. How about below option in yaml?: memtable_allocation_type: unslabbed_offheap_objects Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231398#comment-14231398 ] Marcus Eriksson commented on CASSANDRA-7019: WDYT [~jbellis] should I finish up the patch as a major compaction for LCS? Major tombstone compaction -- Key: CASSANDRA-7019 URL: https://issues.apache.org/jira/browse/CASSANDRA-7019 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Assignee: Marcus Eriksson Labels: compaction Fix For: 3.0 It should be possible to do a major tombstone compaction by including all sstables, but writing them out 1:1, meaning that if you have 10 sstables before, you will have 10 sstables after the compaction with the same data, minus all the expired tombstones. We could do this in two ways: # a nodetool command that includes _all_ sstables # once we detect that an sstable has more than x% (20%?) expired tombstones, we start one of these compactions, and include all overlapping sstables that contain older data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231419#comment-14231419 ] Branimir Lambov commented on CASSANDRA-7032: [~benedict], can you point me to some more information on the imbalance that is known to appear and its behaviour with increasing number of nodes? I'd like to get a better understanding of the problem, and how much of it is caused by replica selection rather than token imbalance. It seems to me that the best approach here is to build in some replication strategy / network topology awareness into the algorithm to be able to account for replica selection. This will complicate the algorithm but, in addition to getting better balance, could also improve the time spent finding replicas (e.g. CASSANDRA-6976). Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better. 
If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231425#comment-14231425 ] Benedict commented on CASSANDRA-7032: - It's plain old statistics. Have a look at the java code I attached that simulates and reports the level of imbalance. Currently we randomly assign the tokens, and this results in some nodes happening to fall with all of their token ranges narrow vs the other existing tokens, and others wider. Consistent hashing is what Riak uses to achieve balance, which is one approach. Rendezvous hashing is another. But these would likely involve changing the tokens of every node in the cluster on adding a new node. This would be acceptable, but I expect with the amount of state space to work with we can design an algorithm that guarantees low bounds of imbalance without having to change the tokens assigned to any existing nodes. Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). 
I'm sure there is a decent dynamic programming solution to this that would be even better. If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
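Benedict's "plain old statistics" point can be reproduced with a toy version of the attached simulation (this sketch is mine, not the attached TestVNodeAllocation.java): assign tokens uniformly at random on a unit ring, compute each node's owned fraction, and compare the most-loaded node with the fair share.

```python
import random

def simulate_imbalance(num_nodes, vnodes_per_node, seed=0):
    """Random token allocation on a unit ring; return max node ownership / fair share."""
    rng = random.Random(seed)
    # (token, owning node) pairs, sorted around the ring
    tokens = sorted((rng.random(), n) for n in range(num_nodes)
                    for _ in range(vnodes_per_node))
    owned = [0.0] * num_nodes
    for (tok, node), (next_tok, _) in zip(tokens, tokens[1:] + tokens[:1]):
        # each token owns the range up to the next token (wrapping at the end)
        owned[node] += (next_tok - tok) % 1.0
    fair_share = 1.0 / num_nodes
    return max(owned) / fair_share

# With random allocation some node always ends up owning noticeably more
# than its fair share; the ratio is typically well above 1.
print(simulate_imbalance(num_nodes=100, vnodes_per_node=256))
```

Deterministic allocation, by contrast, would bound this ratio explicitly instead of leaving it to the tail behavior of random range sizes.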
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231433#comment-14231433 ] Benedict commented on CASSANDRA-7032: - I should note that the dovetailing with CASSANDRA-6696 is very important. Acceptable imbalance _per node_ is actually not _too_ tricky to deliver. But ensuring each disk on each node will have a fair share of the pie is a little harder Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better. If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231434#comment-14231434 ] Jeremy Hanna commented on CASSANDRA-7402: - Any update on this? Seems like a valuable addition to help identify problems. Add metrics to track memory used by client requests --- Key: CASSANDRA-7402 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: ops, performance, stability Fix For: 2.1.3 Attachments: 7402.txt When running a production cluster one common operational issue is quantifying GC pauses caused by ongoing requests. Since different queries return varying amount of data you can easily get your self into a situation where you Stop the world from a couple of bad actors in the system. Or more likely the aggregate garbage generated on a single node across all in flight requests causes a GC. It would be very useful for operators to see how much garbage the system is using to handle in flight mutations and queries. It would also be nice to have either a log of queries which generate the most garbage so operators can track this. Also a histogram. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7882) Memtable slab allocation should scale logarithmically to improve occupancy rate
[ https://issues.apache.org/jira/browse/CASSANDRA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231436#comment-14231436 ] Jay Patel commented on CASSANDRA-7882: -- Awesome. Thanks Benedict! Let me know if you face any issue while applying patch. Memtable slab allocation should scale logarithmically to improve occupancy rate --- Key: CASSANDRA-7882 URL: https://issues.apache.org/jira/browse/CASSANDRA-7882 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jay Patel Assignee: Jay Patel Labels: performance Fix For: 2.1.3 Attachments: trunk-7882.txt CASSANDRA-5935 allows option to disable region-based allocation for on-heap memtables but there is no option to disable it for off-heap memtables (memtable_allocation_type: offheap_objects). Disabling region-based allocation will allow us to pack more tables in the schema since minimum of 1MB region won't be allocated per table. Downside can be more fragmentation which should be controllable by using better allocator like JEMalloc. How about below option in yaml?: memtable_allocation_type: unslabbed_offheap_objects Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231457#comment-14231457 ] Alexander Bulaev commented on CASSANDRA-7186:
-
Also reproduces on 2.0.10.

alter table add column not always propogating
-
Key: CASSANDRA-7186
URL: https://issues.apache.org/jira/browse/CASSANDRA-7186
Project: Cassandra
Issue Type: Bug
Reporter: Martin Meyer
Assignee: Ryan McGuire
Fix For: 2.0.12

I've seen many times in Cassandra 2.0.6 that adding columns to existing tables seems to not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description a pretty long time (~30 mins) after the original ALTER command was issued. We originally identified this issue when a connected client would complain that a column it issued a SELECT for wasn't a known column, at which point we have to ask each node to describe the most recently altered table. One of them will not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly. We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it's not always the same one that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6. We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be important.
[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
[ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231477#comment-14231477 ] Benjamin Lerer commented on CASSANDRA-4476:
---
{quote}I see it as a trade-off between code complexity and query performance. As Sylvain explained in his earlier comment more than one indexed column means ALLOW FILTERING, for which all bets are off in terms of performance anyway.{quote}
In the query {{Select * from myTable where a > 1 and a < 3}} there is only one indexed column, {{a}}, and as such this query does not need filtering and the performance should be predictable.
{quote}While it is good to strive and deliver the optimal performance altogether I think the use case you are describing is rare.{quote}
It is a common use case. It is used a lot with time series data, for example, when people want to analyse what happened for a range of dates.
{quote}Jonathan Ellis described "When Not to Use Secondary Indexes" in a blog post: Do not use secondary indexes to query a huge volume of records for a small number of results{quote}
Jonathan's statement is true, but it has nothing to do with the ability to perform range queries on an index. It is about choosing the right tool to query data based on your data distribution.
{quote}so for the proper use of indexed queries this shouldn't have a significant effect but it would make the code more complex.{quote}
Actually, if you think about it you will realize that it can have a big impact.

Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
Key: CASSANDRA-4476
URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
Project: Cassandra
Issue Type: Improvement
Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
Labels: cql
Fix For: 3.0
Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch

Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use a LocalPartitioner that orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided. As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(), i.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date; for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first).
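The role of KeysSearcher.highestSelectivityPredicate() mentioned above reduces to a simple choice: given an estimated matching-row count for each indexed clause, drive the query from the index expected to return the fewest rows and filter the rest. A minimal sketch of that choice (the function name mirrors the Java method, but the estimates and logic here are illustrative):

```python
def highest_selectivity_predicate(estimated_matches):
    """Pick the clause whose index is expected to return the fewest rows."""
    return min(estimated_matches, key=estimated_matches.get)

# Sylvain's example: country = 'US' matches far more users than a
# six-month birthdate range, so the birthdate index should drive the query
# even though the country clause is the EQ one.
estimates = {
    "country = 'US'": 5_000_000,
    "birthdate range": 40_000,
}
assert highest_selectivity_predicate(estimates) == "birthdate range"
```

The open problem the ticket names is producing those estimates for non-EQ clauses reasonably accurately, not the selection step itself.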
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231496#comment-14231496 ] Branimir Lambov commented on CASSANDRA-7032: Per-disk balance is just another level in the hierarchy. Ideally we would like per-disk, per-node, per-rack and per-datacentre balance (configurable by number of vnodes), wouldn't we? Presumably with highest emphasis on the lower levels. Ignoring replica selection, this all comes for free if we can ensure equal vnode size (e.g. by reassigning all tokens on adding a node). With reassignment it should also be trivial to build the network topology into the token assignment. As I see it there are two separate objectives: - to build clusters incrementally by introducing and maintaining _some_ imbalance in the ring, which can be exploited to avoid reassignment. - to improve the balance in existing, probably highly unbalanced clusters, built without the above algorithm in mind. The former might be a solution to the latter, but it is not necessary that it is. In any case I intend to look at it in isolation first and then think how it would apply to existing clusters. Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). 
The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better. If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231530#comment-14231530 ] Alan Boudreault commented on CASSANDRA-8316:
-
[~yukim] Thank you for taking a look at the snapshot. That's great if the patch is only a thread-dispatching fix! I'll wait for a patch from you (or [~krummas]?) and will re-run my entire tests. Let me know if I can do anything else to help.

Did not get positive replies from all endpoints error on incremental repair
--
Key: CASSANDRA-8316
URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Alan Boudreault
Fix For: 2.1.3
Attachments: CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh

Hi,

I've got an issue with incremental repairs on our production 15-node 2.1.2 cluster (new cluster, not yet loaded, RF=3). After having successfully performed an incremental repair (-par -inc) on 3 nodes, I started receiving "Repair failed with error Did not get positive replies from all endpoints." from nodetool on all remaining nodes:

[2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges for keyspace (seq=false, full=false)
[2014-11-14 09:12:47,919] Repair failed with error Did not get positive replies from all endpoints.

All the nodes are up and running, and the local system log shows that the repair commands got started, and that's it. I've also noticed that soon after the repair, several nodes started having more CPU load indefinitely without any particular reason (no tasks / queries, nothing in the logs). I then restarted C* on these nodes and retried the repair on several nodes, which were successful until facing the issue again. I tried to repro on our 3-node preproduction cluster without success. It looks like I'm not the only one having this issue: http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html

Any idea?

Thanks

Loic
[jira] [Commented] (CASSANDRA-8403) limit disregarded when paging with IN clause under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231552#comment-14231552 ] Philip Thompson commented on CASSANDRA-8403:
-
[~rhatch] What happens if the larger partition has 25 rows and {{LIMIT 20}} is still used? With your example it's unclear if the limit size is not being respected, or if it's being used as a limit per partition.

limit disregarded when paging with IN clause under certain conditions
-
Key: CASSANDRA-8403
URL: https://issues.apache.org/jira/browse/CASSANDRA-8403
Project: Cassandra
Issue Type: Bug
Reporter: Russ Hatch
Assignee: Benjamin Lerer
Priority: Minor

This issue was originally reported on the python-driver userlist and confirmed by [~aholmber]. When page_size < limit < data size, the limit value is disregarded and all rows are paged back. To repro, create a table and populate it with two partitions:

CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) )

Add data: in one partition create 10 rows, and in a second partition create 20 rows. Perform a query with a page_size of 10 and a LIMIT of 20, like so:

SELECT * FROM paging_test where id in (1,2) LIMIT 20;

The limit is disregarded and three pages of 10 records each will be returned.
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231559#comment-14231559 ] Jeremiah Jordan commented on CASSANDRA-7032: reassignment of existing tokens is not really an acceptable way to achieve this. That requires shuffling the data which belongs to those tokens as well. We have already seen that shuffle is hard to get right and resource intensive, and removed the previous attempt at that from the code base. Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better. If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
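The ownership-hotspot effect described above is easy to see with a toy model on a unit ring (this is an illustration of the general idea, not the algorithm in TestVNodeAllocation.java): random token placement produces a large spread between the biggest and smallest owned range, while a greedy strategy that always splits the currently largest range keeps ownership tightly bounded.

```java
import java.util.*;

// Compare random token allocation vs. a greedy "split the largest range"
// strategy on a [0,1) ring. Ownership of a token = distance to predecessor.
final class VnodeAllocDemo
{
    static double imbalance(TreeSet<Double> tokens)
    {
        double max = 0, min = 1;
        double prev = tokens.last() - 1.0; // wrap-around range
        for (double t : tokens)
        {
            double own = t - prev;
            max = Math.max(max, own);
            min = Math.min(min, own);
            prev = t;
        }
        return max / min; // 1.0 would be perfectly even ownership
    }

    public static void main(String[] args)
    {
        Random rnd = new Random(42);
        TreeSet<Double> random = new TreeSet<>(), greedy = new TreeSet<>();
        random.add(0.0);
        greedy.add(0.0); greedy.add(0.5);
        for (int i = 0; i < 256; i++)
        {
            random.add(rnd.nextDouble());
            // greedy: find the largest gap (including wrap-around) and split it
            double bestStart = 0, bestLen = 0;
            double prev = greedy.last() - 1.0;
            for (double t : greedy)
            {
                if (t - prev > bestLen) { bestLen = t - prev; bestStart = prev; }
                prev = t;
            }
            double tok = bestStart + bestLen / 2;
            greedy.add(tok < 0 ? tok + 1.0 : tok); // normalize into [0,1)
        }
        System.out.printf("random max/min ownership ratio: %.1f%n", imbalance(random));
        System.out.printf("greedy max/min ownership ratio: %.1f%n", imbalance(greedy));
        if (imbalance(greedy) >= imbalance(random)) throw new AssertionError();
    }
}
```

The greedy variant is bounded (each split halves the largest range, so the max/min ratio stays at most 2), whereas the random variant's spread grows with cluster size, matching the observation in the ticket.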
[jira] [Commented] (CASSANDRA-8403) limit disregarded when paging with IN clause under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231581#comment-14231581 ] Adam Holmberg commented on CASSANDRA-8403: -- Thanks, [~rhatch], for reporting this. I was giving the OP a little time to submit; looks like he declined. [~philipthompson] there is a little more detail about the boundaries in my [response on the mailing list|https://groups.google.com/a/lists.datastax.com/d/msg/python-driver-user/bjo_mplo6p4/ij0eJ-1lIB4J]: limit disregarded when paging with IN clause under certain conditions - Key: CASSANDRA-8403 URL: https://issues.apache.org/jira/browse/CASSANDRA-8403 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Benjamin Lerer Priority: Minor This issue was originally reported on the python-driver userlist and confirmed by [~aholmber] When page_size < limit < data size, the limit value is disregarded and all rows are paged back. To repro, create a table and populate it with two partitions: CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) Add data: in one partition create 10 rows, and in a second partition create 20 rows. Perform a query with page_size of 10 and a LIMIT of 20, like so: SELECT * FROM paging_test where id in (1,2) LIMIT 20; The limit is disregarded and three pages of 10 records each will be returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231627#comment-14231627 ] Ariel Weisberg commented on CASSANDRA-7438: --- Robert, I don't seem to be getting the latest code for your work on master. For instance, the key comparison code does 8 bytes at a time and doesn't handle trailing bytes as far as I can tell. To Vijay's point: a pseudo-random test against the map that does, say, 200 million operations against a keyspace of several million entries, mirrors the operations on a regular hash map, and periodically checks that they have the same contents would be helpful in having some confidence in the map. Size it so the LRU doesn't do anything. Print the seed at the beginning of the test so it can be reproduced. I think this basically duplicates the benchmark, but having it as a unit test is nice. We can tune the number of operations and keys down for running in CI. You could also look at the unit tests for Guava's cache or j.u.HashMap and borrow those. A nice thing about data structure APIs is that the tests already exist. bq. Yes, basically from JDK. Could not get that via inheritance. What are the licensing and attribution requirements for that code? bq. IMO hash code should be 64 bits because 32 bits might not be sufficient. [~benedict] might have some opinions on how to get the best bits out of MurmurHash3. 32 bits is 256-512 gigabytes of cache for 128 byte entries, which is not bad. I don't feel strongly either way since I don't know whether callers will have the hash precomputed. bq. Nope - would not be. But it's 2^27 (limited by a stupid constant used for both max# of segments and max# of buckets). Worth taking a look at it - it's weird, yes. In OffHeapMap line 222 there seems to be a gate preventing rehashing beyond 2^24 buckets. bq. (Hope I caught all of your comments) I'll check them once you update. 
Serializing Row cache alternative (Fully off heap) -- Key: CASSANDRA-7438 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 Project: Cassandra Issue Type: Improvement Components: Core Environment: Linux Reporter: Vijay Assignee: Vijay Labels: performance Fix For: 3.0 Attachments: 0001-CASSANDRA-7438.patch, tests.zip Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as ByteBuffers. * There is a higher GC cost for a reasonably big cache. * Some users have used the row cache efficiently in production for better results, but this requires careful tuning. * Memory overhead for the cache entries is relatively high. So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible. We might also want to make this cache configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
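The randomized mirror test Ariel suggests above could be shaped like this (scaled down from the suggested 200 million operations; a plain HashMap stands in here for the off-heap map under test, purely to show the mirroring pattern):

```java
import java.util.*;

// Mirror random put/remove/get against a java.util.HashMap reference and
// periodically compare full contents, printing the seed for reproducibility.
final class MirroredMapTest
{
    public static void run(long seed, int ops, int keySpace)
    {
        System.out.println("seed=" + seed); // print so failures can be replayed
        Random rnd = new Random(seed);
        Map<Integer, Integer> reference = new HashMap<>();
        Map<Integer, Integer> underTest = new HashMap<>(); // stand-in for the off-heap map

        for (int i = 0; i < ops; i++)
        {
            int key = rnd.nextInt(keySpace);
            switch (rnd.nextInt(3))
            {
                case 0: // put the same value into both maps
                    int v = rnd.nextInt();
                    reference.put(key, v);
                    underTest.put(key, v);
                    break;
                case 1: // remove from both maps
                    reference.remove(key);
                    underTest.remove(key);
                    break;
                default: // reads must agree (null included)
                    if (!Objects.equals(reference.get(key), underTest.get(key)))
                        throw new AssertionError("mismatch at op " + i + " key " + key);
            }
            if (i % 10_000 == 0 && !reference.equals(underTest))
                throw new AssertionError("contents diverged at op " + i);
        }
    }

    public static void main(String[] args)
    {
        run(new Random().nextLong(), 100_000, 1_000);
        System.out.println("ok");
    }
}
```

For the real cache the key space and operation count would be sized so the LRU never evicts, as the comment notes, since eviction would make the reference map diverge by design.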
[jira] [Assigned] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson reassigned CASSANDRA-8316: -- Assignee: Marcus Eriksson (was: Alan Boudreault) Did not get positive replies from all endpoints error on incremental repair -- Key: CASSANDRA-8316 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316 Project: Cassandra Issue Type: Bug Components: Core Environment: cassandra 2.1.2 Reporter: Loic Lambiel Assignee: Marcus Eriksson Fix For: 2.1.3 Attachments: CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh Hi, I've got an issue with incremental repairs on our production 15 nodes 2.1.2 (new cluster, not yet loaded, RF=3) After having successfully performed an incremental repair (-par -inc) on 3 nodes, I started receiving Repair failed with error Did not get positive replies from all endpoints. from nodetool on all remaining nodes : [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges for keyspace (seq=false, full=false) [2014-11-14 09:12:47,919] Repair failed with error Did not get positive replies from all endpoints. All the nodes are up and running and the local system log shows that the repair commands got started and that's it. I've also noticed that soon after the repair, several nodes started having more cpu load indefinitely without any particular reason (no tasks / queries, nothing in the logs). I then restarted C* on these nodes and retried the repair on several nodes, which were successful until facing the issue again. I tried to repro on our 3 nodes preproduction cluster without success It looks like I'm not the only one having this issue: http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html Any idea? Thanks Loic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8386) Make sure we release references to sstables after incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231691#comment-14231691 ] Alan Boudreault commented on CASSANDRA-8386: This patch doesn't seem to fix the storage size issue of CASSANDRA-8366. However, I'm getting many "Repair session ... Sync failed between ..." errors, so I think it's better to wait for the CASSANDRA-8316 patch and give it another try. Make sure we release references to sstables after incremental repair Key: CASSANDRA-8386 URL: https://issues.apache.org/jira/browse/CASSANDRA-8386 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.1.3 Attachments: 0001-make-sure-we-release-references-after-anticompaction.patch We don't release references to all sstables after anticompaction. If they are not anticompacted or are contained fully within the repaired range, we never release the reference. The attached patch fixes this and improves the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8403) limit disregarded when paging with IN clause under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231709#comment-14231709 ] Russ Hatch commented on CASSANDRA-8403: --- The limit does not appear to be respected even for a single partition. If I add another 30-row partition to my data set and use LIMIT 20 again, I still get all rows (60 in total). limit disregarded when paging with IN clause under certain conditions - Key: CASSANDRA-8403 URL: https://issues.apache.org/jira/browse/CASSANDRA-8403 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Benjamin Lerer Priority: Minor This issue was originally reported on the python-driver userlist and confirmed by [~aholmber] When page_size < limit < data size, the limit value is disregarded and all rows are paged back. To repro, create a table and populate it with two partitions: CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) Add data: in one partition create 10 rows, and in a second partition create 20 rows. Perform a query with page_size of 10 and a LIMIT of 20, like so: SELECT * FROM paging_test where id in (1,2) LIMIT 20; The limit is disregarded and three pages of 10 records each will be returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a604b14b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a604b14b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a604b14b Branch: refs/heads/trunk Commit: a604b14bf422709175322ee913b9fee247e74710 Parents: 06f626a 5ab1d95 Author: Benedict Elliott Smith bened...@apache.org Authored: Tue Dec 2 17:18:56 2014 + Committer: Benedict Elliott Smith bened...@apache.org Committed: Tue Dec 2 17:18:56 2014 + -- CHANGES.txt | 1 + .../apache/cassandra/utils/btree/Builder.java | 2 +- .../cassandra/utils/btree/NodeBuilder.java | 6 +- .../org/apache/cassandra/utils/BTreeTest.java | 120 +-- 4 files changed, 117 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/a604b14b/CHANGES.txt -- diff --cc CHANGES.txt index 141c3a8,c5ac66c..3cb1c0f --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,43 -1,8 +1,44 @@@ +3.0 + * Support UDTs, tuples, and collections in user-defined + functions (CASSANDRA-7563) + * Fix aggregate fn results on empty selection, result column name, + and cqlsh parsing (CASSANDRA-8229) + * Mark sstables as repaired after full repair (CASSANDRA-7586) + * Extend Descriptor to include a format value and refactor reader/writer apis (CASSANDRA-7443) + * Integrate JMH for microbenchmarks (CASSANDRA-8151) + * Keep sstable levels when bootstrapping (CASSANDRA-7460) + * Add Sigar library and perform basic OS settings check on startup (CASSANDRA-7838) + * Support for aggregation functions (CASSANDRA-4914) + * Remove cassandra-cli (CASSANDRA-7920) + * Accept dollar quoted strings in CQL (CASSANDRA-7769) + * Make assassinate a first class command (CASSANDRA-7935) + * Support IN clause on any clustering column (CASSANDRA-4762) + * Improve compaction logging (CASSANDRA-7818) + * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917) + * Do anticompaction in groups 
(CASSANDRA-6851) + * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929, + 7924, 7812, 8063, 7813) + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416) + * Move sstable RandomAccessReader to nio2, which allows using the + FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050) + * Remove CQL2 (CASSANDRA-5918) + * Add Thrift get_multi_slice call (CASSANDRA-6757) + * Optimize fetching multiple cells by name (CASSANDRA-6933) + * Allow compilation in java 8 (CASSANDRA-7028) + * Make incremental repair default (CASSANDRA-7250) + * Enable code coverage thru JaCoCo (CASSANDRA-7226) + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) + * Shorten SSTable path (CASSANDRA-6962) + * Use unsafe mutations for most unit tests (CASSANDRA-6969) + * Fix race condition during calculation of pending ranges (CASSANDRA-7390) + * Fail on very large batch sizes (CASSANDRA-8011) + * Improve concurrency of repair (CASSANDRA-6455, 8208) + + 2.1.3 + * BTree updates may call provided update function twice (CASSANDRA-8018) * Release sstable references after anticompaction (CASSANDRA-8386) * Handle abort() in SSTableRewriter properly (CASSANDRA-8320) - * Fix high size calculations for prepared statements (CASSANDRA-8231) * Centralize shared executors (CASSANDRA-8055) * Fix filtering for CONTAINS (KEY) relations on frozen collection clustering columns when the query is restricted to a single http://git-wip-us.apache.org/repos/asf/cassandra/blob/a604b14b/src/java/org/apache/cassandra/utils/btree/Builder.java --
[1/2] cassandra git commit: BTree updates may call provided update function twice
Repository: cassandra Updated Branches: refs/heads/trunk 06f626acd - a604b14bf BTree updates may call provided update function twice patch by Benjamin; reviewed by Benedict for CASSANDRA-8018 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5ab1d95b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5ab1d95b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5ab1d95b Branch: refs/heads/trunk Commit: 5ab1d95b29509ee5a061eddda39d7f4189abbe37 Parents: d15c918 Author: Benedict Elliott Smith bened...@apache.org Authored: Tue Dec 2 17:17:35 2014 + Committer: Benedict Elliott Smith bened...@apache.org Committed: Tue Dec 2 17:18:51 2014 + -- CHANGES.txt | 1 + .../apache/cassandra/utils/btree/Builder.java | 2 +- .../cassandra/utils/btree/NodeBuilder.java | 6 +- .../org/apache/cassandra/utils/BTreeTest.java | 120 +-- 4 files changed, 117 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ab1d95b/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 7df396d..c5ac66c 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.3 + * BTree updates may call provided update function twice (CASSANDRA-8018) * Release sstable references after anticompaction (CASSANDRA-8386) * Handle abort() in SSTableRewriter properly (CASSANDRA-8320) * Fix high size calculations for prepared statements (CASSANDRA-8231) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ab1d95b/src/java/org/apache/cassandra/utils/btree/Builder.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/Builder.java b/src/java/org/apache/cassandra/utils/btree/Builder.java index f6677d4..0f2fd5b 100644 --- a/src/java/org/apache/cassandra/utils/btree/Builder.java +++ b/src/java/org/apache/cassandra/utils/btree/Builder.java @@ -109,7 +109,7 @@ final class Builder current.reset(EMPTY_LEAF, POSITIVE_INFINITY, updateF, null); for (V key : source) -current.addNewKey(key); 
+current.addNewKey(updateF.apply(key)); current = current.ascendToRoot(); http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ab1d95b/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java b/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java index 9d57182..c715873 100644 --- a/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java +++ b/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java @@ -133,7 +133,7 @@ final class NodeBuilder int i = copyFromKeyPosition; boolean found; // exact key match? -boolean owns = true; // true iff this node (or a child) should contain the key +boolean owns = true; // true if this node (or a child) should contain the key if (i == copyFromKeyEnd) { found = false; @@ -185,7 +185,7 @@ final class NodeBuilder } else { -// if not found, we need to apply updateFunction still +// if not found, we still need to apply the update function key = updateFunction.apply(key); addNewKey(key); // handles splitting parent if necessary via ensureRoom } @@ -319,7 +319,7 @@ final class NodeBuilder void addNewKey(Object key) { ensureRoom(buildKeyPosition + 1); -buildKeys[buildKeyPosition++] = updateFunction.apply(key); +buildKeys[buildKeyPosition++] = key; } // copies children from copyf to the builder, up to the provided index in copyf (exclusive) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ab1d95b/test/unit/org/apache/cassandra/utils/BTreeTest.java -- diff --git a/test/unit/org/apache/cassandra/utils/BTreeTest.java b/test/unit/org/apache/cassandra/utils/BTreeTest.java index a6d4528..e1bf388 100644 --- a/test/unit/org/apache/cassandra/utils/BTreeTest.java +++ b/test/unit/org/apache/cassandra/utils/BTreeTest.java @@ -17,22 +17,21 @@ */ package org.apache.cassandra.utils; -import java.util.ArrayList; -import java.util.Comparator; -import java.util.List; -import java.util.Random; +import java.util.*; import java.util.concurrent.ThreadLocalRandom; 
import org.junit.Test; -import junit.framework.Assert; import org.apache.cassandra.utils.btree.BTree; import org.apache.cassandra.utils.btree.BTreeSet;
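The shape of the CASSANDRA-8018 bug fixed by the diff above is simple: before the patch, both the builder loop and addNewKey() applied the update function, so it ran twice per key. A toy sketch (not the actual BTree code) with a counting update function makes the double application visible:

```java
import java.util.function.UnaryOperator;

// Toy illustration of the double-apply bug shape: the caller applies updateF
// and then calls an addNewKey() that applies it again.
final class DoubleApplyDemo
{
    static int applications = 0;

    static Object buggyAdd(Object key, UnaryOperator<Object> updateF)
    {
        key = updateF.apply(key);        // applied by the caller...
        return addNewKey(key, updateF);  // ...and applied again inside
    }

    static Object addNewKey(Object key, UnaryOperator<Object> updateF)
    {
        return updateF.apply(key); // pre-fix behaviour: addNewKey applied updateF itself
    }

    public static void main(String[] args)
    {
        UnaryOperator<Object> counting = k -> { applications++; return k; };
        buggyAdd("key", counting);
        if (applications != 2) throw new AssertionError();
        System.out.println("update function applied " + applications + " times for one key");
    }
}
```

The fix in the diff resolves this by making addNewKey() a pure insert and leaving each call site responsible for applying the update function exactly once.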
[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
[ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231775#comment-14231775 ] Benjamin Lerer commented on CASSANDRA-4476: --- Here is my review feedback on the latest patch: * I think that you should use {{isRange}} or {{isSlice}} instead of {{isRelationalOrderOperator}}, as it is clearer. * The name of the test class {{SecondaryIndexNonEqTest}} is misleading. The {{CONTAINS}} and {{CONTAINS KEY}} operators are also non-EQ tests. * In {{getRelationalOrderEstimatedSize}} I do not understand why you do not return 0 if {{estimatedKeysForRange}} returns 0. Could you explain? * Instead of doing some dangerous casting in {{getRelationalOrderEstimatedSize}}, you should change the type of {{bestMeanCount}} from int to long. * In {{computeNext}} I do not understand why you do not check for stale data for range queries. Could you explain? * I think it would be nicer to also have an iterator for EQ and use polymorphism instead of if/else. * The close method of the {{AbstractScanIterator}} returned by {{getSequentialIterator}} should be called from the close method. * The unit tests are only covering a subset of the possible queries. Could you add more (a 3 and a 4, a 3 and a 4 ...) * When testing for InvalidRequestException you should use {{assertInvalidMessage}} Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE) Key: CASSANDRA-4476 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Sylvain Lebresne Assignee: Oded Peer Priority: Minor Labels: cql Fix For: 3.0 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). 
Given that indexed CFs are local (and use a LocalPartitioner that orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided. As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(), i.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date; for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231795#comment-14231795 ] sankalp kohli commented on CASSANDRA-7203: -- bq. and I think we have bigger fish to fry. I agree with Jason here :). I have not thought about all the use cases we have, but this is not currently a problem. Flush (and Compact) High Traffic Partitions Separately -- Key: CASSANDRA-7203 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: compaction, performance An idea possibly worth exploring is the use of streaming count-min sketches to collect data over the uptime of a server to estimate the velocity of different partitions, so that high-volume partitions can be flushed separately, on the assumption that they will be much smaller in number, thus reducing write amplification by permitting compaction independently of any low-velocity data. Whilst the idea is reasonably straightforward, it seems that the biggest problem here will be defining any success metric. Obviously any workload following an exponential/zipf/extreme distribution is likely to benefit from such an approach, but whether or not that would translate in real terms is another matter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
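For readers unfamiliar with the structure the ticket proposes: a count-min sketch estimates per-key frequencies in sublinear space by keeping several rows of counters indexed by independent hashes, taking the minimum over rows as the (never under-) estimate. A minimal sketch, illustrative only and not a proposed Cassandra implementation (the hash mixing constants are arbitrary):

```java
import java.util.*;

// Minimal count-min sketch for estimating per-partition write velocity.
final class CountMinSketch
{
    private final long[][] counts;
    private final long[] hashSeeds;

    CountMinSketch(int depth, int width)
    {
        counts = new long[depth][width];
        hashSeeds = new long[depth];
        Random rnd = new Random(1);
        for (int i = 0; i < depth; i++) hashSeeds[i] = rnd.nextLong() | 1L;
    }

    private int bucket(int row, long key)
    {
        // cheap per-row mixing; real code would use e.g. MurmurHash3
        long h = (key ^ hashSeeds[row]) * 0x9E3779B97F4A7C15L;
        return (int) Math.floorMod(h ^ (h >>> 32), (long) counts[row].length);
    }

    void add(long partitionKey)
    {
        for (int i = 0; i < counts.length; i++)
            counts[i][bucket(i, partitionKey)]++;
    }

    long estimate(long partitionKey) // may overestimate, never underestimates
    {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < counts.length; i++)
            min = Math.min(min, counts[i][bucket(i, partitionKey)]);
        return min;
    }

    public static void main(String[] args)
    {
        CountMinSketch cms = new CountMinSketch(4, 1024);
        for (int i = 0; i < 10_000; i++) cms.add(42); // one hot partition
        for (long k = 0; k < 500; k++) cms.add(k);    // background traffic
        if (cms.estimate(42) < 10_000) throw new AssertionError();
        System.out.println("hot partition estimate: " + cms.estimate(42));
    }
}
```

A flush path could consult such a sketch to route partitions whose estimated velocity exceeds a threshold into a separate sstable, which is the separation the ticket describes.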
[jira] [Commented] (CASSANDRA-8399) Reference Counter exception when dropping user type
[ https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231803#comment-14231803 ] Joshua McKenzie commented on CASSANDRA-8399: An initial scan of the code surrounding the creation and closing of MergeIterators in the above code path doesn't pop up with any likely culprits. The iterators are fairly cleanly created, nobody mutates / calls close on the contents of the iterator list inside the mergeiterator while it's in use, and they're then closed in a finally block. I'll reproduce locally and put in some stack tracing around the ref counting to see where our culprit is. Reference Counter exception when dropping user type --- Key: CASSANDRA-8399 URL: https://issues.apache.org/jira/browse/CASSANDRA-8399 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Joshua McKenzie Fix For: 2.1.3 Attachments: node2.log When running the dtest {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}} with the current 2.1-HEAD code, very frequently, but not always, when dropping a type, the following exception is seen:{code} ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - Exception in thread Thread[MigrationStage:1,5,main] java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb78gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-sche ma_keyspaces-ka-14-Data.db at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) ~[main/:na] at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) ~[main/:na] at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029) ~[main/:na] at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963) ~[main/:na] at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744) ~[main/:na] at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731) ~[main/:na] at org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) ~[main/:na] at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) ~[main/:na] at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) ~[main/:na] at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[main/:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]{code} Log of the node with the error is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231817#comment-14231817 ] Robert Stupp commented on CASSANDRA-7438: - [~vijay2...@yahoo.com] can you explain what kind of bugs? bq. licensing and attribution requirements It's already in the C* code base in exactly the same way. Also pushed some changes: * increased max# of segments and buckets to 2^30 (meaning approx 1B segments times 1B buckets) * added a prototype of direct I/O for row cache serialization (zero copy) - just as a demo (just coded, not tested yet) * use Unsafe for value (de)serialization * moved (most) statistic counters to OffHeapMap to reduce contention caused by volatile (really makes sense) * removed use of the Guava cache API * corrected and improved key comparison Regarding the 64 bit hash: it's 64 bits since OHC takes the most significant bits for the segment and the least significant bits for the hash inside a segment. Both are limited to 30 bits, so 60 bits in total. Serializing Row cache alternative (Fully off heap) -- Key: CASSANDRA-7438 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 Project: Cassandra Issue Type: Improvement Components: Core Environment: Linux Reporter: Vijay Assignee: Vijay Labels: performance Fix For: 3.0 Attachments: 0001-CASSANDRA-7438.patch, tests.zip Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as ByteBuffers. * There is a higher GC cost for a reasonably big cache. * Some users have used the row cache efficiently in production for better results, but this requires careful tuning. * Memory overhead for the cache entries is relatively high. So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. 
We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible. We might also want to make this cache configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
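The 64-bit hash split Robert describes above, most significant bits selecting the segment and least significant bits selecting the bucket within it, can be sketched in a few lines (the bit widths here are arbitrary examples; OHC's actual widths are up to 30 bits per side):

```java
// Split one 64-bit hash into a segment index (high bits) and an in-segment
// bucket index (low bits), as described for OHC.
final class HashSplitDemo
{
    static int segment(long hash, int segmentBits)
    {
        return (int) (hash >>> (64 - segmentBits)); // top segmentBits bits
    }

    static int bucket(long hash, int bucketBits)
    {
        return (int) (hash & ((1L << bucketBits) - 1)); // bottom bucketBits bits
    }

    public static void main(String[] args)
    {
        long h = 0xDEADBEEFCAFEBABEL;
        int seg = segment(h, 8);  // top 8 bits  -> 0xDE
        int bkt = bucket(h, 16);  // bottom 16 bits -> 0xBABE
        if (seg != 0xDE || bkt != 0xBABE) throw new AssertionError();
        System.out.printf("segment=%#x bucket=%#x%n", seg, bkt);
    }
}
```

Using disjoint ends of the hash keeps the two indices independent, which is why a 32-bit hash would be tight: 30 segment bits plus 30 bucket bits need up to 60 bits of hash.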
[jira] [Commented] (CASSANDRA-8384) Consider changing CREATE TABLE syntax for compression options in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231823#comment-14231823 ] Tyler Hobbs commented on CASSANDRA-8384: +1 Consider changing CREATE TABLE syntax for compression options in 3.0 Key: CASSANDRA-8384 URL: https://issues.apache.org/jira/browse/CASSANDRA-8384 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 3.0 Currently, the `compression` table options are inconsistent with their siblings (table `compaction`, keyspace `replication`). I suggest we change this for 3.0, like we changed the `caching` syntax for 2.1 (while continuing to accept the old syntax for a release). I recommend the following changes: 1. rename `sstable_compression` to `class`, to make it consistent with `compaction` and `replication` 2. rename `chunk_length_kb` to `chunk_length_in_kb`, to match `memtable_flush_period_in_ms`, or, alternatively, to just `chunk_length`, with `memtable_flush_period_in_ms` renamed to `memtable_flush_period` - consistent with every other CQL option everywhere else 3. add a boolean `enabled` option, to match `compaction`. Currently, the official way to disable compression is an ugly, ugly hack (see CASSANDRA-8288) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
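Concretely, the renames proposed above would take the compression map from the 2.1-era form to something matching compaction and replication (the second statement is a sketch of the proposal, not shipped syntax):

```sql
-- Current (2.1-era) syntax:
CREATE TABLE t (k int PRIMARY KEY, v text)
  WITH compression = { 'sstable_compression': 'LZ4Compressor',
                       'chunk_length_kb': 64 };

-- Proposed 3.0 syntax per the renames above (illustrative):
CREATE TABLE t (k int PRIMARY KEY, v text)
  WITH compression = { 'class': 'LZ4Compressor',
                       'chunk_length_in_kb': 64,
                       'enabled': true };
```

With the `enabled` flag, disabling compression becomes a one-key change instead of the workaround referenced in CASSANDRA-8288.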
[jira] [Commented] (CASSANDRA-5483) Repair tracing
[ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231840#comment-14231840 ] Ben Chan commented on CASSANDRA-5483: - Sorry; limited computer access last week, zero access to dev box (network setup issues), turkey. Branch 5483_squashed (currently at commit 78686c61e38e, merges cleanly with trunk at 06f626acd27b) looks fine to me; looks like there were only mechanical changes needed to resolve the merge conflicts. Ran my standard simplistic test and saw no obvious problems (for both 78686c61e38e and again when merged with 06f626acd27b). Any problem is likely to be something I did, or some unforeseen interaction with code in the updated trunk. Hopefully neither of those possibilities is true. I'll keep watching this thread/issue just in case. Repair tracing -- Key: CASSANDRA-5483 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Yuki Morishita Assignee: Ben Chan Priority: Minor Labels: repair Fix For: 3.0 Attachments: 5483-full-trunk.txt, 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch, 5483-v07-08-Fix-brace-style.patch, 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch, 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 5483-v08-14-Poll-system_traces.events.patch, 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 5483-v10-17-minor-bugfixes-and-changes.patch, 
5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 5483-v14-01-squashed.patch, 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch, tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch, v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch I think it would be nice to log repair stats and results the way query tracing stores traces to the system keyspace. With it, you don't have to look up each log file to see what the status was and how the repair you invoked performed. Instead, you can query the repair log with a session ID to see the state and stats of all nodes involved in that repair session. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8094) Heavy writes in RangeSlice read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minh Do updated CASSANDRA-8094: --- Due Date: 15/Jan/15 (was: 14/Nov/14) Heavy writes in RangeSlice read requests -- Key: CASSANDRA-8094 URL: https://issues.apache.org/jira/browse/CASSANDRA-8094 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Minh Do Assignee: Minh Do Fix For: 2.0.12 RangeSlice requests always do a scheduled read repair when coordinators try to resolve replicas' responses, regardless of whether read_repair_chance is set. Because of this, clusters with low writes and high reads see very heavy write traffic between nodes. We should have an option to turn this off, and it can be separate from read_repair_chance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
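The behavior being described, and the proposed opt-out, can be sketched conceptually. This is an illustration only, not Cassandra's actual resolver code, and the `range_slice_read_repair` flag is a hypothetical name for the option the ticket proposes:

```python
from collections import Counter

def resolve_range_slice(responses, range_slice_read_repair=True):
    """Resolve replica responses to a single value; when the
    (hypothetical) range_slice_read_repair flag is on, schedule a
    repair write to every replica that returned a stale value."""
    resolved = Counter(responses).most_common(1)[0][0]
    repair_targets = []
    if range_slice_read_repair:
        repair_targets = [i for i, r in enumerate(responses) if r != resolved]
    return resolved, repair_targets

# With the flag on (the current behavior), every mismatch triggers a write:
print(resolve_range_slice(["v2", "v2", "v1"]))         # ('v2', [2])
# With the flag off (the proposal), no repair writes are issued:
print(resolve_range_slice(["v2", "v2", "v1"], False))  # ('v2', [])
```

The point of the ticket is exactly the difference between those two calls: today every mismatched range-slice response generates repair writes, which is where the heavy inter-node write traffic comes from.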
[jira] [Updated] (CASSANDRA-8132) Save or stream hints to a safe place in node replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minh Do updated CASSANDRA-8132: --- Due Date: 15/Jan/15 (was: 28/Nov/14) Save or stream hints to a safe place in node replacement Key: CASSANDRA-8132 URL: https://issues.apache.org/jira/browse/CASSANDRA-8132 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Minh Do Assignee: Minh Do Fix For: 2.1.3 Often, we need to replace a node with a new instance in a cloud environment where all nodes are still alive. To be safe and avoid losing data, we usually make sure all hints are gone before we do this operation. Replacement means we just want to shut down the C* process on a node and bring up another instance to take over that node's token. However, if the node to be replaced has a lot of stored hints, its HintedHandoffManager seems very slow to send the hints to other nodes. In our case, we tried to replace a node and had to wait several days before its stored hints were cleared out. As mentioned above, we need all hints on this node to be cleared out before we can terminate it and replace it with a new instance/machine. Since this is not a decommission, I am proposing that we have the same hints-streaming mechanism as in the decommission code. Furthermore, there needs to be a nodetool command to trigger this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8407) cqlsh: handle timestamps outside of 1970 - 2038 range
Tyler Hobbs created CASSANDRA-8407: -- Summary: cqlsh: handle timestamps outside of 1970 - 2038 range Key: CASSANDRA-8407 URL: https://issues.apache.org/jira/browse/CASSANDRA-8407 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Tyler Hobbs Assignee: Tyler Hobbs Priority: Minor Currently, cqlsh can't handle {{timestamp}} values that fall outside of the 1970 - 2038 range due to a python limitation with {{utcfromtimestamp()}}. This is being addressed by [PYTHON-198|https://datastax-oss.atlassian.net/browse/PYTHON-189] in the python driver. Once that's complete, we should update the bundled python driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
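The python limitation in question is that {{utcfromtimestamp()}} delegates to the platform's C timestamp functions, which on many systems only cover roughly 1970 through January 19, 2038 (the 32-bit rollover). A common workaround, sketched here as an illustration (the driver fix may differ), is to build the datetime from the epoch with a timedelta, which supports the full datetime range:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def utc_from_timestamp(ts):
    """Convert a POSIX timestamp to a naive UTC datetime without
    relying on the platform's (possibly 32-bit) time functions."""
    return EPOCH + timedelta(seconds=ts)

# 2208988800 s is exactly 25567 days after the epoch: 2040-01-01,
# well past the 32-bit limit.
print(utc_from_timestamp(2208988800))  # 2040-01-01 00:00:00
# Negative timestamps (pre-1970) work too:
print(utc_from_timestamp(-86400))      # 1969-12-31 00:00:00
```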
[jira] [Updated] (CASSANDRA-8407) cqlsh: handle timestamps outside of 1970 - 2038 range
[ https://issues.apache.org/jira/browse/CASSANDRA-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8407: --- Labels: cqlsh (was: ) cqlsh: handle timestamps outside of 1970 - 2038 range - Key: CASSANDRA-8407 URL: https://issues.apache.org/jira/browse/CASSANDRA-8407 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Tyler Hobbs Assignee: Tyler Hobbs Priority: Minor Labels: cqlsh Currently, cqlsh can't handle {{timestamp}} values that fall outside of the 1970 - 2038 range due to a python limitation with {{utcfromtimestamp()}}. This is being addressed by [PYTHON-198|https://datastax-oss.atlassian.net/browse/PYTHON-189] in the python driver. Once that's complete, we should update the bundled python driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-8342) Remove historical guidance for concurrent reader and writer tunings.
[ https://issues.apache.org/jira/browse/CASSANDRA-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire reassigned CASSANDRA-8342: --- Assignee: (was: Ryan McGuire) Remove historical guidance for concurrent reader and writer tunings. Key: CASSANDRA-8342 URL: https://issues.apache.org/jira/browse/CASSANDRA-8342 Project: Cassandra Issue Type: Improvement Reporter: Matt Stump The cassandra.yaml and documentation provide guidance on tuning concurrent readers or concurrent writers to system resources (cores, spindles). Testing performed by both myself and customers demonstrates no benefit from thread pool sizes above 64, and a decrease in throughput for thread pools larger than 128, due to thread scheduling and synchronization bottlenecks within Cassandra. Additionally, for lower-end systems, reducing these thread pools provides very little benefit because the bottleneck typically just moves to IO or CPU. I propose that we set the default value to 64 (the current default is 32) and remove all guidance/recommendations regarding tuning. This recommendation may change in 3.0, but that would require further experimentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8408) limit appears to replace page size under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-8408: -- Summary: limit appears to replace page size under certain conditions (was: limit appears to replace page size) limit appears to replace page size under certain conditions --- Key: CASSANDRA-8408 URL: https://issues.apache.org/jira/browse/CASSANDRA-8408 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Priority: Minor This seems like it could be related to CASSANDRA-8403. When paging a query with limit < page size < data size, and querying using an 'IN' clause across several partitions, I get back several pages of size=limit (instead of the page size being used). So the limit is being exceeded, and it seems to supplant the page size value, but something is still keeping the total rows returned down. To repro, create a table: CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) And add data across several partitions (I used 6 partitions). Add a bunch of rows to each partition (I have 80 total across all partitions). Perform a paged query using an 'IN' clause across all the partitions, where limit < page_size < data size. I used something like: SELECT * FROM paging_test where id in (1,2,3,4,5,6) LIMIT 9; (with a page_size of 20 for the query). What I get returned is three pages of sizes: 9, 9, 8 -- 26 rows in total, but I'm uncertain why. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8408) limit appears to replace page size
Russ Hatch created CASSANDRA-8408: - Summary: limit appears to replace page size Key: CASSANDRA-8408 URL: https://issues.apache.org/jira/browse/CASSANDRA-8408 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Priority: Minor This seems like it could be related to CASSANDRA-8403. When paging a query with limit < page size < data size, and querying using an 'IN' clause across several partitions, I get back several pages of size=limit (instead of the page size being used). So the limit is being exceeded, and it seems to supplant the page size value, but something is still keeping the total rows returned down. To repro, create a table: CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) And add data across several partitions (I used 6 partitions). Add a bunch of rows to each partition (I have 80 total across all partitions). Perform a paged query using an 'IN' clause across all the partitions, where limit < page_size < data size. I used something like: SELECT * FROM paging_test where id in (1,2,3,4,5,6) LIMIT 9; (with a page_size of 20 for the query). What I get returned is three pages of sizes: 9, 9, 8 -- 26 rows in total, but I'm uncertain why. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8408) limit appears to replace page size under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231871#comment-14231871 ] Russ Hatch commented on CASSANDRA-8408: --- Forgot to mention above that the issue seems more likely to occur when I query across several partitions than when querying across just 2 or 3. limit appears to replace page size under certain conditions --- Key: CASSANDRA-8408 URL: https://issues.apache.org/jira/browse/CASSANDRA-8408 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Priority: Minor This seems like it could be related to CASSANDRA-8403. When paging a query with limit < page size < data size, and querying using an 'IN' clause across several partitions, I get back several pages of size=limit (instead of the page size being used). So the limit is being exceeded, and it seems to supplant the page size value, but something is still keeping the total rows returned down. To repro, create a table: CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) And add data across several partitions (I used 6 partitions). Add a bunch of rows to each partition (I have 80 total across all partitions). Perform a paged query using an 'IN' clause across all the partitions, where limit < page_size < data size. I used something like: SELECT * FROM paging_test where id in (1,2,3,4,5,6) LIMIT 9; (with a page_size of 20 for the query). What I get returned is three pages of sizes: 9, 9, 8 -- 26 rows in total, but I'm uncertain why. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
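For reference, with correct paging semantics the repro query (limit 9, page size 20, 80 matching rows) should produce a single page, since the limit caps the result set below one page. A quick sketch of the expected arithmetic:

```python
def expected_pages(total_rows, limit, page_size):
    """Correct paging semantics: return the page sizes for a result
    capped at `limit` rows and fetched `page_size` rows at a time."""
    n = min(total_rows, limit)
    return [min(page_size, n - start) for start in range(0, n, page_size)]

print(expected_pages(80, 9, 20))   # [9] -- one page of 9 rows
# The reported behavior was three pages of 9, 9 and 8 (26 rows),
# i.e. the limit appears to be applied per page rather than overall.
```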
[jira] [Assigned] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson reassigned CASSANDRA-7186: -- Assignee: Philip Thompson (was: Ryan McGuire) alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 I've seen many times in Cassandra 2.0.6 that adding columns to existing tables seems to not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description a pretty long time (~30 mins) after the original ALTER command was issued. We originally identified this issue when a connected client would complain that a column it issued a SELECT for wasn't a known column, at which point we have to ask each node to describe the most recently altered table. One of them will not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly. We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it is not always the same one that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6. We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231878#comment-14231878 ] Vijay commented on CASSANDRA-7438: -- Never mind, my bad: it was related to the below (which needs to be more configurable instead), and the items were going missing earlier than I thought they should. It looks like you just evict items per segment (if a segment is used more, more items will disappear from that segment, while items in the least used segment will remain). {code} // 12.5% if capacity less than 8GB // 10% if capacity less than 16 GB // 5% if capacity is higher than 16GB {code} Also noticed you don't have a replace operation, which Cassandra uses. Anyway, I am going to stop working on this for now; let me know if someone wants any other info. Serializing Row cache alternative (Fully off heap) -- Key: CASSANDRA-7438 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 Project: Cassandra Issue Type: Improvement Components: Core Environment: Linux Reporter: Vijay Assignee: Vijay Labels: performance Fix For: 3.0 Attachments: 0001-CASSANDRA-7438.patch, tests.zip Currently SerializingCache is partially off heap; keys are still stored in the JVM heap as BB. * There is a higher GC cost for a reasonably big cache. * Some users have used the row cache efficiently in production for better results, but this requires careful tuning. * Memory overhead for the cache entries is relatively high. So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead and fewer memcpys (as much as possible). We might also want to make this cache configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
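The capacity-dependent eviction fractions quoted in the {code} block above can be restated as a small function. This is only a restatement of the quoted comment for clarity, not the actual off-heap cache implementation:

```python
def eviction_fraction(capacity_gb):
    """Fraction of a segment evicted, per the quoted code comment."""
    if capacity_gb < 8:
        return 0.125  # 12.5% if capacity less than 8 GB
    if capacity_gb < 16:
        return 0.10   # 10% if capacity less than 16 GB
    return 0.05       # 5% if capacity is higher than 16 GB

print(eviction_fraction(4), eviction_fraction(12), eviction_fraction(32))
# 0.125 0.1 0.05
```

Vijay's observation is that because eviction happens per segment at these fixed fractions, heavily used segments lose proportionally more items while cold segments keep theirs.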
[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
[ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231898#comment-14231898 ] Jeremiah Jordan commented on CASSANDRA-4476: I think you need to re-visit the issue of the result ordering. Without the full result set being in token order you cannot page through the results from the secondary index. Internal and user-driven paging rely on being able to start the next page by knowing the token the previous page ended on. With an implementation that does not return the results in token order, you cannot send the end token of the previous result as the start token for the next page, or you will skip all values for following index rows that have a token before that. For example: Dataset: {noformat} (token(key), indexed) (1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6) {noformat} {noformat} select token(key),indexed from temp where indexed > 4 limit 3; 3, 5 4, 5 5, 5 {noformat} Then without proper token order results: {noformat} select token(key),indexed from temp where indexed > 4 and token(key) > 5 limit 3; 6, 5 7, 6 8, 6 {noformat} You just skipped (1, 6) and (2, 6) and cannot get them. Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE) Key: CASSANDRA-4476 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Sylvain Lebresne Assignee: Oded Peer Priority: Minor Labels: cql Fix For: 3.0 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use a LocalPartitioner that orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided. 
As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(). I.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date; for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
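Jeremiah Jordan's example can be replayed in a few lines. This sketch is illustrative (not Cassandra code) and shows why resuming by "last token seen" only works when results come back in token order:

```python
# Rows as (token, indexed_value), using the dataset from the comment above.
rows = [(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)]

def next_page(ordered_rows, last_token, limit):
    """Resume after last_token: keep rows matching indexed > 4 whose
    token is past the resume point, then take one page."""
    return [(t, v) for t, v in ordered_rows if v > 4 and t > last_token][:limit]

# Token-ordered results: paging covers every matching row, no gaps.
token_order = sorted(rows)
p1 = next_page(token_order, 0, 3)          # [(1, 6), (2, 6), (3, 5)]
p2 = next_page(token_order, p1[-1][0], 3)  # [(4, 5), (5, 5), (6, 5)]

# Index-value order (the problematic case): rows sorted by indexed value.
value_order = sorted(rows, key=lambda r: (r[1], r[0]))
q1 = next_page(value_order, 0, 3)          # [(3, 5), (4, 5), (5, 5)]
q2 = next_page(value_order, q1[-1][0], 3)  # [(6, 5), (7, 6), (8, 6)]
# (1, 6) and (2, 6) are skipped forever: their tokens precede the resume point.
```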
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231899#comment-14231899 ] Philip Thompson commented on CASSANDRA-7186: How many nodes do you have in this cluster? alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 I've seen many times in Cassandra 2.0.6 that adding columns to existing tables seems to not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description a pretty long time (~30 mins) after the original ALTER command was issued. We originally identified this issue when a connected client would complain that a column it issued a SELECT for wasn't a known column, at which point we have to ask each node to describe the most recently altered table. One of them will not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly. We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it is not always the same one that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6. We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
[ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231898#comment-14231898 ] Jeremiah Jordan edited comment on CASSANDRA-4476 at 12/2/14 7:04 PM: - I think you need to re-visit the issue of the result ordering. Without the full result set being in token order you cannot page through the results from the secondary index. Internal and user-driven paging rely on being able to start the next page by knowing the token the previous page ended on. With an implementation that does not return the results in token order, you cannot send the end token of the previous result as the start token for the next page, or you will skip all values for following index rows that have a token before that. For example: Dataset: {noformat} (token(key), indexed) (1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6) {noformat} {noformat} select token(key),indexed from temp where indexed > 4 limit 3; 3, 5 4, 5 5, 5 {noformat} Then without proper token order results: {noformat} select token(key),indexed from temp where indexed > 4 and token(key) > 5 limit 3; 6, 5 7, 6 8, 6 {noformat} You just skipped (1, 6) and (2, 6) and cannot get them. The next issue is that the result set merging code relies on the fact that things will be in token order. So when you run the query at anything higher than ONE and need to merge results from multiple nodes, that code will get screwed up when you transition from (6,5) to (1,6). was (Author: jjordan): I think you need to re-visit the issue of the result ordering. Without the full result set being in token order you cannot page through the results from the secondary index. Internal and user-driven paging rely on being able to start the next page by knowing the token the previous page ended on. 
With an implementation that does not return the results in token order, you cannot send the end token of the previous result as the start token for the next page, or you will skip all values for following index rows that have a token before that. For example: Dataset: {noformat} (token(key), indexed) (1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6) {noformat} {noformat} select token(key),indexed from temp where indexed > 4 limit 3; 3, 5 4, 5 5, 5 {noformat} Then without proper token order results: {noformat} select token(key),indexed from temp where indexed > 4 and token(key) > 5 limit 3; 6, 5 7, 6 8, 6 {noformat} You just skipped (1, 6) and (2, 6) and cannot get them. Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE) Key: CASSANDRA-4476 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Sylvain Lebresne Assignee: Oded Peer Priority: Minor Labels: cql Fix For: 3.0 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use a LocalPartitioner that orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided. As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(). I.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date; for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[2/5] cassandra git commit: Refactor SelectStatement and Restrictions
http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java -- diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java index 3360d40..022105c 100644 --- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java +++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java @@ -20,41 +20,50 @@ package org.apache.cassandra.cql3.statements; import java.nio.ByteBuffer; import java.util.*; -import com.google.common.base.Joiner; import com.google.common.base.Objects; import com.google.common.base.Predicate; -import com.google.common.collect.AbstractIterator; import com.google.common.collect.Iterables; import com.google.common.collect.Iterators; import org.apache.cassandra.auth.Permission; +import org.apache.cassandra.config.CFMetaData; +import org.apache.cassandra.config.ColumnDefinition; import org.apache.cassandra.cql3.*; -import org.apache.cassandra.cql3.statements.SingleColumnRestriction.Contains; +import org.apache.cassandra.cql3.restrictions.StatementRestrictions; import org.apache.cassandra.cql3.selection.RawSelector; import org.apache.cassandra.cql3.selection.Selection; -import org.apache.cassandra.db.composites.*; -import org.apache.cassandra.db.composites.Composite.EOC; -import org.apache.cassandra.transport.messages.ResultMessage; -import org.apache.cassandra.config.CFMetaData; -import org.apache.cassandra.config.ColumnDefinition; import org.apache.cassandra.db.*; -import org.apache.cassandra.db.filter.*; -import org.apache.cassandra.db.index.SecondaryIndex; +import org.apache.cassandra.db.composites.CellName; +import org.apache.cassandra.db.composites.CellNameType; +import org.apache.cassandra.db.composites.Composite; +import org.apache.cassandra.db.composites.Composites; +import org.apache.cassandra.db.filter.ColumnSlice; +import org.apache.cassandra.db.filter.IDiskAtomFilter; 
+import org.apache.cassandra.db.filter.NamesQueryFilter; +import org.apache.cassandra.db.filter.SliceQueryFilter; import org.apache.cassandra.db.index.SecondaryIndexManager; -import org.apache.cassandra.db.marshal.*; -import org.apache.cassandra.dht.*; +import org.apache.cassandra.db.marshal.CollectionType; +import org.apache.cassandra.db.marshal.CompositeType; +import org.apache.cassandra.db.marshal.Int32Type; +import org.apache.cassandra.dht.AbstractBounds; import org.apache.cassandra.exceptions.*; +import org.apache.cassandra.serializers.MarshalException; import org.apache.cassandra.service.ClientState; import org.apache.cassandra.service.QueryState; import org.apache.cassandra.service.StorageProxy; -import org.apache.cassandra.service.StorageService; -import org.apache.cassandra.service.pager.*; -import org.apache.cassandra.db.ConsistencyLevel; +import org.apache.cassandra.service.pager.Pageable; +import org.apache.cassandra.service.pager.QueryPager; +import org.apache.cassandra.service.pager.QueryPagers; import org.apache.cassandra.thrift.ThriftValidation; -import org.apache.cassandra.serializers.MarshalException; +import org.apache.cassandra.transport.messages.ResultMessage; import org.apache.cassandra.utils.ByteBufferUtil; import org.apache.cassandra.utils.FBUtilities; +import static org.apache.cassandra.cql3.statements.RequestValidations.checkFalse; +import static org.apache.cassandra.cql3.statements.RequestValidations.checkNotNull; +import static org.apache.cassandra.cql3.statements.RequestValidations.checkTrue; +import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest; + /** * Encapsulates a completely parsed SELECT query, including the target * column family, expression, result count, and ordering clause. 
@@ -70,96 +79,43 @@ public class SelectStatement implements CQLStatement private final Selection selection; private final Term limit; -/** Restrictions on partitioning columns */ -private final Restriction[] keyRestrictions; - -/** Restrictions on clustering columns */ -private final Restriction[] columnRestrictions; - -/** Restrictions on non-primary key columns (i.e. secondary index restrictions) */ -private final Map<ColumnIdentifier, Restriction> metadataRestrictions = new HashMap<ColumnIdentifier, Restriction>(); - -// All restricted columns not covered by the key or index filter -private final Set<ColumnDefinition> restrictedColumns = new HashSet<ColumnDefinition>(); -private Restriction.Slice sliceRestriction; - -private boolean isReversed; -private boolean onToken; -private boolean isKeyRange; -private boolean keyIsInRelation; -private boolean usesSecondaryIndexing; +private final StatementRestrictions restrictions; -private Map<ColumnIdentifier, Integer> orderingIndexes; +private final boolean isReversed; -private
[5/5] cassandra git commit: Refactor SelectStatement and Restrictions
Refactor SelectStatement and Restrictions Patch by Benjamin Lerer; reviewed by Tyler Hobbs for CASSANDRA-7981 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/65a7088e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/65a7088e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/65a7088e Branch: refs/heads/trunk Commit: 65a7088e71061b876e9cd51140f31c92ded92777 Parents: a604b14 Author: blerer b_le...@hotmail.com Authored: Tue Dec 2 13:08:25 2014 -0600 Committer: Tyler Hobbs ty...@datastax.com Committed: Tue Dec 2 13:08:25 2014 -0600 -- CHANGES.txt |2 + NEWS.txt|3 + .../cassandra/config/ColumnDefinition.java | 40 + .../cassandra/cql3/ColumnSpecification.java |6 + src/java/org/apache/cassandra/cql3/Cql.g|5 +- .../cassandra/cql3/MultiColumnRelation.java | 130 +- .../org/apache/cassandra/cql3/Operator.java | 119 +- .../org/apache/cassandra/cql3/Relation.java | 221 ++- .../cassandra/cql3/SingleColumnRelation.java| 181 +- .../apache/cassandra/cql3/TokenRelation.java| 164 ++ src/java/org/apache/cassandra/cql3/Tuples.java |3 +- .../cassandra/cql3/VariableSpecifications.java | 10 + .../AbstractPrimaryKeyRestrictions.java | 36 + .../cql3/restrictions/AbstractRestriction.java | 129 ++ .../ForwardingPrimaryKeyRestrictions.java | 159 ++ .../restrictions/MultiColumnRestriction.java| 520 ++ .../restrictions/PrimaryKeyRestrictions.java| 40 + .../cql3/restrictions/Restriction.java | 97 ++ .../cql3/restrictions/Restrictions.java | 82 + .../ReversedPrimaryKeyRestrictions.java | 77 + .../SingleColumnPrimaryKeyRestrictions.java | 307 .../restrictions/SingleColumnRestriction.java | 477 ++ .../restrictions/SingleColumnRestrictions.java | 209 +++ .../restrictions/StatementRestrictions.java | 600 +++ .../cassandra/cql3/restrictions/TermSlice.java | 167 ++ .../cql3/restrictions/TokenRestriction.java | 224 +++ .../cassandra/cql3/selection/Selection.java | 123 +- 
.../apache/cassandra/cql3/statements/Bound.java | 14 +- .../cql3/statements/DeleteStatement.java|1 + .../cql3/statements/ModificationStatement.java | 35 +- .../cql3/statements/MultiColumnRestriction.java | 137 -- .../cql3/statements/RequestValidations.java | 194 +++ .../cassandra/cql3/statements/Restriction.java | 79 - .../cql3/statements/SelectStatement.java| 1597 +++--- .../statements/SingleColumnRestriction.java | 486 -- .../cassandra/db/composites/Composites.java | 22 +- .../db/composites/CompositesBuilder.java| 15 +- .../cassandra/db/marshal/CollectionType.java| 34 +- .../exceptions/UnrecognizedEntityException.java | 49 + .../org/apache/cassandra/cql3/AliasTest.java| 40 + .../cassandra/cql3/ContainsRelationTest.java| 39 +- .../cassandra/cql3/FrozenCollectionsTest.java | 16 +- .../cassandra/cql3/MultiColumnRelationTest.java | 161 +- .../cql3/SelectWithTokenFunctionTest.java | 39 +- .../cql3/SingleColumnRelationTest.java | 218 ++- .../cassandra/cql3/ThriftCompatibilityTest.java |2 +- 46 files changed, 5031 insertions(+), 2278 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 3cb1c0f..6761c31 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,6 @@ 3.0 + * Refactor SelectStatement, return IN results in natural order instead + of IN value list order (CASSANDRA-7981) * Support UDTs, tuples, and collections in user-defined functions (CASSANDRA-7563) * Fix aggregate fn results on empty selection, result column name, http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index 1d168f0..8d8ebdc 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -33,6 +33,9 @@ Upgrading in 2.0.0). Please switch to CQL3 if you haven't already done so. - Very large batches will now be rejected (defaults to 50kb). This can be customized by modifying batch_size_fail_threshold_in_kb. 
+ - The results of CQL3 queries containing an IN restriction will be ordered + in the normal order of the column and no longer in the order in which the column values were + specified in the IN restriction. 2.1.2 = http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/src/java/org/apache/cassandra/config/ColumnDefinition.java
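The NEWS.txt entry above means drivers and applications can no longer rely on rows coming back in IN-list order after this patch. A minimal client-side sketch (a hypothetical helper, not part of Cassandra or any driver) that restores the old ordering by re-sorting results against the IN list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// After CASSANDRA-7981, "SELECT ... WHERE k IN (3, 1, 2)" returns rows in the
// column's natural order (1, 2, 3), not in IN-list order. A client that relied
// on the old behavior can re-sort the results against the IN list itself.
public class InOrderSketch {
    public static List<Integer> reorderToInList(List<Integer> results, List<Integer> inValues) {
        // Map each IN value to its position in the IN list.
        Map<Integer, Integer> rank = new HashMap<>();
        for (int i = 0; i < inValues.size(); i++)
            rank.put(inValues.get(i), i);
        // Sort a copy of the server's (natural-order) results by that position.
        List<Integer> out = new ArrayList<>(results);
        out.sort(Comparator.comparingInt(rank::get));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> inList = Arrays.asList(3, 1, 2);
        List<Integer> serverOrder = Arrays.asList(1, 2, 3); // natural order from the server
        System.out.println(reorderToInList(serverOrder, inList)); // [3, 1, 2]
    }
}
```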
[3/5] cassandra git commit: Refactor SelectStatement and Restrictions
http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/src/java/org/apache/cassandra/cql3/selection/Selection.java -- diff --git a/src/java/org/apache/cassandra/cql3/selection/Selection.java b/src/java/org/apache/cassandra/cql3/selection/Selection.java index 6ad36e9..e44a39f 100644 --- a/src/java/org/apache/cassandra/cql3/selection/Selection.java +++ b/src/java/org/apache/cassandra/cql3/selection/Selection.java @@ -35,17 +35,36 @@ import org.apache.cassandra.db.context.CounterContext; import org.apache.cassandra.exceptions.InvalidRequestException; import org.apache.cassandra.utils.ByteBufferUtil; +import com.google.common.base.Predicate; +import com.google.common.collect.Iterables; import com.google.common.collect.Iterators; public abstract class Selection { +/** + * A predicate that returns <code>true</code> for static columns. + */ +private static final Predicate<ColumnDefinition> STATIC_COLUMN_FILTER = new Predicate<ColumnDefinition>() +{ +public boolean apply(ColumnDefinition def) +{ +return def.isStatic(); +} +}; + +private final CFMetaData cfm; private final Collection<ColumnDefinition> columns; private final ResultSet.Metadata metadata; private final boolean collectTimestamps; private final boolean collectTTLs; -protected Selection(Collection<ColumnDefinition> columns, List<ColumnSpecification> metadata, boolean collectTimestamps, boolean collectTTLs) +protected Selection(CFMetaData cfm, +Collection<ColumnDefinition> columns, +List<ColumnSpecification> metadata, +boolean collectTimestamps, +boolean collectTTLs) { +this.cfm = cfm; this.columns = columns; this.metadata = new ResultSet.Metadata(metadata); this.collectTimestamps = collectTimestamps; @@ -56,6 +75,76 @@ public abstract class Selection public boolean isWildcard() { return false; +} + +/** + * Checks if this selection contains static columns.
+ * @return <code>true</code> if this selection contains static columns, <code>false</code> otherwise; + */ +public boolean containsStaticColumns() +{ +if (!cfm.hasStaticColumns()) +return false; + +if (isWildcard()) +return true; + +return !Iterables.isEmpty(Iterables.filter(columns, STATIC_COLUMN_FILTER)); +} + +/** + * Checks if this selection contains only static columns. + * @return <code>true</code> if this selection contains only static columns, <code>false</code> otherwise; + */ +public boolean containsOnlyStaticColumns() +{ +if (!containsStaticColumns()) +return false; + +if (isWildcard()) +return false; + +for (ColumnDefinition def : getColumns()) +{ +if (!def.isPartitionKey() && !def.isStatic()) +return false; +} + +return true; +} + +/** + * Checks if this selection contains a collection. + * + * @return <code>true</code> if this selection contains a collection, <code>false</code> otherwise. + */ +public boolean containsACollection() +{ +if (!cfm.comparator.hasCollections()) +return false; + +for (ColumnDefinition def : getColumns()) +if (def.type.isCollection() && def.type.isMultiCell()) +return true; + +return false; +} + +/** + * Returns the index of the specified column.
+ * + * @param def the column definition + * @return the index of the specified column + */ +public int indexOf(final ColumnDefinition def) +{ +return Iterators.indexOf(getColumns().iterator(), new Predicate<ColumnDefinition>() + { + public boolean apply(ColumnDefinition n) + { + return def.name.equals(n.name); + } + }); } public ResultSet.Metadata getResultMetadata() @@ -67,12 +156,12 @@ public abstract class Selection { List<ColumnDefinition> all = new ArrayList<ColumnDefinition>(cfm.allColumns().size()); Iterators.addAll(all, cfm.allColumnsInSelectOrder()); -return new SimpleSelection(all, true); +return new SimpleSelection(cfm, all, true); } -public static Selection forColumns(Collection<ColumnDefinition> columns) +public static Selection forColumns(CFMetaData cfm, Collection<ColumnDefinition> columns) { -return new SimpleSelection(columns, false); +return new SimpleSelection(cfm, columns, false); } public int addColumnForOrdering(ColumnDefinition c) @@ -105,8 +194,8 @@ public abstract class Selection
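The Guava `Predicate` plumbing in the hunk above can be summarized with JDK-only types. The sketch below is a hypothetical stand-in (`Column` is not the real `ColumnDefinition`, and `java.util.function.Predicate` replaces Guava's) showing what `containsStaticColumns`/`containsOnlyStaticColumns` compute:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Minimal stand-in for Cassandra's ColumnDefinition, only to illustrate the
// predicate-based filtering used by the Selection methods in the patch.
class Column {
    final String name;
    final boolean isStatic;
    final boolean isPartitionKey;

    Column(String name, boolean isStatic, boolean isPartitionKey) {
        this.name = name;
        this.isStatic = isStatic;
        this.isPartitionKey = isPartitionKey;
    }
}

public class SelectionSketch {
    // JDK equivalent of the Guava STATIC_COLUMN_FILTER predicate.
    static final Predicate<Column> IS_STATIC = c -> c.isStatic;

    // True if any selected column is static (Iterables.filter + isEmpty in the patch).
    static boolean containsStaticColumns(List<Column> columns) {
        return columns.stream().anyMatch(IS_STATIC);
    }

    // True if every selected column is either static or part of the partition key.
    static boolean containsOnlyStaticColumns(List<Column> columns) {
        if (!containsStaticColumns(columns))
            return false;
        return columns.stream().allMatch(c -> c.isPartitionKey || c.isStatic);
    }

    public static void main(String[] args) {
        List<Column> cols = Arrays.asList(
                new Column("pk", false, true),
                new Column("s", true, false),
                new Column("v", false, false));
        System.out.println(containsStaticColumns(cols));                   // true
        System.out.println(containsOnlyStaticColumns(cols));               // false: "v" is a regular column
        System.out.println(containsOnlyStaticColumns(cols.subList(0, 2))); // true: partition key + static only
    }
}
```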
[4/5] cassandra git commit: Refactor SelectStatement and Restrictions
http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java -- diff --git a/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java b/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java new file mode 100644 index 000..d0ed193 --- /dev/null +++ b/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.cassandra.cql3.restrictions; + +import java.nio.ByteBuffer; +import java.util.List; + +import org.apache.cassandra.cql3.QueryOptions; +import org.apache.cassandra.cql3.statements.Bound; +import org.apache.cassandra.db.IndexExpression; +import org.apache.cassandra.db.index.SecondaryIndexManager; +import org.apache.cassandra.exceptions.InvalidRequestException; + +/** + * A restriction/clause on a column. + * The goal of this class being to group all conditions for a column in a SELECT. 
+ */ +public interface Restriction +{ +public boolean isOnToken(); +public boolean isSlice(); +public boolean isEQ(); +public boolean isIN(); +public boolean isContains(); +public boolean isMultiColumn(); + +public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException; + +/** + * Returns <code>true</code> if one of the restrictions use the specified function. + * + * @param ksName the keyspace name + * @param functionName the function name + * @return <code>true</code> if one of the restrictions use the specified function, <code>false</code> otherwise. + */ +public boolean usesFunction(String ksName, String functionName); + +/** + * Checks if the specified bound is set or not. + * @param b the bound type + * @return <code>true</code> if the specified bound is set, <code>false</code> otherwise + */ +public boolean hasBound(Bound b); + +public List<ByteBuffer> bounds(Bound b, QueryOptions options) throws InvalidRequestException; + +/** + * Checks if the specified bound is inclusive or not. + * @param b the bound type + * @return <code>true</code> if the specified bound is inclusive, <code>false</code> otherwise + */ +public boolean isInclusive(Bound b); + +/** + * Merges this restriction with the specified one. + * + * @param otherRestriction the restriction to merge into this one + * @return the restriction resulting of the merge + * @throws InvalidRequestException if the restrictions cannot be merged + */ +public Restriction mergeWith(Restriction otherRestriction) throws InvalidRequestException; + +/** + * Check if the restriction is on indexed columns. + * + * @param indexManager the index manager + * @return <code>true</code> if the restriction is on indexed columns, <code>false</code> + */ +public boolean hasSupportingIndex(SecondaryIndexManager indexManager); + +/** + * Adds to the specified list the <code>IndexExpression</code>s corresponding to this <code>Restriction</code>.
+ * + * @param expressions the list to add the <code>IndexExpression</code>s to + * @param options the query options + * @throws InvalidRequestException if this <code>Restriction</code> cannot be converted into + * <code>IndexExpression</code>s + */ +public void addIndexExpressionTo(List<IndexExpression> expressions, + QueryOptions options) + throws InvalidRequestException; +} http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java -- diff --git a/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java new file mode 100644 index 000..cf2555e --- /dev/null +++ b/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + *
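To illustrate the `Restriction` contract introduced above (`mergeWith`, `hasBound`, `isInclusive`), here is a deliberately simplified model: these are not the real `org.apache.cassandra.cql3.restrictions` classes, and `int` bounds stand in for bound `ByteBuffer` terms. It shows how a slice restriction with one bound merges with another into a closed range:

```java
// Illustrative sketch only: a minimal model of the Restriction.mergeWith /
// hasBound / isInclusive contract from the patch, not the shipped classes.
class SliceRestriction {
    enum Bound { START, END }

    final Integer start, end;               // null means the bound is not set
    final boolean startInclusive, endInclusive;

    SliceRestriction(Integer start, boolean startInc, Integer end, boolean endInc) {
        this.start = start;
        this.startInclusive = startInc;
        this.end = end;
        this.endInclusive = endInc;
    }

    boolean hasBound(Bound b) {
        return b == Bound.START ? start != null : end != null;
    }

    boolean isInclusive(Bound b) {
        return b == Bound.START ? startInclusive : endInclusive;
    }

    // Merging "x > 0" with "x <= 10" yields "0 < x <= 10"; restricting the same
    // bound twice is rejected, mirroring the InvalidRequestException in the patch.
    SliceRestriction mergeWith(SliceRestriction other) {
        if ((hasBound(Bound.START) && other.hasBound(Bound.START))
                || (hasBound(Bound.END) && other.hasBound(Bound.END)))
            throw new IllegalArgumentException("column cannot be restricted twice on the same bound");
        return new SliceRestriction(start != null ? start : other.start,
                                    start != null ? startInclusive : other.startInclusive,
                                    end != null ? end : other.end,
                                    end != null ? endInclusive : other.endInclusive);
    }
}

public class RestrictionSketch {
    public static void main(String[] args) {
        SliceRestriction gt0 = new SliceRestriction(0, false, null, false);  // x > 0
        SliceRestriction le10 = new SliceRestriction(null, false, 10, true); // x <= 10
        SliceRestriction merged = gt0.mergeWith(le10);                       // 0 < x <= 10
        System.out.println(merged.hasBound(SliceRestriction.Bound.END));     // true
        System.out.println(merged.isInclusive(SliceRestriction.Bound.END));  // true
    }
}
```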
[1/5] cassandra git commit: Refactor SelectStatement and Restrictions
Repository: cassandra Updated Branches: refs/heads/trunk a604b14bf - 65a7088e7 http://git-wip-us.apache.org/repos/asf/cassandra/blob/65a7088e/src/java/org/apache/cassandra/cql3/statements/SingleColumnRestriction.java -- diff --git a/src/java/org/apache/cassandra/cql3/statements/SingleColumnRestriction.java b/src/java/org/apache/cassandra/cql3/statements/SingleColumnRestriction.java deleted file mode 100644 index b6ca640..000 --- a/src/java/org/apache/cassandra/cql3/statements/SingleColumnRestriction.java +++ /dev/null @@ -1,486 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an AS IS BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.cassandra.cql3.statements; - -import org.apache.cassandra.cql3.*; -import org.apache.cassandra.exceptions.InvalidRequestException; - -import java.nio.ByteBuffer; -import java.util.ArrayList; -import java.util.Collections; -import java.util.List; - -public abstract class SingleColumnRestriction implements Restriction -{ -public boolean isMultiColumn() -{ -return false; -} - -public static class EQ extends SingleColumnRestriction implements Restriction.EQ -{ -protected final Term value; -private final boolean onToken; - -public EQ(Term value, boolean onToken) -{ -this.value = value; -this.onToken = onToken; -} - -public boolean usesFunction(String ksName, String functionName) -{ -return value != null && value.usesFunction(ksName, functionName); -} - -public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException -{ -return Collections.singletonList(value.bindAndGet(options)); -} - -public boolean isSlice() -{ -return false; -} - -public boolean isEQ() -{ -return true; -} - -public boolean isIN() -{ -return false; -} - -public boolean isContains() -{ -return false; -} - -public boolean isOnToken() -{ -return onToken; -} - -public boolean canEvaluateWithSlices() -{ -return true; -} - -@Override -public String toString() -{ -return String.format("EQ(%s)%s", value, onToken ?
"*" : ""); -} -} - -public static class InWithValues extends SingleColumnRestriction implements Restriction.IN -{ -protected final List<Term> values; - -public InWithValues(List<Term> values) -{ -this.values = values; -} - -public boolean usesFunction(String ksName, String functionName) -{ -if (values != null) -for (Term value : values) -if (value != null && value.usesFunction(ksName, functionName)) -return true; -return false; -} - -public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException -{ -List<ByteBuffer> buffers = new ArrayList<ByteBuffer>(values.size()); -for (Term value : values) -buffers.add(value.bindAndGet(options)); -return buffers; -} - -public boolean canHaveOnlyOneValue() -{ -return values.size() == 1; -} - -public boolean isSlice() -{ -return false; -} - -public boolean isEQ() -{ -return false; -} - -public boolean isIN() -{ -return true; -} - -public boolean isContains() -{ -return false; -} - -public boolean isOnToken() -{ -return false; -} - -public boolean canEvaluateWithSlices() -{ -return true; -} - -@Override -public String toString() -{ -return String.format("IN(%s)", values); -} -} - -public static class InWithMarker extends SingleColumnRestriction implements Restriction.IN -{ -protected final
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231945#comment-14231945 ] Alexander Bulaev commented on CASSANDRA-7186: - We have 9 nodes total in 3 DCs for our production cluster. Even after applying the workaround posted here (run ALTER on each node), we have multiple schema versions in the cluster: {noformat} Cluster Information: Name: music-cass cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: c9039c04-afee-3e5e-bd78-b3d7201cb154: [2a02:6b8:0:c08:0:0:0:22, 2a02:6b8:0:c08:0:0:0:23, 2a02:6b8:0:c08:0:0:0:21, 2a02:6b8:0:2514:0:0:0:39, 2a02:6b8:0:2514:0:0:0:40, 2a02:6b8:0:2514:0:0:0:41] 3dff776f-fad6-3150-b7c3-0415366cc85e: [2a02:6b8:0:1a1b:0:0:0:30, 2a02:6b8:0:1a1b:0:0:0:31, 2a02:6b8:0:1a1b:0:0:0:29] {noformat} This also reproduced on both our testing clusters (3 nodes total in 3 DCs). alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 I've seen many times in Cassandra 2.0.6 that adding columns to existing tables seems to not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description for a pretty long time (~30 mins) after the original ALTER command was issued. We originally identified this issue when a connected client would complain that a column it issued a SELECT for wasn't a known column, at which point we have to ask each node to describe the most recently altered table. One of them will not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly.
We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it is not always the same node that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6. We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231945#comment-14231945 ] Alexander Bulaev edited comment on CASSANDRA-7186 at 12/2/14 7:15 PM: -- We have 9 nodes total in 3 DCs for our production cluster. Even after applying the workaround posted here (run ALTER on each node), we have multiple schema versions in the cluster: {noformat} Cluster Information: Name: music-cass cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: c9039c04-afee-3e5e-bd78-b3d7201cb154: [2a02:6b8:0:c08:0:0:0:22, 2a02:6b8:0:c08:0:0:0:23, 2a02:6b8:0:c08:0:0:0:21, 2a02:6b8:0:2514:0:0:0:39, 2a02:6b8:0:2514:0:0:0:40, 2a02:6b8:0:2514:0:0:0:41] 3dff776f-fad6-3150-b7c3-0415366cc85e: [2a02:6b8:0:1a1b:0:0:0:30, 2a02:6b8:0:1a1b:0:0:0:31, 2a02:6b8:0:1a1b:0:0:0:29] {noformat} This also reproduced on both our testing clusters (3 nodes total in 3 DCs). was (Author: alexbool): We have 9 nodes total in 3 DCs for our production cluster. Even after applying the workaround posted here (run ALTER on each node), we have multiple schema version in the cluster: {noformat} Cluster Information: Name: music-cass cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: c9039c04-afee-3e5e-bd78-b3d7201cb154: [2a02:6b8:0:c08:0:0:0:22, 2a02:6b8:0:c08:0:0:0:23, 2a02:6b8:0:c08:0:0:0:21, 2a02:6b8:0:2514:0:0:0:39, 2a02:6b8:0:2514:0:0:0:40, 2a02:6b8:0:2514:0:0:0:41] 3dff776f-fad6-3150-b7c3-0415366cc85e: [2a02:6b8:0:1a1b:0:0:0:30, 2a02:6b8:0:1a1b:0:0:0:31, 2a02:6b8:0:1a1b:0:0:0:29] {noformat} This also reproduced on both our testing clusters (3 nodes total in 3 DCs). 
alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 I've seen many times in Cassandra 2.0.6 that adding columns to existing tables seems to not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description for a pretty long time (~30 mins) after the original ALTER command was issued. We originally identified this issue when a connected client would complain that a column it issued a SELECT for wasn't a known column, at which point we have to ask each node to describe the most recently altered table. One of them will not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly. We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it is not always the same node that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6. We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-7653) Add role based access control to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe reassigned CASSANDRA-7653: -- Assignee: Sam Tunnicliffe (was: Mike Adamson) Add role based access control to Cassandra -- Key: CASSANDRA-7653 URL: https://issues.apache.org/jira/browse/CASSANDRA-7653 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Mike Adamson Assignee: Sam Tunnicliffe Fix For: 3.0 Attachments: 7653.patch The current authentication model supports granting permissions to individual users. While this is OK for small or medium organizations wanting to implement authorization, it does not work well in large organizations because of the overhead of having to maintain the permissions for each user. Introducing roles into the authentication model would allow sets of permissions to be controlled in one place as a role and then the role granted to users. Roles should also be able to be granted to other roles to allow hierarchical sets of permissions to be built up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7653) Add role based access control to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231961#comment-14231961 ] Hans van der Linde commented on CASSANDRA-7653: --- Dear sender, I am attending a conference and have limmited access to my email. Your email will not be forwarded. I will be back in the office on tuesday 9-dec-2014. For urgent issue's regarding the RTPE/LOM project, please contact Peter van de Koolwijk peter.van.de.koolw...@mail.ing.nl (mobile 06-54660211). Best regards, Hans van der Linde - ATTENTION: The information in this electronic mail message is private and confidential, and only intended for the addressee. Should you receive this message by mistake, you are hereby notified that any disclosure, reproduction, distribution or use of this message is strictly prohibited. Please inform the sender by reply transmission and delete the message without copying or opening it. Messages and attachments are scanned for all viruses known. If this message contains password-protected attachments, the files have NOT been scanned for viruses by the ING mail domain. Always scan attachments before opening them. - Add role based access control to Cassandra -- Key: CASSANDRA-7653 URL: https://issues.apache.org/jira/browse/CASSANDRA-7653 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Mike Adamson Assignee: Sam Tunnicliffe Fix For: 3.0 Attachments: 7653.patch The current authentication model supports granting permissions to individual users. While this is OK for small or medium organizations wanting to implement authorization, it does not work well in large organizations because of the overhead of having to maintain the permissions for each user. Introducing roles into the authentication model would allow sets of permissions to be controlled in one place as a role and then the role granted to users. 
Roles should also be able to be granted to other roles to allow hierarchical sets of permissions to be built up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process
[ https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231977#comment-14231977 ] Joshua McKenzie commented on CASSANDRA-8390: You should be able to find DiskAccessMode in your system.log file for cassandra - not talking eventvwr.msc / Event Viewer. The process cannot access the file because it is being used by another process -- Key: CASSANDRA-8390 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390 Project: Cassandra Issue Type: Bug Reporter: Ilya Komolkin Assignee: Joshua McKenzie 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[NonPeriodicTasks:1,5,main] org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process. 
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[cassandra-all-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_71] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_71] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_71] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71] Caused by: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process. 
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_71] at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_71] at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_71] at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_71] at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_71] at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71] at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[cassandra-all-2.1.1.jar:2.1.1] ... 11 common frames omitted -- This message was sent by Atlassian JIRA (v6.3.4#6332)
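One common mitigation for the Windows failure in this stack trace is to retry the delete after a short back-off, since the `FileSystemException` usually means another process (or a still-open handle such as an unclosed memory-mapped segment) holds the file. The sketch below is illustrative only, not the fix that shipped in Cassandra; `deleteWithRetry` and its parameters are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: retry java.nio.file deletion a few times to ride out transient
// "being used by another process" errors on Windows.
public class RetryingDelete {
    public static boolean deleteWithRetry(Path path, int attempts, long delayMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                Files.deleteIfExists(path);
                return true;                  // deleted, or already gone
            } catch (IOException e) {         // e.g. file still held by another process
                Thread.sleep(delayMillis);    // back off and try again
            }
        }
        return false;                         // caller decides how to escalate
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("sstable-sketch", ".db");
        System.out.println(deleteWithRetry(tmp, 3, 100)); // true for an unlocked file
    }
}
```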
[jira] [Issue Comment Deleted] (CASSANDRA-7653) Add role based access control to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-7653: Comment: was deleted (was: Dear sender, I am attending a conference and have limmited access to my email. Your email will not be forwarded. I will be back in the office on tuesday 9-dec-2014. For urgent issue's regarding the RTPE/LOM project, please contact Peter van de Koolwijk peter.van.de.koolw...@mail.ing.nl (mobile 06-54660211). Best regards, Hans van der Linde - ATTENTION: The information in this electronic mail message is private and confidential, and only intended for the addressee. Should you receive this message by mistake, you are hereby notified that any disclosure, reproduction, distribution or use of this message is strictly prohibited. Please inform the sender by reply transmission and delete the message without copying or opening it. Messages and attachments are scanned for all viruses known. If this message contains password-protected attachments, the files have NOT been scanned for viruses by the ING mail domain. Always scan attachments before opening them. - ) Add role based access control to Cassandra -- Key: CASSANDRA-7653 URL: https://issues.apache.org/jira/browse/CASSANDRA-7653 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Mike Adamson Assignee: Sam Tunnicliffe Fix For: 3.0 Attachments: 7653.patch The current authentication model supports granting permissions to individual users. While this is OK for small or medium organizations wanting to implement authorization, it does not work well in large organizations because of the overhead of having to maintain the permissions for each user. Introducing roles into the authentication model would allow sets of permissions to be controlled in one place as a role and then the role granted to users. Roles should also be able to be granted to other roles to allow hierarchical sets of permissions to be built up. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232031#comment-14232031 ] Philip Thompson commented on CASSANDRA-7186: Is the keyspace in question using the NetworkTopologyStrategy, and does it have an RF of at least 1 in every DC, or at least in the DC where the failed propagation is occurring? alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 I've seen many times in Cassandra 2.0.6 that adding columns to existing tables seems to not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description a pretty long time (~30 mins) after the original ALTER command was issued. We originally identified this issue when a connected client would complain that a column it issued a SELECT for wasn't a known column, at which point we have to ask each node to describe the most recently altered table. One of them will not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly. We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it is not always the same node that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6. We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. 
The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232032#comment-14232032 ] Jason Brown commented on CASSANDRA-7203: bq. I was mostly hoping to get your and sankalp kohli's views on if those workload skews occur I think it's completely dependent upon an organization's systems' implementation as to what traffic actually goes to a database vs. cache vs. whatever, and I think trying to be incredibly clever here is not worth the implementation costs. Again, I'll reiterate, we have much bigger fish to fry than this. Flush (and Compact) High Traffic Partitions Separately -- Key: CASSANDRA-7203 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: compaction, performance An idea possibly worth exploring is the use of streaming count-min sketches to collect data over the up-time of a server to estimate the velocity of different partitions, so that high-volume partitions can be flushed separately on the assumption that they will be much smaller in number, thus reducing write amplification by permitting compaction independently of any low-velocity data. Whilst the idea is reasonably straightforward, it seems that the biggest problem here will be defining any success metric. Obviously any workload following an exponential/zipf/extreme distribution is likely to benefit from such an approach, but whether or not that would translate in real terms is another matter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
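[Editorial note] The ticket above proposes streaming count-min sketches for tracking partition velocity. As a rough illustration of why the structure suits that use (fixed memory, estimates never undercount), here is a minimal Python sketch; this is not Cassandra's implementation, and the width/depth values are arbitrary:

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch for estimating per-key write counts.

    Illustrative only -- not Cassandra code. Memory is fixed at
    width * depth counters regardless of how many keys are seen.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # One hashed bucket per row; sha256 used for a simple portable hash.
        for row in range(self.depth):
            h = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, key, count=1):
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum across rows never undercounts; hash collisions
        # can only inflate the estimate.
        return min(self.table[row][col] for row, col in self._buckets(key))


# A high-velocity partition stands out against many low-velocity ones:
sketch = CountMinSketch()
for _ in range(1000):
    sketch.add("hot-partition")
for i in range(100):
    sketch.add(f"cold-{i}")
```

A flush path could then compare `estimate()` against a velocity threshold to decide which partitions go to the separate "hot" memtable/sstable lineage the ticket describes.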
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232035#comment-14232035 ] Alexander Bulaev commented on CASSANDRA-7186: - Yep, we're using NetworkTopologyStrategy and we place one replica in each of 3 DCs. alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232045#comment-14232045 ] Benedict commented on CASSANDRA-7203: - It wasn't intended to be an immediate focus, I just wanted an idea if such data distributions occurred to see if it might _ever_ be worth investigating. But I can see I'm fighting a losing battle! Flush (and Compact) High Traffic Partitions Separately -- Key: CASSANDRA-7203 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: compaction, performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8373) MOVED_NODE Topology Change event is never emitted
[ https://issues.apache.org/jira/browse/CASSANDRA-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232047#comment-14232047 ] Tyler Hobbs commented on CASSANDRA-8373: On 2.0, this gives the correct behavior with a multi-node cluster, but with a single-node cluster a {{NEW_NODE}} notification is sent instead of a {{MOVED_NODE}} notification. It looks like the single-node behavior is also present in 2.0, but it would be nice to fix that here as well. MOVED_NODE Topology Change event is never emitted - Key: CASSANDRA-8373 URL: https://issues.apache.org/jira/browse/CASSANDRA-8373 Project: Cassandra Issue Type: Bug Components: Core Reporter: Adam Holmberg Assignee: Adam Holmberg Priority: Minor Fix For: 2.0.12, 2.1.3 Attachments: 8373.txt lifeCycleSubscribers.onMove never gets called because [this tokenMetadata.updateNormalTokens|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L1585] call [changes the endpoint moving status|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L190], making the later isMoving conditional always false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
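[Editorial note] The linked code is Java, but the ordering bug the report describes (the state update clears the "moving" flag before the isMoving check runs, so the MOVED_NODE branch is dead) fits in a few lines. All names below are hypothetical Python stand-ins, not Cassandra's actual API:

```python
# Illustrative sketch of the ordering bug described in CASSANDRA-8373.
# Class and method names are invented for illustration.

class TokenMetadata:
    def __init__(self):
        self.moving = set()

    def update_normal_tokens(self, endpoint):
        # Side effect: marks the endpoint as no longer moving.
        self.moving.discard(endpoint)

    def is_moving(self, endpoint):
        return endpoint in self.moving


def handle_state_normal_buggy(tm, endpoint, events):
    tm.update_normal_tokens(endpoint)   # clears the moving flag first...
    if tm.is_moving(endpoint):          # ...so this check is always False
        events.append(("MOVED_NODE", endpoint))


def handle_state_normal_fixed(tm, endpoint, events):
    was_moving = tm.is_moving(endpoint)  # capture the flag before mutating
    tm.update_normal_tokens(endpoint)
    if was_moving:
        events.append(("MOVED_NODE", endpoint))
```

The fix pattern is simply to read the flag into a local before the state-mutating call, which is the shape of change the attached 8373.txt would need to make.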
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232048#comment-14232048 ] Philip Thompson commented on CASSANDRA-7186: It probably won't matter, but can I see an obfuscated schema of a table that this occurs on? alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-8408) limit appears to replace page size under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs reassigned CASSANDRA-8408: -- Assignee: Tyler Hobbs limit appears to replace page size under certain conditions --- Key: CASSANDRA-8408 URL: https://issues.apache.org/jira/browse/CASSANDRA-8408 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Tyler Hobbs Priority: Minor This seems like it could be related to CASSANDRA-8403. When paging a query with limit < page size < data size, and querying using an 'IN' clause across several partitions, I get back several pages of size=limit (instead of the page size being used). So the limit is being exceeded and it seems to supplant the page size value, but something is still keeping the total rows returned down. To repro, create a table: CREATE TABLE paging_test ( id int, value text, PRIMARY KEY (id, value) ) And add data across several partitions (I used 6 partitions). Add a bunch of rows to each partition (I have 80 total across all partitions). Perform a paged query using an 'IN' clause across all the partitions, where limit < page_size < data size. I used something like: SELECT * FROM paging_test WHERE id IN (1,2,3,4,5,6) LIMIT 9; (with a page_size of 20 for the query). What I get returned is three pages of sizes 9, 9, 8 -- 26 rows in total, but I'm uncertain why. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
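[Editorial note] A toy model of the reported symptom reproduces the observed page sizes if one hypothesizes that the pager uses min(limit, page_size) as the effective page size. This is purely an illustration of that hypothesis, not driver or server code:

```python
def paginate_buggy(total_rows, limit, page_size):
    """Model of the symptom reported in CASSANDRA-8408: the LIMIT value
    appears to act as the page size. Hypothetical, for illustration only.
    """
    effective_page = min(limit, page_size)  # hypothesized mix-up
    pages = []
    remaining = total_rows
    while remaining > 0:
        pages.append(min(effective_page, remaining))
        remaining -= pages[-1]
    return pages
```

With the 26 rows actually returned, LIMIT 9, and page_size 20, this yields pages of 9, 9, and 8, matching the report; it does not explain why 26 of the 80 rows came back, which is the part the reporter also flags as unexplained.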
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232062#comment-14232062 ] Tyler Hobbs commented on CASSANDRA-7032: I agree with [~jjordan] here. One of the main motivations for moving to multiple tokens was to avoid expensive rebalance operations when nodes are added or removed. Changing that is likely to be a no-go. Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better. If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
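[Editorial note] As a rough illustration of the greedy idea described above (each new token splits the currently widest ownership range, so hotspots shrink as nodes join), here is a toy Python sketch. It is not the algorithm in the attached TestVNodeAllocation.java, and it ignores replication and racks:

```python
def greedy_allocate(existing_tokens, num_new, ring_size=2**64):
    """Toy greedy vnode allocation: each new token bisects the widest
    gap on the ring, evening out ownership. Illustrative only; assumes
    at least one existing token."""
    tokens = sorted(existing_tokens)
    new_tokens = []
    for _ in range(num_new):
        # Find the widest gap on the ring, including the wraparound gap.
        gaps = []
        for i, t in enumerate(tokens):
            nxt = tokens[(i + 1) % len(tokens)]
            width = (nxt - t) % ring_size or ring_size
            gaps.append((width, t))
        width, start = max(gaps)
        candidate = (start + width // 2) % ring_size
        tokens.append(candidate)
        tokens.sort()
        new_tokens.append(candidate)
    return new_tokens
```

A real allocator must also account for which node owns each token and for the replication strategy, which is where the complexity discussed in the comments comes from; but even this toy version shows the key property the ticket wants: the maximum ownership discrepancy shrinks as tokens are added, rather than growing as it does under random allocation.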
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232077#comment-14232077 ] Alexander Bulaev commented on CASSANDRA-7186: - {noformat} CREATE TABLE our_table ( column1 bigint, column2 timestamp, column3 text, column4 text, column5 double, column6 boolean, PRIMARY KEY (column1, column2) ) WITH CLUSTERING ORDER BY (column2 DESC) AND gc_grace_seconds = 86400; // 1 day {noformat} We were just adding column6. alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232076#comment-14232076 ] Branimir Lambov commented on CASSANDRA-7032: Perhaps I wasn't clear enough. Reassigning is not an option I am considering; I mentioned it as a way to imagine what an optimal balance would look like. The point is to get close enough results without it. Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232078#comment-14232078 ] Alexander Bulaev commented on CASSANDRA-7186: - The cluster is in this state right NOW, so feel free to ask for any diagnostics that may be relevant :) alter table add column not always propogating - Key: CASSANDRA-7186 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186 Project: Cassandra Issue Type: Bug Reporter: Martin Meyer Assignee: Philip Thompson Fix For: 2.0.12 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8408) limit appears to replace page size under certain conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232085#comment-14232085 ] Russ Hatch commented on CASSANDRA-8408: --- This issue can be reliably reproduced with my (semi-experimental) dtest branch here: https://github.com/riptano/cassandra-dtest/tree/scenarios_with_page_limit {noformat} export PRINT_DEBUG=true nosetests -vs paging_test.py:TestPagingWithModifiers.test_with_limit {noformat} limit appears to replace page size under certain conditions --- Key: CASSANDRA-8408 URL: https://issues.apache.org/jira/browse/CASSANDRA-8408 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Tyler Hobbs Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231878#comment-14231878 ] Vijay edited comment on CASSANDRA-7438 at 12/2/14 8:31 PM: --- EDIT: Here is the explanation. Run the benchmark with the following options (lruc benchmark). {code}java -Djava.library.path=/usr/local/lib/ -jar ~/lrucTest.jar -t 30 -s 6147483648 -c ohc{code} And you will see something like this (errors == not found in the cache, even though all the items you need are in the cache). {code} Memory consumed: 3 GB / 5 GB or 427170 / 6147483648, size 4980, queued (LRU q size) 0 VM total:2 GB VM free:2 GB Get Operation (micros) time_taken, count, mean, median, 99thPercentile, 999thPercentile, error 4734724, 166, 2.42, 1.93, 8.58, 24.74, 166 4804375, 166, 2.40, 1.92, 4.56, 106.23, 166 4805858, 166, 2.45, 1.95, 3.94, 11.76, 166 4842886, 166, 2.40, 1.92, 7.46, 26.73, 166 {code} You really need test cases :) Anyways, I am going to stop working on this ticket now; let me know if someone wants any other info. was (Author: vijay2...@yahoo.com): Never mind, my bad: it was related to the below (which needs to be more configurable instead), and the items were going missing earlier than I thought they should. It looks like you just evict the items per segment (if a segment is used more, more items will disappear from that segment, while the least-used segment's items will remain). {code} // 12.5% if capacity less than 8GB // 10% if capacity less than 16 GB // 5% if capacity is higher than 16GB {code} Also noticed you don't have replace, which Cassandra uses. Anyways, I am going to stop working on this for now; let me know if someone wants any other info. 
Serializing Row cache alternative (Fully off heap) -- Key: CASSANDRA-7438 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 Project: Cassandra Issue Type: Improvement Components: Core Environment: Linux Reporter: Vijay Assignee: Vijay Labels: performance Fix For: 3.0 Attachments: 0001-CASSANDRA-7438.patch, tests.zip Currently SerializingCache is partially off heap, keys are still stored in JVM heap as BB, * There is a higher GC costs for a reasonably big cache. * Some users have used the row cache efficiently in production for better results, but this requires careful tunning. * Overhead in Memory for the cache entries are relatively high. So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with cache. We might want to ensure that the new implementation match the existing API's (ICache), and the implementation needs to have safe memory access, low overhead in memory and less memcpy's (As much as possible). We might also want to make this cache configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
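[Editorial note] The eviction thresholds quoted in the {code} comment above amount to a simple step function of cache capacity. A hypothetical helper, not the cache's actual code:

```python
GB = 1 << 30  # bytes per gigabyte

def eviction_fraction(capacity_bytes):
    """Eviction fraction per the thresholds quoted in the comment above.
    Illustrative helper only, not the off-heap cache's implementation."""
    if capacity_bytes < 8 * GB:
        return 0.125   # 12.5% if capacity is less than 8 GB
    if capacity_bytes < 16 * GB:
        return 0.10    # 10% if capacity is less than 16 GB
    return 0.05        # 5% if capacity is 16 GB or higher
```

Vijay's point is that because this fraction is applied per segment, a heavily used segment sheds far more entries than a cold one, which is why items vanished earlier than expected in his benchmark.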
[jira] [Commented] (CASSANDRA-8383) Memtable flush may expire records from the commit log that are in a later memtable
[ https://issues.apache.org/jira/browse/CASSANDRA-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232130#comment-14232130 ] Ariel Weisberg commented on CASSANDRA-8383: --- Does this deserve a regression test? I almost wish ReplayPosition implemented method wrappers for GT, GTE, LT, LTE, rather than using compareTo. For me there is mental overhead in parsing that kind of condition. If I understand correctly, if this race occurs and the writing thread loses, it will be kicked forward to the next memtable despite the fact that the op group says it could go into the current memtable. So for a memtable to accept a write (either no barrier must exist || the barrier exists but is after the op group): if a last replay position is set, it must be >= the replay position of the write; if it is not set, the replay position will be updated by the writer, so the flusher gets the position of the last write to the memtable correctly. If the replay position is finalized, then even though the op group says that the write could go into this memtable, it is kicked into the next one, which is harmless, and op order still works since it chains dependencies in order. In effect the last replay position is frozen earlier, so that when the second op group is created and starts interleaving in the CL, anything beyond the frozen position is not considered for truncation after the memtable flushes. I think this does what I just said, and I think that fixes the problem described, where upon creation of the next op group, CL entries from different op groups interleave with the truncation point used for the CL. Freezing the truncation point before creating the second op group solves the problem. 
Memtable flush may expire records from the commit log that are in a later memtable -- Key: CASSANDRA-8383 URL: https://issues.apache.org/jira/browse/CASSANDRA-8383 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Priority: Critical Labels: commitlog Fix For: 2.1.3 This is a pretty obvious bug with any care of thought, so not sure how I managed to introduce it. We use OpOrder to ensure all writes to a memtable have finished before flushing, however we also use this OpOrder to direct writes to the correct memtable. However this is insufficient, since the OpOrder is only a partial order; an operation from the future (i.e. for the next memtable) could still interleave with the past operations in such a way that they grab a CL entry inbetween the past operations. Since we simply take the max ReplayPosition of those in the past, this would mean any interleaved future operations would be expired even though they haven't been persisted to disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
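[Editorial note] Ariel's suggestion above, of named comparison wrappers on ReplayPosition instead of raw compareTo conditions, might look like this (a Python sketch with illustrative field names, not the Java class):

```python
from dataclasses import dataclass
from functools import total_ordering

@total_ordering
@dataclass(frozen=True)
class ReplayPosition:
    """Sketch of a commit-log position with readable comparison wrappers,
    as suggested in the comment above. Field names are illustrative."""
    segment_id: int
    position: int

    def __lt__(self, other):
        # Order by segment first, then byte offset within the segment.
        return (self.segment_id, self.position) < (other.segment_id, other.position)

    # Named wrappers that read better than `compareTo(...) <= 0` conditions:
    def is_before(self, other):
        return self < other

    def is_at_or_before(self, other):
        return self <= other

    def is_after(self, other):
        return self > other
```

The point is purely readability: a condition like `last.is_at_or_before(write_pos)` states the intent of the memtable-acceptance check directly, where `last.compareTo(writePos) <= 0` makes the reader re-derive it.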
[jira] [Created] (CASSANDRA-8409) Node generating a huge number of tiny sstable_activity flushes
Fred Wulff created CASSANDRA-8409: - Summary: Node generating a huge number of tiny sstable_activity flushes Key: CASSANDRA-8409 URL: https://issues.apache.org/jira/browse/CASSANDRA-8409 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.0, Oracle JDK 1.8.0_25, Ubuntu 12.04 Reporter: Fred Wulff Attachments: system-sstable_activity-ka-67802-Data.db On one of my nodes, I’m seeing hundreds per second of “INFO 21:28:05 Enqueuing flush of sstable_activity: 0 (0%) on-heap, 33 (0%) off-heap”. tpstats shows a steadily climbing # of pending MemtableFlushWriter/MemtablePostFlush until the node OOMs. When the flushes actually happen the sstable written is invariably 121 bytes. I’m writing pretty aggressively to one of my user tables (sev.mdb_group_pit), but that table's flushing behavior seems reasonable. tpstats:
{quote}
frew@hostname:~/s_dist/apache-cassandra-2.1.0$ bin/nodetool -h sjd-rn1-10b tpstats
Pool Name                  Active  Pending  Completed  Blocked  All time blocked
MutationStage                 128     4429      36810        0                 0
ReadStage                       0        0       1205        0                 0
RequestResponseStage            0        0      24910        0                 0
ReadRepairStage                 0        0         26        0                 0
CounterMutationStage            0        0          0        0                 0
MiscStage                       0        0          0        0                 0
HintedHandoff                   2        2          9        0                 0
GossipStage                     0        0       5157        0                 0
CacheCleanupExecutor            0        0          0        0                 0
InternalResponseStage           0        0          0        0                 0
CommitLogArchiver               0        0          0        0                 0
CompactionExecutor         428429        0          0
ValidationExecutor              0        0          0        0                 0
MigrationStage                  0        0          0        0                 0
AntiEntropyStage                0        0          0        0                 0
PendingRangeCalculator          0        0         11        0                 0
MemtableFlushWriter             8    38644       8987        0                 0
MemtablePostFlush               1    38940       8735        0                 0
MemtableReclaimMemory           0        0       8987        0                 0

Message type       Dropped
READ                     0
RANGE_SLICE              0
_TRACE                   0
MUTATION             10457
COUNTER_MUTATION         0
BINARY                   0
REQUEST_RESPONSE         0
PAGED_RANGE              0
READ_REPAIR            208
{quote}
I've attached one of the produced sstables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232205#comment-14232205 ] Jonathan Ellis commented on CASSANDRA-8285: --- Was this referring to a patch you meant to attach? OOME in Cassandra 2.0.11 Key: CASSANDRA-8285 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285 Project: Cassandra Issue Type: Bug Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT Cassandra 2.0.11 + ruby-driver 1.0-beta Reporter: Pierre Laporte Assignee: Aleksey Yeschenko Attachments: OOME_node_system.log, gc-1416849312.log.gz, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png, system.log.gz We ran drivers 3-days endurance tests against Cassandra 2.0.11 and C* crashed with an OOME. This happened both with ruby-driver 1.0-beta and java-driver 2.0.8-snapshot. Attached are : | OOME_node_system.log | The system.log of one Cassandra node that crashed | | gc.log.gz | The GC log on the same node | | heap-usage-after-gc.png | The heap occupancy evolution after every GC cycle | | heap-usage-after-gc-zoom.png | A focus on when things start to go wrong | Workload : Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, during 3 days, using multiple threads. There is no change in the workload during the test. Symptoms : In the attached log, it seems something starts in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32. This causes an allocation that fills the heap. We eventually get stuck in a Full GC storm and get an OOME in the logs. I have run the java-driver tests against Cassandra 1.2.19 and 2.1.1. The error does not occur. It seems specific to 2.0.11. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7186) alter table add column not always propagating
[ https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232212#comment-14232212 ]

Philip Thompson commented on CASSANDRA-7186:
--------------------------------------------

I was finally able to sort of reproduce this. I used Alexander's setup of 3 nodes in each of 3 data centers. I wasn't able to reproduce the delay until I increased the load on the cluster very significantly, and even then it took tens of seconds for the schemas to resolve rather than tens of minutes. I used 2.0's cassandra-stress; there was nothing special about Alexander's table schema that caused the issue. [~iamaleksey], thoughts on the issue? I'm told you are the 'schema problem expert'.

alter table add column not always propagating
---------------------------------------------
                 Key: CASSANDRA-7186
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7186
             Project: Cassandra
          Issue Type: Bug
            Reporter: Martin Meyer
            Assignee: Philip Thompson
             Fix For: 2.0.12

I've seen many times in Cassandra 2.0.6 that adding columns to existing tables does not fully propagate to our entire cluster. We add an extra column to various tables maybe 0-2 times a week, and so far many of these ALTERs have resulted in at least one node showing the old table description for a pretty long time (~30 mins) after the original ALTER command was issued.

We originally identified this issue when a connected client complained that a column it issued a SELECT for wasn't a known column; at that point we had to ask each node to describe the most recently altered table. One of them would not know about the newly added field. Issuing the original ALTER statement on that node makes everything work correctly.

We have seen this issue on multiple tables (we don't always alter the same one). It has affected various nodes in the cluster (it is not always the same one that fails to get the mutation propagated). No new nodes have been added to the cluster recently. All nodes are homogeneous (hardware and software), running 2.0.6.

We don't see any particular errors or exceptions on the node that didn't get the schema update, only the later error from a Java client about asking for an unknown column in a SELECT. We have to check each node manually to find the offender. The tables we have seen this on are under fairly heavy read and write load, but we haven't altered any tables that are not, so that might not be significant.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
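The manual per-node check described above amounts to comparing the schema version each node gossips. The following is a hypothetical sketch, not Cassandra internals: the kind of check behind `nodetool describecluster`, which groups nodes by reported schema version and flags any node disagreeing with the majority as needing the ALTER re-applied (or a schema pull). The class and method names are invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

public class SchemaAgreement
{
    // Return the nodes whose schema version differs from the majority version.
    static Set<String> laggingNodes(Map<String, UUID> schemaVersionByNode)
    {
        Map<UUID, Integer> counts = new HashMap<>();
        for (UUID v : schemaVersionByNode.values())
            counts.merge(v, 1, Integer::sum);
        UUID majority = Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();

        Set<String> lagging = new TreeSet<>();
        for (Map.Entry<String, UUID> e : schemaVersionByNode.entrySet())
            if (!e.getValue().equals(majority))
                lagging.add(e.getKey());
        return lagging;
    }

    public static void main(String[] args)
    {
        UUID current = UUID.randomUUID();
        UUID stale = UUID.randomUUID();
        Map<String, UUID> nodes = new HashMap<>();
        nodes.put("10.0.0.1", current);
        nodes.put("10.0.0.2", current);
        nodes.put("10.0.0.3", stale); // the node that never saw the ALTER
        Set<String> lagging = laggingNodes(nodes);
        System.out.println("nodes needing a schema push: " + lagging);
        if (!lagging.equals(Collections.singleton("10.0.0.3")))
            throw new AssertionError(lagging);
    }
}
```

A cluster is "settled" only when this set is empty, i.e. every node gossips the same schema version UUID.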
[jira] [Updated] (CASSANDRA-8409) Node generating a huge number of tiny sstable_activity flushes
[ https://issues.apache.org/jira/browse/CASSANDRA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fred Wulff updated CASSANDRA-8409:
----------------------------------
    Description:

On one of my nodes, I’m seeing hundreds per second of “INFO 21:28:05 Enqueuing flush of sstable_activity: 0 (0%) on-heap, 33 (0%) off-heap”. tpstats shows a steadily climbing number of pending MemtableFlushWriter/MemtablePostFlush tasks until the node OOMs. When the flushes actually happen, the sstable written is invariably 121 bytes. I’m writing pretty aggressively to one of my user tables (sev.mdb_group_pit), but that table's flushing behavior seems reasonable.

tpstats:
{quote}
frew@hostname:~/s_dist/apache-cassandra-2.1.0$ bin/nodetool -h hostname tpstats
Pool Name                Active  Pending  Completed  Blocked  All time blocked
MutationStage               128     4429      36810        0                 0
ReadStage                     0        0       1205        0                 0
RequestResponseStage          0        0      24910        0                 0
ReadRepairStage               0        0         26        0                 0
CounterMutationStage          0        0          0        0                 0
MiscStage                     0        0          0        0                 0
HintedHandoff                 2        2          9        0                 0
GossipStage                   0        0       5157        0                 0
CacheCleanupExecutor          0        0          0        0                 0
InternalResponseStage         0        0          0        0                 0
CommitLogArchiver             0        0          0        0                 0
CompactionExecutor428429      0        0
ValidationExecutor            0        0          0        0                 0
MigrationStage                0        0          0        0                 0
AntiEntropyStage              0        0          0        0                 0
PendingRangeCalculator        0        0         11        0                 0
MemtableFlushWriter           8    38644       8987        0                 0
MemtablePostFlush             1    38940       8735        0                 0
MemtableReclaimMemory         0        0       8987        0                 0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION            10457
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR           208
{quote}

I've attached one of the produced sstables.
[3/6] cassandra git commit: cancel latency-sampling task when CF is dropped patch by jbellis; reviewed by ayeschenko for CASSANDRA-8401
cancel latency-sampling task when CF is dropped

patch by jbellis; reviewed by ayeschenko for CASSANDRA-8401

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4030088e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4030088e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4030088e

Branch: refs/heads/trunk
Commit: 4030088ec1df44a666ce73306eae5daf664b25ab
Parents: eb0424e
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Tue Dec 2 16:04:47 2014 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Tue Dec 2 16:04:47 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                             | 1 +
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4030088e/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 3febed0..dc3896d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * cancel latency-sampling task when CF is dropped (CASSANDRA-8401)
  * don't block SocketThread for MessagingService (CASSANDRA-8188)
  * Increase quarantine delay on replacement (CASSANDRA-8260)
  * Expose off-heap memory usage stats (CASSANDRA-7897)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4030088e/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 06520ab..6cdf9e9 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -107,6 +107,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
     public final ColumnFamilyMetrics metric;
     public volatile long sampleLatencyNanos;
+    private final ScheduledFuture<?> latencyCalculator;

     public void reload()
     {
@@ -292,7 +293,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
             throw new RuntimeException(e);
         }
         logger.debug("retryPolicy for {} is {}", name, this.metadata.getSpeculativeRetry());
-        StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
+        latencyCalculator = StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
         {
             public void run()
             {
@@ -331,8 +332,8 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
                 logger.warn("Failed unregistering mbean: " + mbeanName, e);
         }
+        latencyCalculator.cancel(false);
         compactionStrategy.shutdown();
-        SystemKeyspace.removeTruncationRecord(metadata.cfId);
         data.unreferenceSSTables();
         indexManager.invalidate();
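The patch above follows a common lifecycle pattern. Here is a minimal runnable sketch of it, with hypothetical names (it is not the Cassandra code itself): keep the `ScheduledFuture` returned by `scheduleWithFixedDelay` and cancel it when the owning object is invalidated, so a dropped table does not leave its latency-sampling task running forever on the shared executor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelOnDrop
{
    private final ScheduledExecutorService optionalTasks = Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> latencyCalculator;
    final AtomicInteger samples = new AtomicInteger();

    CancelOnDrop()
    {
        // Periodic sampling task; keep a handle so it can be cancelled later.
        latencyCalculator = optionalTasks.scheduleWithFixedDelay(
                samples::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
    }

    void invalidate()
    {
        latencyCalculator.cancel(false); // false: let an in-flight run finish
        optionalTasks.shutdown();        // sketch-only; the real executor is shared
    }

    public static void main(String[] args) throws Exception
    {
        CancelOnDrop cfs = new CancelOnDrop();
        Thread.sleep(100);               // the task runs a few times...
        cfs.invalidate();
        Thread.sleep(50);                // drain any in-flight run
        int afterCancel = cfs.samples.get();
        Thread.sleep(100);               // ...and never again after cancel
        if (cfs.samples.get() != afterCancel)
            throw new AssertionError("task kept running after cancel()");
        System.out.println("samples taken before cancel: " + afterCancel);
    }
}
```

Without the `cancel(false)` call, the Runnable (and everything it references) stays reachable from the executor's queue for the life of the process, which is exactly the leak the patch closes.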
[2/6] cassandra git commit: cancel latency-sampling task when CF is dropped patch by jbellis; reviewed by ayeschenko for CASSANDRA-8401
cancel latency-sampling task when CF is dropped

patch by jbellis; reviewed by ayeschenko for CASSANDRA-8401

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4030088e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4030088e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4030088e

Branch: refs/heads/cassandra-2.1
Commit: 4030088ec1df44a666ce73306eae5daf664b25ab
Parents: eb0424e
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Tue Dec 2 16:04:47 2014 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Tue Dec 2 16:04:47 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                             | 1 +
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4030088e/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 3febed0..dc3896d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * cancel latency-sampling task when CF is dropped (CASSANDRA-8401)
  * don't block SocketThread for MessagingService (CASSANDRA-8188)
  * Increase quarantine delay on replacement (CASSANDRA-8260)
  * Expose off-heap memory usage stats (CASSANDRA-7897)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4030088e/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 06520ab..6cdf9e9 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -107,6 +107,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
     public final ColumnFamilyMetrics metric;
     public volatile long sampleLatencyNanos;
+    private final ScheduledFuture<?> latencyCalculator;

     public void reload()
     {
@@ -292,7 +293,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
             throw new RuntimeException(e);
         }
         logger.debug("retryPolicy for {} is {}", name, this.metadata.getSpeculativeRetry());
-        StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
+        latencyCalculator = StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
        {
             public void run()
             {
@@ -331,8 +332,8 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
                 logger.warn("Failed unregistering mbean: " + mbeanName, e);
         }
+        latencyCalculator.cancel(false);
         compactionStrategy.shutdown();
-        SystemKeyspace.removeTruncationRecord(metadata.cfId);
         data.unreferenceSSTables();
         indexManager.invalidate();
[1/6] cassandra git commit: cancel latency-sampling task when CF is dropped patch by jbellis; reviewed by ayeschenko for CASSANDRA-8401
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 eb0424ecd -> 4030088ec
  refs/heads/cassandra-2.1 d15c9187a -> 587657d37
  refs/heads/trunk 65a7088e7 -> 3916e4867

cancel latency-sampling task when CF is dropped

patch by jbellis; reviewed by ayeschenko for CASSANDRA-8401

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4030088e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4030088e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4030088e

Branch: refs/heads/cassandra-2.0
Commit: 4030088ec1df44a666ce73306eae5daf664b25ab
Parents: eb0424e
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Tue Dec 2 16:04:47 2014 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Tue Dec 2 16:04:47 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                             | 1 +
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4030088e/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 3febed0..dc3896d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * cancel latency-sampling task when CF is dropped (CASSANDRA-8401)
  * don't block SocketThread for MessagingService (CASSANDRA-8188)
  * Increase quarantine delay on replacement (CASSANDRA-8260)
  * Expose off-heap memory usage stats (CASSANDRA-7897)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4030088e/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 06520ab..6cdf9e9 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -107,6 +107,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
     public final ColumnFamilyMetrics metric;
     public volatile long sampleLatencyNanos;
+    private final ScheduledFuture<?> latencyCalculator;

     public void reload()
     {
@@ -292,7 +293,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
             throw new RuntimeException(e);
         }
         logger.debug("retryPolicy for {} is {}", name, this.metadata.getSpeculativeRetry());
-        StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
+        latencyCalculator = StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
        {
             public void run()
             {
@@ -331,8 +332,8 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
                 logger.warn("Failed unregistering mbean: " + mbeanName, e);
         }
+        latencyCalculator.cancel(false);
         compactionStrategy.shutdown();
-        SystemKeyspace.removeTruncationRecord(metadata.cfId);
         data.unreferenceSSTables();
         indexManager.invalidate();
[6/6] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3916e486
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3916e486
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3916e486

Branch: refs/heads/trunk
Commit: 3916e48674c75af6f03f8ff762c8d6e00342dab8
Parents: 65a7088 587657d
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Tue Dec 2 16:07:24 2014 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Tue Dec 2 16:07:24 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3916e486/CHANGES.txt
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/cassandra/blob/3916e486/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
[4/6] cassandra git commit: merge from 2.0
merge from 2.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/587657d3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/587657d3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/587657d3

Branch: refs/heads/trunk
Commit: 587657d372c8d1ba71d03a4e6c0c765be915d3bc
Parents: d15c918 4030088
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Tue Dec 2 16:07:18 2014 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Tue Dec 2 16:07:18 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/587657d3/CHANGES.txt
----------------------------------------------------------------------
diff --cc CHANGES.txt
index 7df396d,dc3896d..041c1e1
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,21 -1,6 +1,23 @@@
-2.0.12:
+2.1.3
+ * Release sstable references after anticompaction (CASSANDRA-8386)
+ * Handle abort() in SSTableRewriter properly (CASSANDRA-8320)
+ * Fix high size calculations for prepared statements (CASSANDRA-8231)
+ * Centralize shared executors (CASSANDRA-8055)
+ * Fix filtering for CONTAINS (KEY) relations on frozen collection
+   clustering columns when the query is restricted to a single
+   partition (CASSANDRA-8203)
+ * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
+ * Add more log info if readMeter is null (CASSANDRA-8238)
+ * add check of the system wall clock time at startup (CASSANDRA-8305)
+ * Support for frozen collections (CASSANDRA-7859)
+ * Fix overflow on histogram computation (CASSANDRA-8028)
+ * Have paxos reuse the timestamp generation of normal queries (CASSANDRA-7801)
+ * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
+ * Improve JBOD disk utilization (CASSANDRA-7386)
+ * Log failed host when preparing incremental repair (CASSANDRA-8228)
+Merged from 2.0:
+ * cancel latency-sampling task when CF is dropped (CASSANDRA-8401)
+ * don't block SocketThread for MessagingService (CASSANDRA-8188)
  * Increase quarantine delay on replacement (CASSANDRA-8260)
  * Expose off-heap memory usage stats (CASSANDRA-7897)
  * Ignore Paxos commits for truncated tables (CASSANDRA-7538)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/587657d3/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
diff --cc src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 1bd5f2a,6cdf9e9..0507973
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@@ -135,13 -107,8 +135,14 @@@ public class ColumnFamilyStore implemen
      public final ColumnFamilyMetrics metric;
      public volatile long sampleLatencyNanos;
+     private final ScheduledFuture<?> latencyCalculator;

 +    public static void shutdownPostFlushExecutor() throws InterruptedException
 +    {
 +        postFlushExecutor.shutdown();
 +        postFlushExecutor.awaitTermination(60, TimeUnit.SECONDS);
 +    }
 +
      public void reload()
      {
          // metadata object has been mutated directly. make all the members jibe with new settings.
@@@ -318,7 -293,7 +319,7 @@@
              throw new RuntimeException(e);
          }
          logger.debug("retryPolicy for {} is {}", name, this.metadata.getSpeculativeRetry());
-         ScheduledExecutors.optionalTasks.scheduleWithFixedDelay(new Runnable()
 -        latencyCalculator = StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
++        latencyCalculator = ScheduledExecutors.optionalTasks.scheduleWithFixedDelay(new Runnable()
          {
              public void run()
              {
@@@ -353,13 -328,12 +354,13 @@@
          }
          catch (Exception e)
          {
 +            JVMStabilityInspector.inspectThrowable(e);
              // this shouldn't block anything.
 -            logger.warn("Failed unregistering mbean: " + mbeanName, e);
 +            logger.warn("Failed unregistering mbean: {}", mbeanName, e);
          }
+         latencyCalculator.cancel(false);
 -        compactionStrategy.shutdown();
 +        compactionStrategyWrapper.shutdown();
-         SystemKeyspace.removeTruncationRecord(metadata.cfId);
          data.unreferenceSSTables();
          indexManager.invalidate();
[5/6] cassandra git commit: merge from 2.0
merge from 2.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/587657d3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/587657d3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/587657d3

Branch: refs/heads/cassandra-2.1
Commit: 587657d372c8d1ba71d03a4e6c0c765be915d3bc
Parents: d15c918 4030088
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Tue Dec 2 16:07:18 2014 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Tue Dec 2 16:07:18 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/587657d3/CHANGES.txt
----------------------------------------------------------------------
diff --cc CHANGES.txt
index 7df396d,dc3896d..041c1e1
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,21 -1,6 +1,23 @@@
-2.0.12:
+2.1.3
+ * Release sstable references after anticompaction (CASSANDRA-8386)
+ * Handle abort() in SSTableRewriter properly (CASSANDRA-8320)
+ * Fix high size calculations for prepared statements (CASSANDRA-8231)
+ * Centralize shared executors (CASSANDRA-8055)
+ * Fix filtering for CONTAINS (KEY) relations on frozen collection
+   clustering columns when the query is restricted to a single
+   partition (CASSANDRA-8203)
+ * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
+ * Add more log info if readMeter is null (CASSANDRA-8238)
+ * add check of the system wall clock time at startup (CASSANDRA-8305)
+ * Support for frozen collections (CASSANDRA-7859)
+ * Fix overflow on histogram computation (CASSANDRA-8028)
+ * Have paxos reuse the timestamp generation of normal queries (CASSANDRA-7801)
+ * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
+ * Improve JBOD disk utilization (CASSANDRA-7386)
+ * Log failed host when preparing incremental repair (CASSANDRA-8228)
+Merged from 2.0:
+ * cancel latency-sampling task when CF is dropped (CASSANDRA-8401)
+ * don't block SocketThread for MessagingService (CASSANDRA-8188)
  * Increase quarantine delay on replacement (CASSANDRA-8260)
  * Expose off-heap memory usage stats (CASSANDRA-7897)
  * Ignore Paxos commits for truncated tables (CASSANDRA-7538)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/587657d3/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
diff --cc src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 1bd5f2a,6cdf9e9..0507973
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@@ -135,13 -107,8 +135,14 @@@ public class ColumnFamilyStore implemen
      public final ColumnFamilyMetrics metric;
      public volatile long sampleLatencyNanos;
+     private final ScheduledFuture<?> latencyCalculator;

 +    public static void shutdownPostFlushExecutor() throws InterruptedException
 +    {
 +        postFlushExecutor.shutdown();
 +        postFlushExecutor.awaitTermination(60, TimeUnit.SECONDS);
 +    }
 +
      public void reload()
      {
          // metadata object has been mutated directly. make all the members jibe with new settings.
@@@ -318,7 -293,7 +319,7 @@@
              throw new RuntimeException(e);
          }
          logger.debug("retryPolicy for {} is {}", name, this.metadata.getSpeculativeRetry());
-         ScheduledExecutors.optionalTasks.scheduleWithFixedDelay(new Runnable()
 -        latencyCalculator = StorageService.optionalTasks.scheduleWithFixedDelay(new Runnable()
++        latencyCalculator = ScheduledExecutors.optionalTasks.scheduleWithFixedDelay(new Runnable()
          {
              public void run()
              {
@@@ -353,13 -328,12 +354,13 @@@
          }
          catch (Exception e)
          {
 +            JVMStabilityInspector.inspectThrowable(e);
              // this shouldn't block anything.
 -            logger.warn("Failed unregistering mbean: " + mbeanName, e);
 +            logger.warn("Failed unregistering mbean: {}", mbeanName, e);
          }
+         latencyCalculator.cancel(false);
 -        compactionStrategy.shutdown();
 +        compactionStrategyWrapper.shutdown();
-         SystemKeyspace.removeTruncationRecord(metadata.cfId);
          data.unreferenceSSTables();
          indexManager.invalidate();
[jira] [Updated] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Hobbs updated CASSANDRA-7886:
-----------------------------------
    Summary: Coordinator should not wait for read timeouts when replicas hit Exceptions  (was: TombstoneOverwhelmingException should not wait for timeout)

Coordinator should not wait for read timeouts when replicas hit Exceptions
--------------------------------------------------------------------------
                 Key: CASSANDRA-7886
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: Tested with Cassandra 2.0.8
            Reporter: Christian Spriegel
            Assignee: Christian Spriegel
            Priority: Minor
             Fix For: 3.0
         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt

*Issue*
When TombstoneOverwhelmingExceptions occur during queries, the query is simply dropped on every data node, but no response is sent back to the coordinator. Instead, the coordinator waits for the specified read_request_timeout_in_ms.

On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(

*Proposed solution*
I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
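The latency difference the ticket is about can be shown with plain futures. This is a sketch of the described behavior with hypothetical names, not the real coordinator code: when a replica silently drops an overwhelmed read, the coordinator blocks for the full read timeout; when the replica reports the failure instead, the coordinator can fail the request at once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FailFast
{
    // Wait for one replica's response and return how long we blocked, in ms.
    static long waitForReplica(CompletableFuture<String> replicaResponse, long timeoutMs)
    {
        long start = System.nanoTime();
        try
        {
            replicaResponse.get(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (Exception e)
        {
            // either a timeout (silent drop) or a replica-reported error
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args)
    {
        // Replica drops the query: the coordinator burns the whole timeout.
        long dropped = waitForReplica(new CompletableFuture<String>(), 500);

        // Replica reports the error: the coordinator returns immediately.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("TombstoneOverwhelmingException"));
        long reported = waitForReplica(failed, 500);

        System.out.println("silent drop waited " + dropped + " ms, reported error waited " + reported + " ms");
        if (dropped < 400 || reported > 100)
            throw new AssertionError("fast failure should only need the replica to report its error");
    }
}
```

With real read timeouts of seconds rather than 500 ms, every overwhelmed query pins an application thread for the full interval, which is why the proposed error response matters for client stability.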
[jira] [Updated] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Hobbs updated CASSANDRA-7886:
-----------------------------------
    Labels: protocolv4  (was: )

Coordinator should not wait for read timeouts when replicas hit Exceptions
--------------------------------------------------------------------------
                 Key: CASSANDRA-7886
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: Tested with Cassandra 2.0.8
            Reporter: Christian Spriegel
            Assignee: Christian Spriegel
            Priority: Minor
              Labels: protocolv4
             Fix For: 3.0
         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt

*Issue*
When TombstoneOverwhelmingExceptions occur during queries, the query is simply dropped on every data node, but no response is sent back to the coordinator. Instead, the coordinator waits for the specified read_request_timeout_in_ms.

On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(

*Proposed solution*
I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8043) Native Protocol V4
[ https://issues.apache.org/jira/browse/CASSANDRA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Hobbs updated CASSANDRA-8043:
-----------------------------------
    Labels: protocolv4  (was: )

Native Protocol V4
------------------
                 Key: CASSANDRA-8043
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8043
             Project: Cassandra
          Issue Type: Task
            Reporter: Sylvain Lebresne
              Labels: protocolv4
             Fix For: 3.0

We have a bunch of issues that will require a protocol v4; this ticket is just a meta-ticket to group them all.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8409) Node generating a huge number of tiny sstable_activity flushes
[ https://issues.apache.org/jira/browse/CASSANDRA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-8409:
---------------------------------------
    Reproduced In: 2.1.0
    Fix Version/s: 2.1.3

Node generating a huge number of tiny sstable_activity flushes
--------------------------------------------------------------
                 Key: CASSANDRA-8409
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8409
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.1.0, Oracle JDK 1.8.0_25, Ubuntu 12.04
            Reporter: Fred Wulff
             Fix For: 2.1.3
         Attachments: system-sstable_activity-ka-67802-Data.db

On one of my nodes, I’m seeing hundreds per second of “INFO 21:28:05 Enqueuing flush of sstable_activity: 0 (0%) on-heap, 33 (0%) off-heap”. tpstats shows a steadily climbing number of pending MemtableFlushWriter/MemtablePostFlush tasks until the node OOMs. When the flushes actually happen, the sstable written is invariably 121 bytes. I’m writing pretty aggressively to one of my user tables (sev.mdb_group_pit), but that table's flushing behavior seems reasonable.

tpstats:
{quote}
frew@hostname:~/s_dist/apache-cassandra-2.1.0$ bin/nodetool -h hostname tpstats
Pool Name                Active  Pending  Completed  Blocked  All time blocked
MutationStage               128     4429      36810        0                 0
ReadStage                     0        0       1205        0                 0
RequestResponseStage          0        0      24910        0                 0
ReadRepairStage               0        0         26        0                 0
CounterMutationStage          0        0          0        0                 0
MiscStage                     0        0          0        0                 0
HintedHandoff                 2        2          9        0                 0
GossipStage                   0        0       5157        0                 0
CacheCleanupExecutor          0        0          0        0                 0
InternalResponseStage         0        0          0        0                 0
CommitLogArchiver             0        0          0        0                 0
CompactionExecutor428429      0        0
ValidationExecutor            0        0          0        0                 0
MigrationStage                0        0          0        0                 0
AntiEntropyStage              0        0          0        0                 0
PendingRangeCalculator        0        0         11        0                 0
MemtableFlushWriter           8    38644       8987        0                 0
MemtablePostFlush             1    38940       8735        0                 0
MemtableReclaimMemory         0        0       8987        0                 0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION            10457
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR           208
{quote}

I've attached one of the produced sstables.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)