[jira] [Created] (CASSANDRA-9668) RepairException when trying to run concurrent repair -pr
david created CASSANDRA-9668:
---------------------------------

            Summary: RepairException when trying to run concurrent repair -pr
                Key: CASSANDRA-9668
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9668
            Project: Cassandra
         Issue Type: Bug
         Components: Core
        Environment: Cassandra 2.1.7
           Reporter: david
           Priority: Critical
            Fix For: 2.1.x

Was on 2.1.3, having very similar issues to those described in https://issues.apache.org/jira/browse/CASSANDRA-9266. I updated to 2.1.7, more for some other fixes, but now if I try to run concurrent repairs (on different boxes) I consistently get:

{noformat}
ERROR [Thread-14156] 2015-06-28 09:33:12,616 StorageService.java:2959 - Repair session b1e67660-1d78-11e5-aec7-4f05493cbe02 for range (-4660677346721084182,-4658765298409301171] failed with error org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_40]
	at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_40]
	at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.7.jar:2.1.7]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.7.jar:2.1.7]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.7.jar:2.1.7]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
	... 1 common frames omitted
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127
	at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.7.jar:2.1.7]
	at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.7.jar:2.1.7]
	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.7.jar:2.1.7]
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.7.jar:2.1.7]
	... 3 common frames omitted
{noformat}

The specific repair command being issued:

{noformat}
nodetool repair -local -pr -inc -par -- keyspace
{noformat}

It's a 15-box environment with a replication factor of 3.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[5/6] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2

Conflicts:
	build.xml

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/02a7c342
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/02a7c342
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/02a7c342

Branch: refs/heads/cassandra-2.2
Commit: 02a7c342922a209ac7374f2f425c783a5faf8538
Parents: 14d7a63 bd4a9d1
Author: Benedict Elliott Smith <bened...@apache.org>
Authored: Sun Jun 28 11:39:53 2015 +0100
Committer: Benedict Elliott Smith <bened...@apache.org>
Committed: Sun Jun 28 11:39:53 2015 +0100

----------------------------------------------------------------------
[3/6] cassandra git commit: backport burn test refactor
backport burn test refactor

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bd4a9d18
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bd4a9d18
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bd4a9d18

Branch: refs/heads/trunk
Commit: bd4a9d18e1317dcb8542bd4adc5a9f99b108d6c6
Parents: 8a56868
Author: Benedict Elliott Smith <bened...@apache.org>
Authored: Sun Jun 28 11:38:22 2015 +0100
Committer: Benedict Elliott Smith <bened...@apache.org>
Committed: Sun Jun 28 11:38:22 2015 +0100

----------------------------------------------------------------------
 build.xml                                        |   7 +
 .../cassandra/concurrent/LongOpOrderTest.java    | 240 +
 .../concurrent/LongSharedExecutorPoolTest.java   | 226 +
 .../apache/cassandra/utils/LongBTreeTest.java    | 502 +++
 .../cassandra/concurrent/LongOpOrderTest.java    | 240 -
 .../concurrent/LongSharedExecutorPoolTest.java   | 228 -
 .../apache/cassandra/utils/LongBTreeTest.java    | 401 ---
 7 files changed, 975 insertions(+), 869 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd4a9d18/build.xml
----------------------------------------------------------------------
diff --git a/build.xml b/build.xml
index 73e76e5..18ad49f 100644
--- a/build.xml
+++ b/build.xml
@@ -93,6 +93,7 @@
     <property name="test.timeout" value="6" />
     <property name="test.long.timeout" value="60" />
+    <property name="test.burn.timeout" value="60" />
     <!-- default for cql tests. Can be override by -Dcassandra.test.use_prepared=false -->
     <property name="cassandra.test.use_prepared" value="true" />
@@ -1258,6 +1259,12 @@
     </testmacro>
   </target>

+  <target name="test-burn" depends="build-test" description="Execute functional tests">
+    <testmacro suitename="burn" inputdir="${test.burn.src}"
+               timeout="${test.burn.timeout}">
+    </testmacro>
+  </target>
+
   <target name="long-test" depends="build-test" description="Execute functional tests">
     <testmacro suitename="long" inputdir="${test.long.src}"
                timeout="${test.long.timeout}">

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd4a9d18/test/burn/org/apache/cassandra/concurrent/LongOpOrderTest.java
----------------------------------------------------------------------
diff --git a/test/burn/org/apache/cassandra/concurrent/LongOpOrderTest.java b/test/burn/org/apache/cassandra/concurrent/LongOpOrderTest.java
new file mode 100644
index 000..d7105df
--- /dev/null
+++ b/test/burn/org/apache/cassandra/concurrent/LongOpOrderTest.java
@@ -0,0 +1,240 @@
+package org.apache.cassandra.concurrent;
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.cliffc.high_scale_lib.NonBlockingHashMap;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.utils.concurrent.OpOrder;
+
+import static org.junit.Assert.assertTrue;
+
+// TODO: we don't currently test SAFE functionality at all!
+// TODO: should also test markBlocking and SyncOrdered
+public class LongOpOrderTest
+{
+    private static final Logger logger = LoggerFactory.getLogger(LongOpOrderTest.class);
+
+    static final int CONSUMERS = 4;
+    static final int PRODUCERS = 32;
+
+    static final long RUNTIME = TimeUnit.MINUTES.toMillis(5);
+    static final long REPORT_INTERVAL = TimeUnit.MINUTES.toMillis(1);
+
+    static final Thread.UncaughtExceptionHandler handler = new Thread.UncaughtExceptionHandler()
+    {
+        @Override
+        public void uncaughtException(Thread t, Throwable e)
+        {
+            System.err.println(t.getName() + ": " + e.getMessage());
+            e.printStackTrace();
+        }
+    };
+
+    final OpOrder order = new OpOrder();
[1/6] cassandra git commit: backport burn test refactor
Repository: cassandra

Updated Branches:
  refs/heads/cassandra-2.1 8a56868bc -> bd4a9d18e
  refs/heads/cassandra-2.2 14d7a63b8 -> 02a7c3429
  refs/heads/trunk 6739434c6 -> 3671082b0

backport burn test refactor

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bd4a9d18
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bd4a9d18
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bd4a9d18

Branch: refs/heads/cassandra-2.1
Commit: bd4a9d18e1317dcb8542bd4adc5a9f99b108d6c6
Parents: 8a56868
Author: Benedict Elliott Smith <bened...@apache.org>
Authored: Sun Jun 28 11:38:22 2015 +0100
Committer: Benedict Elliott Smith <bened...@apache.org>
Committed: Sun Jun 28 11:38:22 2015 +0100

----------------------------------------------------------------------
 build.xml                                        |   7 +
 .../cassandra/concurrent/LongOpOrderTest.java    | 240 +
 .../concurrent/LongSharedExecutorPoolTest.java   | 226 +
 .../apache/cassandra/utils/LongBTreeTest.java    | 502 +++
 .../cassandra/concurrent/LongOpOrderTest.java    | 240 -
 .../concurrent/LongSharedExecutorPoolTest.java   | 228 -
 .../apache/cassandra/utils/LongBTreeTest.java    | 401 ---
 7 files changed, 975 insertions(+), 869 deletions(-)
----------------------------------------------------------------------
[2/6] cassandra git commit: backport burn test refactor
backport burn test refactor

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bd4a9d18
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bd4a9d18
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bd4a9d18

Branch: refs/heads/cassandra-2.2
Commit: bd4a9d18e1317dcb8542bd4adc5a9f99b108d6c6
Parents: 8a56868
Author: Benedict Elliott Smith <bened...@apache.org>
Authored: Sun Jun 28 11:38:22 2015 +0100
Committer: Benedict Elliott Smith <bened...@apache.org>
Committed: Sun Jun 28 11:38:22 2015 +0100

----------------------------------------------------------------------
 build.xml                                        |   7 +
 .../cassandra/concurrent/LongOpOrderTest.java    | 240 +
 .../concurrent/LongSharedExecutorPoolTest.java   | 226 +
 .../apache/cassandra/utils/LongBTreeTest.java    | 502 +++
 .../cassandra/concurrent/LongOpOrderTest.java    | 240 -
 .../concurrent/LongSharedExecutorPoolTest.java   | 228 -
 .../apache/cassandra/utils/LongBTreeTest.java    | 401 ---
 7 files changed, 975 insertions(+), 869 deletions(-)
----------------------------------------------------------------------
[4/6] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2

Conflicts:
	build.xml

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/02a7c342
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/02a7c342
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/02a7c342

Branch: refs/heads/trunk
Commit: 02a7c342922a209ac7374f2f425c783a5faf8538
Parents: 14d7a63 bd4a9d1
Author: Benedict Elliott Smith <bened...@apache.org>
Authored: Sun Jun 28 11:39:53 2015 +0100
Committer: Benedict Elliott Smith <bened...@apache.org>
Committed: Sun Jun 28 11:39:53 2015 +0100

----------------------------------------------------------------------
[6/6] cassandra git commit: Merge branch 'cassandra-2.2' into trunk
Merge branch 'cassandra-2.2' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3671082b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3671082b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3671082b

Branch: refs/heads/trunk
Commit: 3671082b037c05979740c9bc5a4ee3a4a4425bf7
Parents: 6739434 02a7c34
Author: Benedict Elliott Smith <bened...@apache.org>
Authored: Sun Jun 28 11:40:00 2015 +0100
Committer: Benedict Elliott Smith <bened...@apache.org>
Committed: Sun Jun 28 11:40:00 2015 +0100

----------------------------------------------------------------------
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604662#comment-14604662 ]

Benedict commented on CASSANDRA-9318:
-------------------------------------

I'm pretty sure I've made clear a few times that I'm proposing load shedding based on _both_ resource consumption and timeout; i.e. if we are running out of resources, we hint, and if we completely run out of resources, we shed. In this case, shedding is _never_ incapable of keeping us in a happy place, and ensures we absolutely prevent any spam bringing down the server.

I think we need to really separate the two concerns, as we seem to be jumping between them: keeping the server alive is best done through shedding; helping users with bulk loaders is best served by pausing single clients that are exceeding our rate of consumption.

Bound the number of in-flight requests at the coordinator
---------------------------------------------------------

        Key: CASSANDRA-9318
        URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
    Project: Cassandra
 Issue Type: Improvement
   Reporter: Ariel Weisberg
   Assignee: Ariel Weisberg
    Fix For: 2.2.x

It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and, if it reaches a high watermark, disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues.
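The high/low watermark scheme described in the ticket can be sketched roughly as follows. This is an illustrative, hypothetical Java sketch, not Cassandra's actual implementation; the class and method names (`InflightLimiter`, `onRequest`, `onComplete`) are invented, and a real server would toggle the selector's read interest rather than just a flag:

```java
// Hypothetical sketch of watermark-based backpressure: track outstanding
// request bytes; stop reading from clients above the high watermark and
// resume once completions bring us back under the low watermark.
import java.util.concurrent.atomic.AtomicLong;

public class InflightLimiter
{
    private final long highWatermark;
    private final long lowWatermark;
    private final AtomicLong outstandingBytes = new AtomicLong();
    private volatile boolean readsPaused = false;

    public InflightLimiter(long highWatermark, long lowWatermark)
    {
        assert lowWatermark < highWatermark;
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Called when a request is read off a client connection. */
    public void onRequest(long bytes)
    {
        if (outstandingBytes.addAndGet(bytes) >= highWatermark)
            readsPaused = true; // a real server would clear OP_READ interest here
    }

    /** Called when a replica acknowledges (or we shed) a request. */
    public void onComplete(long bytes)
    {
        if (outstandingBytes.addAndGet(-bytes) <= lowWatermark)
            readsPaused = false; // re-register OP_READ here
    }

    public boolean readsPaused()
    {
        return readsPaused;
    }

    public static void main(String[] args)
    {
        InflightLimiter limiter = new InflightLimiter(100, 50);
        limiter.onRequest(60);
        System.out.println(limiter.readsPaused()); // false: 60 < 100
        limiter.onRequest(50);
        System.out.println(limiter.readsPaused()); // true: 110 >= 100
        limiter.onComplete(70);
        System.out.println(limiter.readsPaused()); // false: 40 <= 50
    }
}
```

The gap between the two watermarks provides hysteresis, so reads are not toggled on and off for every request near the limit.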
[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine
[ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604676#comment-14604676 ]

Sylvain Lebresne commented on CASSANDRA-8099:
---------------------------------------------

I've started it and plan on focusing on it more exclusively this week. I'll add that I'm quite keen on finishing giving it this first shot myself.

Refactor and modernize the storage engine
-----------------------------------------

         Key: CASSANDRA-8099
         URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
     Project: Cassandra
  Issue Type: Improvement
    Reporter: Sylvain Lebresne
    Assignee: Sylvain Lebresne
     Fix For: 3.0 beta 1
 Attachments: 8099-nit

The current storage engine (which for this ticket I'll loosely define as the code implementing the read/write path) is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows. This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL result set) because we forget about the grouping right away each time (so lots of useless cell-name comparisons in particular). But beyond inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be.

Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone. Or the overly complex lengths {{AbstractQueryPager}} has to go to simply to remove the last query result.

So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things:
# Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, they should be an iterable list of rows (each being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today).
# Make the engine more iterative. What I mean here is that in the read path we end up reading all cells into memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially.

Please note that such a refactor should provide some performance improvements right off the bat, but that's not its primary goal. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations.
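The "more iterative" point above can be illustrated with a toy sketch. This is not Cassandra's API; the `Row` type, `readFromDisk`, and `limit` helpers are invented purely to show rows flowing lazily from a source to a sink without being materialized into a ColumnFamily-style container:

```java
// Illustrative sketch: stream rows one at a time through composed iterators
// instead of buffering the whole partition in memory before responding.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterativeReadPath
{
    record Row(long clustering, String value) {}

    /** Source iterator: pretend each call to next() reads one row from disk. */
    static Iterator<Row> readFromDisk(List<Row> onDisk)
    {
        return onDisk.iterator();
    }

    /** Transform lazily: apply a row limit without buffering anything. */
    static Iterator<Row> limit(Iterator<Row> in, int n)
    {
        return new Iterator<>()
        {
            int remaining = n;
            public boolean hasNext() { return remaining > 0 && in.hasNext(); }
            public Row next() { remaining--; return in.next(); }
        };
    }

    public static void main(String[] args)
    {
        List<Row> sstable = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        // Rows flow one at a time from "disk" to the "network"; nothing is
        // accumulated along the way, so the garbage produced per row is small
        // and short-lived.
        List<String> sent = new ArrayList<>();
        Iterator<Row> it = limit(readFromDisk(sstable), 2);
        while (it.hasNext())
            sent.add(it.next().value());
        System.out.println(sent); // [a, b]
    }
}
```

Because `limit` wraps the disk iterator rather than consuming it, the third row is never read at all, which is the kind of work avoidance the ticket is after.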
[jira] [Created] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
Benedict created CASSANDRA-9669:
-----------------------------------

            Summary: If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
                Key: CASSANDRA-9669
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
            Project: Cassandra
         Issue Type: Bug
         Components: Core
           Reporter: Benedict
           Priority: Critical

While {{postFlushExecutor}} ensures it never expires CL entries out of order, on restart we simply take the maximum replay position of any sstable on disk, and ignore anything prior. It is quite possible for there to be two flushes triggered for a given table, and for the second to finish first by virtue of containing a much smaller quantity of live data (or perhaps the disk is just under less pressure). If we crash before the first sstable has been written, then on restart the data it would have represented will disappear, since we will not replay the CL records.

This looks to be a bug present since time immemorial, and also seems pretty serious.
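The failure mode described above can be demonstrated with a toy model. The types and numbers here are invented for illustration (the real logic lives around commit log replay and flush bookkeeping); the point is only that replaying from the maximum replay position found on disk silently drops the records covered by the unfinished flush:

```java
// Toy model of the bug: flush A covers commit log positions [0, 5) but is
// still in progress; flush B covers [5, 10), is smaller, and completes
// first, so the only sstable on disk records replay position 10. We crash
// before A's sstable is written, then replay from the max position on disk.
import java.util.ArrayList;
import java.util.List;

public class OutOfOrderFlush
{
    public static void main(String[] args)
    {
        // Commit log records, identified by replay positions 0..9.
        List<Integer> commitLog = new ArrayList<>();
        for (int pos = 0; pos < 10; pos++)
            commitLog.add(pos);

        // Only flush B's sstable made it to disk; it records position 10.
        int maxReplayPositionOnDisk = 10;

        // On restart, replay skips everything before the max position found
        // on disk...
        List<Integer> replayed = new ArrayList<>();
        for (int pos : commitLog)
            if (pos >= maxReplayPositionOnDisk)
                replayed.add(pos);

        // ...so flush A's data (positions 0-4) is neither on disk nor
        // replayed: it is simply gone.
        System.out.println("replayed: " + replayed);
    }
}
```

Tracking the minimum replay position across all incomplete flushes (rather than the maximum across completed ones) would replay positions 0-4 in this model, at the cost of some duplicate replay of flush B's data.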
[jira] [Comment Edited] (CASSANDRA-8099) Refactor and modernize the storage engine
[ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604381#comment-14604381 ]

Benedict edited comment on CASSANDRA-8099 at 6/28/15 10:46 AM:
---------------------------------------------------------------

[~slebresne]: what's the state of play with the refactor work? Is it being done in the near future? Trying to figure out if/when I should start making pull requests for the new memtable hierarchy. (If it isn't in progress, I'll see about starting the refactor myself and having you vet it instead.)

was (Author: benedict):
[~slebresne]: what's the state of play with the refactor work? Is it being done in the near future? Trying to figure out if/when I should start making pull requests for the new memtable hierarchy.
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604649#comment-14604649 ] Jonathan Ellis edited comment on CASSANDRA-9318 at 6/28/15 12:08 PM: - Here's where I've ended up: # Continuing to accept writes faster than a coordinator can deliver them to replicas is bad. Even perfect load shedding is worse from a client perspective than throttling, since if we load shed and time out the client needs to try to guess the right rate to retry at. # For the same reason, accepting a write but then refusing it with UnavailableException is worse than waiting to accept the write until we have capacity for it. # It's more important to throttle writes because while we can get in trouble with large reads too (a small request turns into a big reply), in practice reads are naturally throttled because a client needs to wait for the read before taking action on it. With writes on the other hand a new user's first inclination is to see how fast s/he can bulk load stuff. In practice, I see load shedding and throttling as complementary. Replicas can continue to rely on load shedding. Perhaps we can attempt distributed back pressure later (if every replica is overloaded, we should again throttle clients) but for now let's narrow our scope to throttling clients to the capacity of a coordinator to send out. *I propose we define a limit on the amount of memory MessagingService can consume and pause reading additional requests whenever that limit is hit.* Note that: # If MS's load is distributed evenly across all destinations then this is trivially the right thing to do. # If MS's load is caused by a single replica falling over or unable to keep up, this is still the right thing to do because the alternative is worse. 
MS will load shed timed out requests, but if clients are sending more requests to a single replica than we can shed (i.e. if rate * timeout > capacity) then we still need to throttle or we will exhaust the heap and fall over. (The hint-based UnavailableException tries to help with scenario 2, and I will open a ticket to test how well that actually works. But the hint threshold cannot help with scenario 1 at all and that is the hole this ticket needs to plug.)
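The heap-exhaustion arithmetic in the second scenario above can be sanity-checked with a back-of-envelope sketch. Everything here is illustrative (the rates, sizes, and names are assumptions, not from the ticket); it just shows why shedding alone cannot cap memory once rate * timeout exceeds shed capacity:

```java
// Illustrative only: estimates the bytes a coordinator must hold on-heap for
// requests aimed at one stuck replica before the timeout lets it shed them.
class BackpressureMath {
    // bytes pinned in MessagingService = arrival rate * timeout window * request size
    static long inFlightBytes(long requestsPerSec, long timeoutMillis, long avgRequestBytes) {
        return requestsPerSec * timeoutMillis / 1000 * avgRequestBytes;
    }

    public static void main(String[] args) {
        // e.g. 50k writes/s of ~1 KB each against a 10 s write timeout:
        // roughly half a gigabyte pinned before anything times out, which is why
        // shedding alone cannot bound memory in this scenario.
        System.out.println(inFlightBytes(50_000, 10_000, 1_024) + " bytes");
    }
}
```

With a lower request rate or a shorter timeout the pinned memory shrinks proportionally, which is exactly the knob throttling turns.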
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604649#comment-14604649 ] Jonathan Ellis commented on CASSANDRA-9318: --- Here's where I've ended up: # Continuing to accept writes faster than a coordinator can deliver them to replicas is bad. Even perfect load shedding is worse from a client perspective than throttling, since if we load shed and time out the client needs to try to guess the right rate to retry at. # For the same reason, accepting a write but then refusing it with UnavailableException is worse than waiting to accept the write until we have capacity for it. # It's more important to throttle writes because while we can get in trouble with large reads too (a small request turns into a big reply), in practice reads are naturally throttled because a client needs to wait for the read before taking action on it. With writes on the other hand a new user's first inclination is to see how fast s/he can bulk load stuff. In practice, I see load shedding and throttling as complementary. Replicas can continue to rely on load shedding. Perhaps we can attempt distributed back pressure later (if every replica is overloaded, we should again throttle clients) but for now let's narrow our scope to throttling clients to the capacity of a coordinator to send out. I propose we define a limit on the amount of memory MessagingService can consume and pause reading additional requests whenever that limit is hit. Note that: # If MS's load is distributed evenly across all destinations then this is trivially the right thing to do. # If MS's load is caused by a single replica falling over or unable to keep up, this is still the right thing to do because the alternative is worse. 
MS will load shed timed out requests, but if clients are sending more requests to a single replica than we can shed (i.e. if rate * timeout > capacity) then we still need to throttle or we will exhaust the heap and fall over. (The hint-based UnavailableException tries to help with scenario 2, and I will open a ticket to test how well that actually works. But the hint threshold cannot help with scenario 1 at all and that is the hole this ticket needs to plug.) Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
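The proposal in the issue description — track outstanding bytes, stop reading client connections above a high watermark, resume below a low watermark — can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's implementation: the class and method names are invented, and real code would toggle read interest on the client channel rather than flip a flag:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed coordinator throttle: a shared counter of
// in-flight request bytes gates whether client connections are read from.
class InFlightLimiter {
    private final long highWatermark;
    private final long lowWatermark;
    private final AtomicLong outstanding = new AtomicLong();
    private volatile boolean paused;

    InFlightLimiter(long highWatermark, long lowWatermark) {
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    // Called when a request is read off a client connection.
    void onRequestAccepted(long bytes) {
        if (outstanding.addAndGet(bytes) >= highWatermark)
            paused = true;   // real code: disable read on client channels
    }

    // Called when the coordinator delivers, times out, or sheds a request.
    void onRequestCompleted(long bytes) {
        if (outstanding.addAndGet(-bytes) <= lowWatermark)
            paused = false;  // real code: re-enable read
    }

    boolean acceptingReads() { return !paused; }
}
```

The gap between the two watermarks gives hysteresis: reads resume only after enough in-flight work has drained, so the gate does not flap when load hovers at the threshold.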
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604656#comment-14604656 ] Benedict commented on CASSANDRA-9318: - bq. If MS's load is caused by a single replica falling over or unable to keep up, this is still the right thing to do... or we will exhaust the heap and fall over. How does load shedding (or immediately hinting) not prevent this scenario? The proposal you're making appreciably harms our availability guarantees. Load shedding and/or hinting does not, and it fulfils this most important criterion. If we pause accepting requests _from a single client_ once that client is using in excess of a lower watermark (based on some fair share of available memory in the MS), and _only that client_ is affected, I think that is an acceptably constrained loss of availability. Enforcing it globally seems to me to harm our most central USP far too significantly.
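The per-client alternative described above — pause only the client exceeding its fair share of MessagingService memory, leaving other clients untouched — might look something like this sketch. The fair-share policy (an even split of a fixed budget) and all names are assumptions for illustration, not from the ticket:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-client variant: a client over its share is refused (i.e. its
// connection would stop being read), while other clients are unaffected.
class PerClientLimiter {
    private final long perClientLimit;
    private final ConcurrentHashMap<String, AtomicLong> inFlight = new ConcurrentHashMap<>();

    PerClientLimiter(long totalBudget, int expectedClients) {
        this.perClientLimit = totalBudget / expectedClients;  // naive even split
    }

    // Returns false when this client alone should be paused.
    boolean tryAcquire(String clientId, long bytes) {
        AtomicLong used = inFlight.computeIfAbsent(clientId, k -> new AtomicLong());
        if (used.addAndGet(bytes) > perClientLimit) {
            used.addAndGet(-bytes);  // roll back; only this client is throttled
            return false;
        }
        return true;
    }

    void release(String clientId, long bytes) {
        inFlight.get(clientId).addAndGet(-bytes);
    }
}
```

A production version would need a smarter share policy (clients come and go), but the key property is visible: one greedy client hitting its limit never changes the answer for any other client.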
[jira] [Reopened] (CASSANDRA-9528) Improve log output from unit tests
[ https://issues.apache.org/jira/browse/CASSANDRA-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reopened CASSANDRA-9528: - Reopening just to ensure that the proper fix to get the complete artifacts doesn't get lost. Improve log output from unit tests -- Key: CASSANDRA-9528 URL: https://issues.apache.org/jira/browse/CASSANDRA-9528 Project: Cassandra Issue Type: Test Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 3.0 beta 1 * Single log output file per suite * stdout/stderr to the same log file with proper interleaving * Don't interleave interactive output from unit tests run concurrently to the console. Print everything about the test once the test has completed. * Fetch and compress log files as part of artifacts collected by cassci
[4/4] cassandra git commit: Merge branch 'cassandra-2.2' into trunk
Merge branch 'cassandra-2.2' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6739434c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6739434c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6739434c Branch: refs/heads/trunk Commit: 6739434c6ffa7a501989fb5f787d408e51d0a135 Parents: 9e28938 14d7a63 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:37:00 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:37:00 2015 +0200 -- CHANGES.txt | 3 +++ .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java| 2 +- .../cassandra/service/AbstractReadExecutor.java | 12 .../apache/cassandra/service/ReadCallback.java | 20 ++-- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 37 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/6739434c/CHANGES.txt --
[3/3] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/14d7a63b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/14d7a63b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/14d7a63b Branch: refs/heads/cassandra-2.2 Commit: 14d7a63b8a29b15831d035182d12cfacc7518687 Parents: 2a4ab87 8a56868 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:36:54 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:36:54 2015 +0200 -- CHANGES.txt | 3 +++ .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java| 2 +- .../cassandra/service/AbstractReadExecutor.java | 12 .../apache/cassandra/service/ReadCallback.java | 20 ++-- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 37 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/14d7a63b/CHANGES.txt -- diff --cc CHANGES.txt index 811e955,3e4fd36..8f4f752 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,30 -1,14 +1,33 @@@ -2.1.8 +2.2 + * Allow JMX over SSL directly from nodetool (CASSANDRA-9090) + * Update cqlsh for UDFs (CASSANDRA-7556) + * Change Windows kernel default timer resolution (CASSANDRA-9634) + * Deprected sstable2json and json2sstable (CASSANDRA-9618) + * Allow native functions in user-defined aggregates (CASSANDRA-9542) + * Don't repair system_distributed by default (CASSANDRA-9621) + * Fix mixing min, max, and count aggregates for blob type (CASSANRA-9622) + * Rename class for DATE type in Java driver (CASSANDRA-9563) + * Duplicate compilation of UDFs on coordinator (CASSANDRA-9475) + * Fix connection leak in CqlRecordWriter (CASSANDRA-9576) + * Mlockall before opening system sstables & remove boot_without_jna option (CASSANDRA-9573) + * Add functions to convert timeuuid to date or time, deprecate dateOf and unixTimestampOf (CASSANDRA-9229) + * Make sure we cancel
non-compacting sstables from LifecycleTransaction (CASSANDRA-9566) + * Fix deprecated repair JMX API (CASSANDRA-9570) + * Add logback metrics (CASSANDRA-9378) + * Update and refactor ant test/test-compression to run the tests in parallel (CASSANDRA-9583) +Merged from 2.1: * Fix IndexOutOfBoundsException when inserting tuple with too many elements using the string literal notation (CASSANDRA-9559) - * Allow JMX over SSL directly from nodetool (CASSANDRA-9090) - * Fix incorrect result for IN queries where column not found (CASSANDRA-9540) * Enable describe on indices (CASSANDRA-7814) + * Fix incorrect result for IN queries where column not found (CASSANDRA-9540) * ColumnFamilyStore.selectAndReference may block during compaction (CASSANDRA-9637) + * Fix bug in cardinality check when compacting (CASSANDRA-9580) + * Fix memory leak in Ref due to ConcurrentLinkedQueue.remove() behaviour (CASSANDRA-9549) + * Make rebuild only run one at a time (CASSANDRA-9119) Merged from 2.0 + * Improve trace messages for RR (CASSANDRA-9479) + * Fix suboptimal secondary index selection when restricted +clustering column is also indexed (CASSANDRA-9631) * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385) * Fix error message when attempting to create an index on a column in a COMPACT STORAGE table with clustering columns (CASSANDRA-9527) http://git-wip-us.apache.org/repos/asf/cassandra/blob/14d7a63b/src/java/org/apache/cassandra/net/MessagingService.java -- diff --cc src/java/org/apache/cassandra/net/MessagingService.java index 293a27c,1820c5c..83bc337 --- a/src/java/org/apache/cassandra/net/MessagingService.java +++ b/src/java/org/apache/cassandra/net/MessagingService.java @@@ -745,12 -728,15 +745,12 @@@ public final class MessagingService imp { TraceState state = Tracing.instance.initializeFromMessage(message); if (state != null) - state.trace("Message received from {}", message.from); + state.trace("{} message received from {}", message.verb, message.from); -Verb verb =
message.verb; -message = SinkManager.processInboundMessage(message, id); -if (message == null) -{ -incrementRejectedMessages(verb); -return; -} +// message sinks are a testing hook +for (IMessageSink ms : messageSinks) +if (!ms.allowIncomingMessage(message, id)) +return; Runnable runnable = new MessageDeliveryTask(message, id, timestamp); TracingAwareExecutorService stage =
[1/3] cassandra git commit: Improve trace messages for RR
Repository: cassandra Updated Branches: refs/heads/cassandra-2.2 2a4ab8716 - 14d7a63b8 Improve trace messages for RR patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-9479 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/353d4a05 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/353d4a05 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/353d4a05 Branch: refs/heads/cassandra-2.2 Commit: 353d4a052c866cb230e06e69e99d9c5c8c8d955c Parents: f2db756 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:24:34 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:24:34 2015 +0200 -- CHANGES.txt | 1 + .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java | 2 +- .../cassandra/service/AbstractReadExecutor.java | 17 + .../apache/cassandra/service/ReadCallback.java | 19 ++- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 40 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 32f0873..6a137a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.0.17 + * Improve trace messages for RR (CASSANDRA-9479) * Fix suboptimal secondary index selection when restricted clustering column is also indexed (CASSANDRA-9631) * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385) http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/MessagingService.java -- diff --git a/src/java/org/apache/cassandra/net/MessagingService.java b/src/java/org/apache/cassandra/net/MessagingService.java index d570faf..ee6b87b 100644 --- a/src/java/org/apache/cassandra/net/MessagingService.java +++ b/src/java/org/apache/cassandra/net/MessagingService.java @@ -722,7 +722,7 @@ public final class MessagingService implements MessagingServiceMBean { TraceState 
state = Tracing.instance.initializeFromMessage(message); if (state != null) -state.trace("Message received from {}", message.from); +state.trace("{} message received from {}", message.verb, message.from); Verb verb = message.verb; message = SinkManager.processInboundMessage(message, id); http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/OutboundTcpConnection.java -- diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java index 5559df2..af61dd4 100644 --- a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java +++ b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java @@ -186,7 +186,7 @@ public class OutboundTcpConnection extends Thread { UUID sessionId = UUIDGen.getUUID(ByteBuffer.wrap(sessionBytes)); TraceState state = Tracing.instance.get(sessionId); -String message = String.format("Sending message to %s", poolReference.endPoint()); +String message = String.format("Sending %s message to %s", qm.message.verb, poolReference.endPoint()); // session may have already finished; see CASSANDRA-5668 if (state == null) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/service/AbstractReadExecutor.java -- diff --git a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java index 3f57e73..2f2370d 100644 --- a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java +++ b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java @@ -43,6 +43,8 @@ import org.apache.cassandra.metrics.ReadRepairMetrics; import org.apache.cassandra.net.MessageOut; import org.apache.cassandra.net.MessagingService; import org.apache.cassandra.service.StorageProxy.LocalReadRunnable; +import org.apache.cassandra.tracing.TraceState; +import org.apache.cassandra.tracing.Tracing; import org.apache.cassandra.utils.FBUtilities; /** @@ -61,12
+63,14 @@ public abstract class AbstractReadExecutor protected final List<InetAddress> targetReplicas; protected final RowDigestResolver resolver; protected final ReadCallback<ReadResponse, Row> handler; +protected final TraceState traceState;
[2/4] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/8a56868b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/8a56868b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/8a56868b Branch: refs/heads/trunk Commit: 8a56868bcaa7d58c907410a1821e83ada72ee0a9 Parents: 2c58581 353d4a0 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:27:20 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:33:59 2015 +0200 -- CHANGES.txt | 1 + .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java| 2 +- .../cassandra/service/AbstractReadExecutor.java | 12 .../apache/cassandra/service/ReadCallback.java | 20 ++-- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 35 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/CHANGES.txt -- diff --cc CHANGES.txt index 0b0cf83,6a137a3..3e4fd36 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,11 -1,5 +1,12 @@@ -2.0.17 +2.1.8 + * Fix IndexOutOfBoundsException when inserting tuple with too many + elements using the string literal notation (CASSANDRA-9559) + * Allow JMX over SSL directly from nodetool (CASSANDRA-9090) + * Fix incorrect result for IN queries where column not found (CASSANDRA-9540) + * Enable describe on indices (CASSANDRA-7814) + * ColumnFamilyStore.selectAndReference may block during compaction (CASSANDRA-9637) +Merged from 2.0 + * Improve trace messages for RR (CASSANDRA-9479) * Fix suboptimal secondary index selection when restricted clustering column is also indexed (CASSANDRA-9631) * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385) http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/src/java/org/apache/cassandra/net/MessagingService.java -- 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java -- diff --cc src/java/org/apache/cassandra/service/AbstractReadExecutor.java index 0546e27,2f2370d..2d02e34 --- a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java +++ b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java @@@ -77,7 -81,23 +81,8 @@@ public abstract class AbstractReadExecu protected void makeDataRequests(Iterable<InetAddress> endpoints) { -for (InetAddress endpoint : endpoints) -{ -if (isLocalRequest(endpoint)) -{ -if (traceState != null) -traceState.trace("reading data locally"); -logger.trace("reading data locally"); -StageManager.getStage(Stage.READ).execute(new LocalReadRunnable(command, handler)); -} -else -{ -if (traceState != null) -traceState.trace("reading data from {}", endpoint); -logger.trace("reading data from {}", endpoint); -MessagingService.instance().sendRR(command.createMessage(), endpoint, handler); -} -} +makeRequests(command, endpoints); ++ } protected void makeDigestRequests(Iterable<InetAddress> endpoints) @@@ -94,21 -109,18 +99,23 @@@ { if (isLocalRequest(endpoint)) { -if (traceState != null) -traceState.trace("reading digest locally"); -logger.trace("reading digest locally"); -StageManager.getStage(Stage.READ).execute(new LocalReadRunnable(digestCommand, handler)); -} -else -{ -if (traceState != null) -traceState.trace("reading digest from {}", endpoint); -logger.trace("reading digest from {}", endpoint); -MessagingService.instance().sendRR(message, endpoint, handler); +hasLocalEndpoint = true; +continue; } + ++if (traceState != null) ++traceState.trace("reading {} from {}", readCommand.isDigestQuery() ? "digest" : "data", endpoint); +logger.trace("reading {} from {}", readCommand.isDigestQuery() ? "digest" : "data", endpoint); +
[1/2] cassandra git commit: Improve trace messages for RR
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 2c5858133 - 8a56868bc Improve trace messages for RR patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-9479 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/353d4a05 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/353d4a05 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/353d4a05 Branch: refs/heads/cassandra-2.1 Commit: 353d4a052c866cb230e06e69e99d9c5c8c8d955c Parents: f2db756 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:24:34 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:24:34 2015 +0200 -- CHANGES.txt | 1 + .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java | 2 +- .../cassandra/service/AbstractReadExecutor.java | 17 + .../apache/cassandra/service/ReadCallback.java | 19 ++- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 40 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 32f0873..6a137a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.0.17 + * Improve trace messages for RR (CASSANDRA-9479) * Fix suboptimal secondary index selection when restricted clustering column is also indexed (CASSANDRA-9631) * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385) http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/MessagingService.java -- diff --git a/src/java/org/apache/cassandra/net/MessagingService.java b/src/java/org/apache/cassandra/net/MessagingService.java index d570faf..ee6b87b 100644 --- a/src/java/org/apache/cassandra/net/MessagingService.java +++ b/src/java/org/apache/cassandra/net/MessagingService.java @@ -722,7 +722,7 @@ public final class MessagingService implements MessagingServiceMBean { TraceState 
state = Tracing.instance.initializeFromMessage(message); if (state != null) -state.trace(Message received from {}, message.from); +state.trace({} message received from {}, message.verb, message.from); Verb verb = message.verb; message = SinkManager.processInboundMessage(message, id); http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/OutboundTcpConnection.java -- diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java index 5559df2..af61dd4 100644 --- a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java +++ b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java @@ -186,7 +186,7 @@ public class OutboundTcpConnection extends Thread { UUID sessionId = UUIDGen.getUUID(ByteBuffer.wrap(sessionBytes)); TraceState state = Tracing.instance.get(sessionId); -String message = String.format(Sending message to %s, poolReference.endPoint()); +String message = String.format(Sending %s message to %s, qm.message.verb, poolReference.endPoint()); // session may have already finished; see CASSANDRA-5668 if (state == null) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/service/AbstractReadExecutor.java -- diff --git a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java index 3f57e73..2f2370d 100644 --- a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java +++ b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java @@ -43,6 +43,8 @@ import org.apache.cassandra.metrics.ReadRepairMetrics; import org.apache.cassandra.net.MessageOut; import org.apache.cassandra.net.MessagingService; import org.apache.cassandra.service.StorageProxy.LocalReadRunnable; +import org.apache.cassandra.tracing.TraceState; +import org.apache.cassandra.tracing.Tracing; import org.apache.cassandra.utils.FBUtilities; /** @@ -61,12 
+63,14 @@ public abstract class AbstractReadExecutor protected final ListInetAddress targetReplicas; protected final RowDigestResolver resolver; protected final ReadCallbackReadResponse, Row handler; +protected final TraceState traceState;
[1/4] cassandra git commit: Improve trace messages for RR
Repository: cassandra Updated Branches: refs/heads/trunk 9e2893853 - 6739434c6 Improve trace messages for RR patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-9479 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/353d4a05 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/353d4a05 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/353d4a05 Branch: refs/heads/trunk Commit: 353d4a052c866cb230e06e69e99d9c5c8c8d955c Parents: f2db756 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:24:34 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:24:34 2015 +0200 -- CHANGES.txt | 1 + .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java | 2 +- .../cassandra/service/AbstractReadExecutor.java | 17 + .../apache/cassandra/service/ReadCallback.java | 19 ++- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 40 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 32f0873..6a137a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.0.17 + * Improve trace messages for RR (CASSANDRA-9479) * Fix suboptimal secondary index selection when restricted clustering column is also indexed (CASSANDRA-9631) * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385) http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/MessagingService.java -- diff --git a/src/java/org/apache/cassandra/net/MessagingService.java b/src/java/org/apache/cassandra/net/MessagingService.java index d570faf..ee6b87b 100644 --- a/src/java/org/apache/cassandra/net/MessagingService.java +++ b/src/java/org/apache/cassandra/net/MessagingService.java @@ -722,7 +722,7 @@ public final class MessagingService implements MessagingServiceMBean { TraceState state = 
Tracing.instance.initializeFromMessage(message); if (state != null) -state.trace(Message received from {}, message.from); +state.trace({} message received from {}, message.verb, message.from); Verb verb = message.verb; message = SinkManager.processInboundMessage(message, id); http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/OutboundTcpConnection.java -- diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java index 5559df2..af61dd4 100644 --- a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java +++ b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java @@ -186,7 +186,7 @@ public class OutboundTcpConnection extends Thread { UUID sessionId = UUIDGen.getUUID(ByteBuffer.wrap(sessionBytes)); TraceState state = Tracing.instance.get(sessionId); -String message = String.format(Sending message to %s, poolReference.endPoint()); +String message = String.format(Sending %s message to %s, qm.message.verb, poolReference.endPoint()); // session may have already finished; see CASSANDRA-5668 if (state == null) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/service/AbstractReadExecutor.java -- diff --git a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java index 3f57e73..2f2370d 100644 --- a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java +++ b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java @@ -43,6 +43,8 @@ import org.apache.cassandra.metrics.ReadRepairMetrics; import org.apache.cassandra.net.MessageOut; import org.apache.cassandra.net.MessagingService; import org.apache.cassandra.service.StorageProxy.LocalReadRunnable; +import org.apache.cassandra.tracing.TraceState; +import org.apache.cassandra.tracing.Tracing; import org.apache.cassandra.utils.FBUtilities; /** @@ -61,12 +63,14 @@ 
public abstract class AbstractReadExecutor protected final ListInetAddress targetReplicas; protected final RowDigestResolver resolver; protected final ReadCallbackReadResponse, Row handler; +protected final TraceState traceState;
[2/3] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/8a56868b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/8a56868b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/8a56868b Branch: refs/heads/cassandra-2.2 Commit: 8a56868bcaa7d58c907410a1821e83ada72ee0a9 Parents: 2c58581 353d4a0 Author: Robert Stupp sn...@snazy.de Authored: Sun Jun 28 10:27:20 2015 +0200 Committer: Robert Stupp sn...@snazy.de Committed: Sun Jun 28 10:33:59 2015 +0200 -- CHANGES.txt | 1 + .../apache/cassandra/net/MessagingService.java | 2 +- .../cassandra/net/OutboundTcpConnection.java| 2 +- .../cassandra/service/AbstractReadExecutor.java | 12 .../apache/cassandra/service/ReadCallback.java | 20 ++-- .../cassandra/service/RowDataResolver.java | 2 ++ 6 files changed, 35 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/CHANGES.txt -- diff --cc CHANGES.txt index 0b0cf83,6a137a3..3e4fd36 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,11 -1,5 +1,12 @@@ -2.0.17 +2.1.8 + * Fix IndexOutOfBoundsException when inserting tuple with too many + elements using the string literal notation (CASSANDRA-9559) + * Allow JMX over SSL directly from nodetool (CASSANDRA-9090) + * Fix incorrect result for IN queries where column not found (CASSANDRA-9540) + * Enable describe on indices (CASSANDRA-7814) + * ColumnFamilyStore.selectAndReference may block during compaction (CASSANDRA-9637) +Merged from 2.0 + * Improve trace messages for RR (CASSANDRA-9479) * Fix suboptimal secondary index selection when restricted clustering column is also indexed (CASSANDRA-9631) * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385) http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/src/java/org/apache/cassandra/net/MessagingService.java -- 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/8a56868b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java -- diff --cc src/java/org/apache/cassandra/service/AbstractReadExecutor.java index 0546e27,2f2370d..2d02e34 --- a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java +++ b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java @@@ -77,7 -81,23 +81,8 @@@ public abstract class AbstractReadExecu protected void makeDataRequests(IterableInetAddress endpoints) { -for (InetAddress endpoint : endpoints) -{ -if (isLocalRequest(endpoint)) -{ -if (traceState != null) -traceState.trace(reading data locally); -logger.trace(reading data locally); -StageManager.getStage(Stage.READ).execute(new LocalReadRunnable(command, handler)); -} -else -{ -if (traceState != null) -traceState.trace(reading data from {}, endpoint); -logger.trace(reading data from {}, endpoint); -MessagingService.instance().sendRR(command.createMessage(), endpoint, handler); -} -} +makeRequests(command, endpoints); ++ } protected void makeDigestRequests(IterableInetAddress endpoints) @@@ -94,21 -109,18 +99,23 @@@ { if (isLocalRequest(endpoint)) { -if (traceState != null) -traceState.trace(reading digest locally); -logger.trace(reading digest locally); -StageManager.getStage(Stage.READ).execute(new LocalReadRunnable(digestCommand, handler)); -} -else -{ -if (traceState != null) -traceState.trace(reading digest from {}, endpoint); -logger.trace(reading digest from {}, endpoint); -MessagingService.instance().sendRR(message, endpoint, handler); +hasLocalEndpoint = true; +continue; } + ++if (traceState != null) ++traceState.trace(reading {} from {}, readCommand.isDigestQuery() ? digest : data, endpoint); +logger.trace(reading {} from {}, readCommand.isDigestQuery() ? digest : data, endpoint); +
[2/2] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
[3/4] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/14d7a63b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/14d7a63b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/14d7a63b
Branch: refs/heads/trunk
Commit: 14d7a63b8a29b15831d035182d12cfacc7518687
Parents: 2a4ab87 8a56868
Author: Robert Stupp sn...@snazy.de
Authored: Sun Jun 28 10:36:54 2015 +0200
Committer: Robert Stupp sn...@snazy.de
Committed: Sun Jun 28 10:36:54 2015 +0200
--
 CHANGES.txt                                     |  3 +++
 .../apache/cassandra/net/MessagingService.java  |  2 +-
 .../cassandra/net/OutboundTcpConnection.java    |  2 +-
 .../cassandra/service/AbstractReadExecutor.java | 12
 .../apache/cassandra/service/ReadCallback.java  | 20 ++--
 .../cassandra/service/RowDataResolver.java      |  2 ++
 6 files changed, 37 insertions(+), 4 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/14d7a63b/CHANGES.txt
--
diff --cc CHANGES.txt
index 811e955,3e4fd36..8f4f752
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,30 -1,14 +1,33 @@@
-2.1.8
+2.2
+ * Allow JMX over SSL directly from nodetool (CASSANDRA-9090)
+ * Update cqlsh for UDFs (CASSANDRA-7556)
+ * Change Windows kernel default timer resolution (CASSANDRA-9634)
+ * Deprecated sstable2json and json2sstable (CASSANDRA-9618)
+ * Allow native functions in user-defined aggregates (CASSANDRA-9542)
+ * Don't repair system_distributed by default (CASSANDRA-9621)
+ * Fix mixing min, max, and count aggregates for blob type (CASSANDRA-9622)
+ * Rename class for DATE type in Java driver (CASSANDRA-9563)
+ * Duplicate compilation of UDFs on coordinator (CASSANDRA-9475)
+ * Fix connection leak in CqlRecordWriter (CASSANDRA-9576)
+ * Mlockall before opening system sstables & remove boot_without_jna option (CASSANDRA-9573)
+ * Add functions to convert timeuuid to date or time, deprecate dateOf and unixTimestampOf (CASSANDRA-9229)
+ * Make sure we cancel non-compacting sstables from LifecycleTransaction (CASSANDRA-9566)
+ * Fix deprecated repair JMX API (CASSANDRA-9570)
+ * Add logback metrics (CASSANDRA-9378)
+ * Update and refactor ant test/test-compression to run the tests in parallel (CASSANDRA-9583)
+Merged from 2.1:
  * Fix IndexOutOfBoundsException when inserting tuple with too many
    elements using the string literal notation (CASSANDRA-9559)
- * Allow JMX over SSL directly from nodetool (CASSANDRA-9090)
- * Fix incorrect result for IN queries where column not found (CASSANDRA-9540)
  * Enable describe on indices (CASSANDRA-7814)
+ * Fix incorrect result for IN queries where column not found (CASSANDRA-9540)
  * ColumnFamilyStore.selectAndReference may block during compaction (CASSANDRA-9637)
+ * Fix bug in cardinality check when compacting (CASSANDRA-9580)
+ * Fix memory leak in Ref due to ConcurrentLinkedQueue.remove() behaviour (CASSANDRA-9549)
+ * Make rebuild only run one at a time (CASSANDRA-9119)
 Merged from 2.0
+ * Improve trace messages for RR (CASSANDRA-9479)
+ * Fix suboptimal secondary index selection when restricted
+   clustering column is also indexed (CASSANDRA-9631)
  * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385)
  * Fix error message when attempting to create an index on a column
    in a COMPACT STORAGE table with clustering columns (CASSANDRA-9527)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/14d7a63b/src/java/org/apache/cassandra/net/MessagingService.java
--
diff --cc src/java/org/apache/cassandra/net/MessagingService.java
index 293a27c,1820c5c..83bc337
--- a/src/java/org/apache/cassandra/net/MessagingService.java
+++ b/src/java/org/apache/cassandra/net/MessagingService.java
@@@ -745,12 -728,15 +745,12 @@@ public final class MessagingService imp
     {
         TraceState state = Tracing.instance.initializeFromMessage(message);
         if (state != null)
-            state.trace("Message received from {}", message.from);
+            state.trace("{} message received from {}", message.verb, message.from);

-        Verb verb = message.verb;
-        message = SinkManager.processInboundMessage(message, id);
-        if (message == null)
-        {
-            incrementRejectedMessages(verb);
-            return;
-        }
+        // message sinks are a testing hook
+        for (IMessageSink ms : messageSinks)
+            if (!ms.allowIncomingMessage(message, id))
+                return;

         Runnable runnable = new MessageDeliveryTask(message, id, timestamp);
         TracingAwareExecutorService stage =
cassandra git commit: Improve trace messages for RR
Repository: cassandra
Updated Branches: refs/heads/cassandra-2.0 f2db756ab -> 353d4a052

Improve trace messages for RR

patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-9479

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/353d4a05
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/353d4a05
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/353d4a05
Branch: refs/heads/cassandra-2.0
Commit: 353d4a052c866cb230e06e69e99d9c5c8c8d955c
Parents: f2db756
Author: Robert Stupp sn...@snazy.de
Authored: Sun Jun 28 10:24:34 2015 +0200
Committer: Robert Stupp sn...@snazy.de
Committed: Sun Jun 28 10:24:34 2015 +0200
--
 CHANGES.txt                                     |  1 +
 .../apache/cassandra/net/MessagingService.java  |  2 +-
 .../cassandra/net/OutboundTcpConnection.java    |  2 +-
 .../cassandra/service/AbstractReadExecutor.java | 17 +
 .../apache/cassandra/service/ReadCallback.java  | 19 ++-
 .../cassandra/service/RowDataResolver.java      |  2 ++
 6 files changed, 40 insertions(+), 3 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 32f0873..6a137a3 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.17
+ * Improve trace messages for RR (CASSANDRA-9479)
  * Fix suboptimal secondary index selection when restricted clustering
    column is also indexed (CASSANDRA-9631)
  * (cqlsh) Add min_threshold to DTCS option autocomplete (CASSANDRA-9385)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/MessagingService.java
--
diff --git a/src/java/org/apache/cassandra/net/MessagingService.java b/src/java/org/apache/cassandra/net/MessagingService.java
index d570faf..ee6b87b 100644
--- a/src/java/org/apache/cassandra/net/MessagingService.java
+++ b/src/java/org/apache/cassandra/net/MessagingService.java
@@ -722,7 +722,7 @@ public final class MessagingService implements MessagingServiceMBean
     {
         TraceState state = Tracing.instance.initializeFromMessage(message);
         if (state != null)
-            state.trace("Message received from {}", message.from);
+            state.trace("{} message received from {}", message.verb, message.from);

         Verb verb = message.verb;
         message = SinkManager.processInboundMessage(message, id);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
--
diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
index 5559df2..af61dd4 100644
--- a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
+++ b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
@@ -186,7 +186,7 @@ public class OutboundTcpConnection extends Thread
     {
         UUID sessionId = UUIDGen.getUUID(ByteBuffer.wrap(sessionBytes));
         TraceState state = Tracing.instance.get(sessionId);
-        String message = String.format("Sending message to %s", poolReference.endPoint());
+        String message = String.format("Sending %s message to %s", qm.message.verb, poolReference.endPoint());
         // session may have already finished; see CASSANDRA-5668
         if (state == null)
         {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/353d4a05/src/java/org/apache/cassandra/service/AbstractReadExecutor.java
--
diff --git a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java
index 3f57e73..2f2370d 100644
--- a/src/java/org/apache/cassandra/service/AbstractReadExecutor.java
+++ b/src/java/org/apache/cassandra/service/AbstractReadExecutor.java
@@ -43,6 +43,8 @@ import org.apache.cassandra.metrics.ReadRepairMetrics;
 import org.apache.cassandra.net.MessageOut;
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.service.StorageProxy.LocalReadRunnable;
+import org.apache.cassandra.tracing.TraceState;
+import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.utils.FBUtilities;

 /**
@@ -61,12 +63,14 @@ public abstract class AbstractReadExecutor
     protected final List<InetAddress> targetReplicas;
     protected final RowDigestResolver resolver;
    protected final ReadCallback<ReadResponse, Row> handler;
+    protected final TraceState traceState;
[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-9666: --- Reviewer: Marcus Eriksson Provide an alternative to DTCS -- Key: CASSANDRA-9666 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666 Project: Cassandra Issue Type: Improvement Reporter: Jeff Jirsa Assignee: Jeff Jirsa Fix For: 2.1.x, 2.2.x DTCS is great for time series data, but it comes with caveats that make it difficult to use in production (typical operator behaviors such as bootstrap, removenode, and repair have MAJOR caveats as they relate to max_sstable_age_days, and hints/read repair break the selection algorithm). I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the tiered nature of DTCS in order to address some of DTCS' operational shortcomings. I believe it is necessary to propose an alternative rather than simply adjusting DTCS, because it fundamentally removes the tiered nature in order to remove the parameter max_sstable_age_days - the result is very, very different, even if it is heavily inspired by DTCS. Specifically, rather than creating a number of windows of ever-increasing sizes, this strategy allows an operator to choose the window size, compact with STCS within the first window of that size, and aggressively compact down to a single sstable once that window is no longer current. The window size is a combination of unit (minutes, hours, days) and size (1, etc), such that an operator can expect all data using a block of that size to be compacted together (that is, if your unit is hours and size is 6, you will create roughly 4 sstables per day, each one containing roughly 6 hours of data). The result addresses a number of the problems with DateTieredCompactionStrategy:
- At the present time, DTCS's first window is compacted using an unusual selection criteria, which prefers files with earlier timestamps, but ignores sizes.
In TimeWindowCompactionStrategy, the first window's data will be compacted with the well tested, fast, reliable STCS. All STCS options can be passed to TimeWindowCompactionStrategy to configure the first window's compaction behavior.
- HintedHandoff may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction)
- ReadRepair may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction)
- Small, old sstables resulting from streams of any kind will be swiftly and aggressively compacted with the other sstables matching their similar maxTimestamp, without causing sstables in neighboring windows to grow in size.
- The configuration options are explicit and straightforward - the tuning parameters leave little room for error. The window is set in common, easily understandable terms such as "12 hours", "1 Day", "30 days". The minute/hour/day options are granular enough for users keeping data for hours, and for users keeping data for years.
- There is no explicitly configurable max sstable age, though sstables will naturally stop compacting once new data is written in that window.
- Streaming operations can create sstables with old timestamps, and they'll naturally be joined together with sstables in the same time bucket. This is true for bootstrap/repair/sstableloader/removenode.
- It remains true that if old data and new data are written into the memtable at the same time, the resulting sstables will be treated as if they were new sstables; however, that no longer negatively impacts the compaction strategy's selection criteria for older windows.
Patch provided for both 2.1 ( https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 ( https://github.com/jeffjirsa/cassandra/commits/twcs ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
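As a rough sketch of the windowing arithmetic described above - a window is a unit times a size, and every timestamp maps to the one window containing it - the bucketing could look like this. All names are invented for illustration; this is not code from the actual TWCS patch:

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: real TWCS groups sstables by their max
// timestamp; this shows just the unit-x-size -> window arithmetic.
public class WindowBucketSketch
{
    // Lower bound (epoch millis) of the window containing tsMillis.
    static long windowLowerBound(long tsMillis, TimeUnit unit, int size)
    {
        long windowMillis = unit.toMillis(size);
        return (tsMillis / windowMillis) * windowMillis;
    }

    public static void main(String[] args)
    {
        // With a 6-hour window, two timestamps inside the same 6-hour
        // span map to the same bucket and would be compacted together.
        long w = TimeUnit.HOURS.toMillis(6);
        System.out.println(windowLowerBound(w + 1000, TimeUnit.HOURS, 6));  // 21600000
        System.out.println(windowLowerBound(2 * w - 1, TimeUnit.HOURS, 6)); // 21600000
    }
}
```

All sstables whose max timestamp yields the same lower bound would be candidates for the same window's compaction, which is why late-arriving data from hints or streaming simply joins its original bucket.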
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604697#comment-14604697 ] Jonathan Ellis commented on CASSANDRA-9318: ---
# You can't just shed indiscriminately without breaking the contract that once you've accepted a write (and not handed back UnavailableException) then you must deliver it or hint it. You can't just drop it on the floor even to save yourself from falling over. (If you fall over then at least it's clear to the user that you weren't able to fulfill your contract. No, logging it isn't good enough.)
# Again, shedding is strictly worse from a user's point of view than not accepting a write we can't handle.
Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
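The high/low watermark mechanism the ticket description sketches (track outstanding bytes, stop reading client connections above a high watermark, resume below a low one) could look roughly like this. `InflightLimiter` and every name in it are invented for illustration, not Cassandra code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of coordinator-side flow control: bound the bytes
// of in-flight requests rather than rejecting or shedding them.
public class InflightLimiter
{
    private final long high, low;               // watermarks, in bytes
    private final AtomicLong inflight = new AtomicLong();
    private volatile boolean readsPaused = false;

    InflightLimiter(long high, long low)
    {
        this.high = high;
        this.low = low;
    }

    // Called when a request is read off a client connection.
    void onRequest(long bytes)
    {
        if (inflight.addAndGet(bytes) >= high)
            readsPaused = true;                 // stop reading new requests
    }

    // Called when the response is flushed (or the write is hinted).
    void onComplete(long bytes)
    {
        if (inflight.addAndGet(-bytes) <= low)
            readsPaused = false;                // resume reading
    }

    boolean readsPaused()
    {
        return readsPaused;
    }

    public static void main(String[] args)
    {
        InflightLimiter limiter = new InflightLimiter(100, 50);
        limiter.onRequest(120);
        System.out.println(limiter.readsPaused()); // true: above high watermark
        limiter.onComplete(80);
        System.out.println(limiter.readsPaused()); // false: back below low watermark
    }
}
```

In a real server the `readsPaused` flag would gate the channel's read interest (e.g. toggling auto-read on the connection) rather than being polled, so clients see back-pressure instead of errors.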
[jira] [Commented] (CASSANDRA-9668) RepairException when trying to run concurrent repair -pr
[ https://issues.apache.org/jira/browse/CASSANDRA-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604711#comment-14604711 ] Yuki Morishita commented on CASSANDRA-9668: --- There should be ERROR on /172.31.13.127. Can you find it with repair {{#b1e67660-1d78-11e5-aec7-4f05493cbe02}}? RepairException when trying to run concurrent repair -pr Key: CASSANDRA-9668 URL: https://issues.apache.org/jira/browse/CASSANDRA-9668 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.7 Reporter: david Assignee: Yuki Morishita Priority: Critical Fix For: 2.1.x Was on 2.1.3 having very similar issues to those described in: https://issues.apache.org/jira/browse/CASSANDRA-9266 I updated to 2.1.7, more for some other fixes, but now if I try and run concurrent repairs (different boxes) consistently get: {noformat} ERROR [Thread-14156] 2015-06-28 09:33:12,616 StorageService.java:2959 - Repair session b1e67660-1d78-11e5-aec7-4f05493cbe02 for range (-4660677346721084182,-4658765298409301171] failed with error org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_40] at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_40] at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40] 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40] ... 1 common frames omitted Caused by: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.7.jar:2.1.7] ... 3 common frames omitted {noformat} The specific repair command being issued: {noformat} nodetool repair -local -pr -inc -par -- keyspace {noformat} It's a 15 box environment with a replication factor of 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604746#comment-14604746 ] Benedict edited comment on CASSANDRA-9318 at 6/28/15 3:48 PM: -- bq. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. bq. The whole point is that coordinators are falling over from OOM. This isn't just something we can wave away as negligible. I was referring here to the status quo, FTR. Also FTR, we do clearly state hints are best effort (they also aren't guaranteed to be persisted), so as far as contracts / guarantees are concerned, I don't know we make any (and I wasn't aware of this one). It would be really helpful for these (and many other) discussions if all of the assumptions, contracts and guarantees we make about correctness and delivery were made available in a single clearly spelled out document (and that, like the code style, this document is the final arbiter of what action to take). was (Author: benedict): bq. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. bq. The whole point is that coordinators are falling over from OOM. This isn't just something we can wave away as negligible. I was referring here to the status quo, FTR. Also FTR, we do clearly state hints are best effort (they also aren't guaranteed to be persisted), so as far as contracts / guarantees are concerned, I don't know we make any (and I wasn't aware of this one). It would be really helpful for these (and many other) discussions if all of the assumptions, contracts and guarantees we make about correctness and delivery were made available in a single clearly spelled out document. 
Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604767#comment-14604767 ] Jonathan Ellis commented on CASSANDRA-9318: --- I think you're [Aleksey] confusing replica-side and coordinator-side. There are no separate write and read queues on the coordinator and there probably shouldn't be. Again, let's avoid the temptation to boil the ocean. We can make a simple improvement by bounding the total outstanding request size that we have in flight on the coordinator side. Once we exceed that global limit we stop reading additional requests. That's it. There are lots of ways we can tweak this (e.g. estimating how many rows a read is likely to return) but making that change alone will be a big improvement and is something that we can reasonably do for 2.1.x, so let's not hold it up for grand rewrites of all the things. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604734#comment-14604734 ] Benedict commented on CASSANDRA-9318: - In fact, building on CASSANDRA-6230, it might be superior to immediately write to the hints buffer (saving the serialization for passing to MS; this might also reduce serialization rather than increase), and to mark the hint delivered when we receive the callback. On flushing hints, we can ignore any that have been delivered (which we would prefer to do anyway). We ideally only flush the hints buffer after the timeout interval has elapsed, or alternatively if we run out of a generous memory allowance. With some small tweaks we would only need to keep a minimal piece of identifying information to invalidate the hint record, even after it has been written to disk. This would ensure our behaviour is identical, except with a guaranteed bound on memory utilisation (and increased capacity, since we can serialize the hints off heap, and they will occupy much less space). Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
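A toy model of the record-up-front / invalidate-on-callback idea above (write the hint before dispatch, cancel it from the response callback, persist only what was never acknowledged) might look as follows. All names are invented; this is not the actual hints or CASSANDRA-6230 code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: hints are recorded eagerly and invalidated by the
// delivery callback, so flushing only persists undelivered writes.
public class HintBufferSketch
{
    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, String> pending = new ConcurrentHashMap<>();

    // Record the hint before dispatching the write to the replica.
    long record(String mutation)
    {
        long id = ids.incrementAndGet();
        pending.put(id, mutation);
        return id;
    }

    // The response callback marks the hint delivered, invalidating it.
    void onAck(long id)
    {
        pending.remove(id);
    }

    // Flushing persists only hints that were never acknowledged.
    List<String> flushUndelivered()
    {
        List<String> out = new ArrayList<>(pending.values());
        pending.clear();
        return out;
    }

    public static void main(String[] args)
    {
        HintBufferSketch buf = new HintBufferSketch();
        long id = buf.record("mutation-1");
        buf.record("mutation-2");
        buf.onAck(id);                              // replica acknowledged mutation-1
        System.out.println(buf.flushUndelivered()); // [mutation-2]
    }
}
```

The memory bound in the comment above comes from flushing the buffer once a timeout or memory allowance is hit; acknowledged entries cost nothing at that point because they have already been removed.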
[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine
[ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604748#comment-14604748 ] Benedict commented on CASSANDRA-8099: - That's all I needed to hear. Thanks! Refactor and modernize the storage engine - Key: CASSANDRA-8099 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 3.0 beta 1 Attachments: 8099-nit The current storage engine (which for this ticket I'll loosely define as the code implementing the read/write path) is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows. This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL resultset) because we forget about the grouping right away each time (so lots of useless cell name comparisons in particular). But beyond inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be. Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone. Or the overly complex hoops {{AbstractQueryPager}} has to jump through simply to remove the last query result. So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things: # Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, they should be an iterable list of rows (each being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today).
# Make the engine more iterative. What I mean here is that in the read path, we end up reading all cells into memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially. Please note that such a refactor should provide some performance improvements right off the bat, but that's not its primary goal either. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
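The "iterative" point above - transforming rows lazily as they stream from disk to the network instead of materializing them in a ColumnFamily-like container - can be illustrated with a minimal lazy-transform wrapper. `LazyReadSketch` is an invented name, not an engine type:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Illustrative sketch: each element is converted only when next() is
// called, so nothing needs to be held in memory all at once.
public class LazyReadSketch
{
    static <A, B> Iterator<B> transform(final Iterator<A> in, final Function<A, B> f)
    {
        return new Iterator<B>()
        {
            public boolean hasNext() { return in.hasNext(); }
            public B next() { return f.apply(in.next()); }
        };
    }

    public static void main(String[] args)
    {
        // Stand-in for "row read from disk" -> "row serialized to the network".
        Iterator<Integer> out = transform(Arrays.asList(1, 2, 3).iterator(), x -> x * 2);
        while (out.hasNext())
            System.out.println(out.next());
    }
}
```

Chaining such wrappers (filter, merge, count) keeps only one element live at a time, which is where the GC reduction the ticket mentions would come from.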
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604763#comment-14604763 ] Jonathan Ellis edited comment on CASSANDRA-9318 at 6/28/15 4:31 PM: Hinting is better than leaving things in an unknown state but it's not something we should opt users into if we have a better option, since it basically turns the write into CL.ANY. I think you're [Benedict] overselling how scary it is to stop reading new requests until we can free up some memory from MS. We're not dropping connections. We're just imposing some flow control. Which is something that already happens at different levels anyway. was (Author: jbellis): Hinting is better than leaving things in an unknown state but it's not something we should opt users into if we have a better option, since it basically turns the write into CL.ANY. I think you're overselling how scary it is to stop reading new requests until we can free up some memory from MS. We're not dropping connections. We're just imposing some flow control. Which is something that already happens at different levels anyway. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604771#comment-14604771 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- bq. I think you're [Aleksey] confusing replica-side and coordinator-side. There are no separate write and read queues on the coordinator and there probably shouldn't be. I don't think I am. I also don't think that the distinction between replica-side and coordinator-side is very useful here, given the most common case of collocating the two roles (with the exception of Rick). There are no separate queues, but there should eventually be something bounding both reads and writes. And arguably the two are different enough from each other (one is request heavy, the other is response heavy) to be treated differently, and in effect as two separate queues. Otherwise, this discussion sounds like fun, but I'm going to stay out of it until I form a 2.1.x-appropriate opinion on it. Just wanted to clarify some hints-related questions and CASSANDRA-6230 (which, being a 3.0 ticket, is somewhat not immediately relevant).
[jira] [Updated] (CASSANDRA-9668) RepairException when trying to run concurrent repair -pr
[ https://issues.apache.org/jira/browse/CASSANDRA-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-9668: --- Assignee: Yuki Morishita RepairException when trying to run concurrent repair -pr Key: CASSANDRA-9668 URL: https://issues.apache.org/jira/browse/CASSANDRA-9668 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.7 Reporter: david Assignee: Yuki Morishita Priority: Critical Fix For: 2.1.x Was on 2.1.3 having very similar issues to those described in: https://issues.apache.org/jira/browse/CASSANDRA-9266 I updated to 2.1.7, more for some other fixes, but now if I try and run concurrent repairs (different boxes) consistently get: {noformat} ERROR [Thread-14156] 2015-06-28 09:33:12,616 StorageService.java:2959 - Repair session b1e67660-1d78-11e5-aec7-4f05493cbe02 for range (-4660677346721084182,-4658765298409301171] failed with error org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_40] at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_40] at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] Caused 
by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40] ... 1 common frames omitted Caused by: org.apache.cassandra.exceptions.RepairException: [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]] Validation failed in /172.31.13.127 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.7.jar:2.1.7] ... 3 common frames omitted {noformat} The specific repair command being issued: {noformat} nodetool repair -local -pr -inc -par -- keyspace {noformat} It's a 15 box environment with a replication factor of 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-9666: --- Assignee: Jeff Jirsa Provide an alternative to DTCS -- Key: CASSANDRA-9666 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666 Project: Cassandra Issue Type: Improvement Reporter: Jeff Jirsa Assignee: Jeff Jirsa Fix For: 2.1.x, 2.2.x DTCS is great for time series data, but it comes with caveats that make it difficult to use in production (typical operator behaviors such as bootstrap, removenode, and repair have MAJOR caveats as they relate to max_sstable_age_days, and hints/read repair break the selection algorithm). I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the tiered nature of DTCS in order to address some of DTCS' operational shortcomings. I believe it is necessary to propose an alternative rather than simply adjusting DTCS, because it fundamentally removes the tiered nature in order to remove the parameter max_sstable_age_days - the result is very, very different, even if it is heavily inspired by DTCS. Specifically, rather than creating a number of windows of ever increasing sizes, this strategy allows an operator to choose the window size, compact with STCS within the first window of that size, and aggressively compact down to a single sstable once that window is no longer current. The window size is a combination of unit (minutes, hours, days) and size (1, etc), such that an operator can expect all data using a block of that size to be compacted together (that is, if your unit is hours, and size is 6, you will create roughly 4 sstables per day, each one containing roughly 6 hours of data). The result addresses a number of the problems with DateTieredCompactionStrategy: - At the present time, DTCS’s first window is compacted using an unusual selection criterion, which prefers files with earlier timestamps, but ignores sizes. 
In TimeWindowCompactionStrategy, data in the first window will be compacted with the well tested, fast, reliable STCS. All STCS options can be passed to TimeWindowCompactionStrategy to configure the first window’s compaction behavior. - HintedHandoff may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction) - ReadRepair may put old data in new sstables, but it will have little impact other than slightly reduced efficiency (sstables will cover a wider range, but the old timestamps will not impact sstable selection criteria during compaction) - Small, old sstables resulting from streams of any kind will be swiftly and aggressively compacted with the other sstables matching their similar maxTimestamp, without causing sstables in neighboring windows to grow in size. - The configuration options are explicit and straightforward - the tuning parameters leave little room for error. The window is set in common, easily understandable terms such as “12 hours”, “1 day”, “30 days”. The minute/hour/day options are granular enough for users keeping data for hours, and users keeping data for years. - There is no explicitly configurable max sstable age, though sstables will naturally stop compacting once new data is written in that window. - Streaming operations can create sstables with old timestamps, and they'll naturally be joined together with sstables in the same time bucket. This is true for bootstrap/repair/sstableloader/removenode. - It remains true that if old data and new data are written into the memtable at the same time, the resulting sstables will be treated as if they were new sstables; however, that no longer negatively impacts the compaction strategy’s selection criteria for older windows. 
Patch provided for both 2.1 ( https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 ( https://github.com/jeffjirsa/cassandra/commits/twcs ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
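A rough sketch of the windowing rule the ticket describes (illustrative only, not the actual TWCS patch): with unit = hours and size = 6, every sstable whose maxTimestamp falls in the same 6-hour block belongs to the same window, yielding roughly 4 windows per day.

```java
import java.util.concurrent.TimeUnit;

public class TimeWindowSketch {
    // Returns the lower bound (epoch millis) of the window containing tsMillis,
    // for a window of `size` units (e.g. unit=HOURS, size=6 -> 6-hour windows).
    static long windowFor(long tsMillis, TimeUnit unit, int size) {
        long windowMillis = unit.toMillis(size);
        return (tsMillis / windowMillis) * windowMillis;  // truncate to window start
    }

    public static void main(String[] args) {
        long fiveHours = TimeUnit.HOURS.toMillis(5);
        long sevenHours = TimeUnit.HOURS.toMillis(7);
        // Two sstables 5 hours apart share a 6-hour window; 7 hours apart do not.
        System.out.println(windowFor(0, TimeUnit.HOURS, 6) == windowFor(fiveHours, TimeUnit.HOURS, 6));
        System.out.println(windowFor(0, TimeUnit.HOURS, 6) == windowFor(sevenHours, TimeUnit.HOURS, 6));
    }
}
```

Because the bucket is a pure function of maxTimestamp, sstables created late by streaming or repair with old timestamps still land in the correct (already-major-compacted) window, which is the operational win over max_sstable_age_days.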
[jira] [Commented] (CASSANDRA-9668) RepairException when trying to run concurrent repair -pr
[ https://issues.apache.org/jira/browse/CASSANDRA-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604718#comment-14604718 ] david commented on CASSANDRA-9668: -- Yes, here is the corresponding error: {noformat} ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,029 CompactionManager.java:972 - Cannot start multiple repair sessions over the same sstables ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,029 Validator.java:245 - Failed creating a merkle tree for [repair #b1c30fe0-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (9062648853864216757,9072201154757474095]], /172.31.46.189 (see log for details) ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,029 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:19,1,main] java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:973) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:623) ~[apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] {noformat} This suggests running concurrent repairs (with -pr) is not possible. Is this true? 
[jira] [Updated] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9318: -- Fix Version/s: 2.1.x
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604735#comment-14604735 ] Jonathan Ellis edited comment on CASSANDRA-9318 at 6/28/15 4:31 PM: Let's pull optimizing hints to a separate ticket. It is complementary to "don't accept more requests than you can handle." was (Author: jbellis): Let's pull optimizing hints to a separate ticket. It is complementary to "don't accept more than you can handle."
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604740#comment-14604740 ] Benedict commented on CASSANDRA-9318: - bq. To be clear: all I'm effectively suggesting is we hint ... earlier. OK, so it looks like my understanding of hints was incorrect. However this statement is still valid, and still a better course of action. If we hint immediately, we buy ourselves breathing room _without affecting availability_. If this doesn't buy us enough breathing room, then blocking incoming writes is fine. But we should exhaust all our avenues that maintain our guarantees first, no?
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604715#comment-14604715 ] Benedict commented on CASSANDRA-9318: - bq. shedding is strictly worse If we can do so without affecting our availability guarantees, sure. bq. the contract We already drop hints on the floor if they cannot keep up. To be clear: all I'm effectively suggesting is we hint (including the hinting step that drops hints\*) earlier. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. All we do is wait until the timeout expires and then hint. This simply preempts the timer and hints immediately, possibly resulting in the message being delivered via hints when it was delivered successfully (but very late) first time around, but also possibly resulting in the hint being dropped, as it could be at either time. \* except that we can probably make this decision _less_ pessimistic than it is currently, with better visibility on overall resource utilisation.
[jira] [Comment Edited] (CASSANDRA-9668) RepairException when trying to run concurrent repair -pr
[ https://issues.apache.org/jira/browse/CASSANDRA-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604718#comment-14604718 ] david edited comment on CASSANDRA-9668 at 6/28/15 3:19 PM: --- Yes, here is the corresponding error: {noformat} ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 CompactionManager.java:972 - Cannot start multiple repair sessions over the same sstables ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 Validator.java:245 - Failed creating a merkle tree for [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on evosload_services_otg_scee_com_driveclub/data, (-4660677346721084182,-4658765298409301171]], /172.31.46.189 (see log for details) ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:19,1,main] java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:973) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:623) ~[apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] {noformat} This suggests running concurrent repairs (with -pr) is not possible. Is this true? 
was (Author: biffta): Yes, here is corrosponding error: {noformat} ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,029 CompactionManager.java:972 - Cannot start multiple repair sessions over the same sstables ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,029 Validator.java:245 - Failed creating a merkle tree for [repair #b1c30fe0-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (9062648853864216757,9072201154757474095]], /172.31.46.189 (see log for details) ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,029 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:19,1,main] java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:973) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:623) ~[apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] {noformat} This suggest running concurrent repairs (with -pr) is not possible. Is this true? 
[jira] [Comment Edited] (CASSANDRA-9668) RepairException when trying to run concurrent repair -pr
[ https://issues.apache.org/jira/browse/CASSANDRA-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604718#comment-14604718 ] david edited comment on CASSANDRA-9668 at 6/28/15 3:19 PM: --- Yes, here is the corresponding error: {noformat} ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 CompactionManager.java:972 - Cannot start multiple repair sessions over the same sstables ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 Validator.java:245 - Failed creating a merkle tree for [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on keyspace/data, (-4660677346721084182,-4658765298409301171]], /172.31.46.189 (see log for details) ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:19,1,main] java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:973) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:623) ~[apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] {noformat} This suggests running concurrent repairs (with -pr) is not possible. Is this true? 
was (Author: biffta): Yes, here is corrosponding error: {noformat} ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 CompactionManager.java:972 - Cannot start multiple repair sessions over the same sstables ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 Validator.java:245 - Failed creating a merkle tree for [repair #b1e67660-1d78-11e5-aec7-4f05493cbe02 on evosload_services_otg_scee_com_driveclub/data, (-4660677346721084182,-4658765298409301171]], /172.31.46.189 (see log for details) ERROR [ValidationExecutor:19] 2015-06-28 09:33:12,261 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:19,1,main] java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:973) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.7.jar:2.1.7] at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:623) ~[apache-cassandra-2.1.7.jar:2.1.7] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] {noformat} This suggest running concurrent repairs (with -pr) is not possible. Is this true? 
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604728#comment-14604728 ] Jonathan Ellis edited comment on CASSANDRA-9318 at 6/28/15 3:24 PM: bq. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. The whole point is that coordinators are falling over from OOM. This isn't just something we can wave away as negligible. was (Author: jbellis): bq. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. The whole point is that coordinators are falling over from OOM.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604726#comment-14604726 ] Jonathan Ellis commented on CASSANDRA-9318: --- We do not drop hints on the floor. We abort a write with UAE (UnavailableException) if we have too many hints in progress, but that is okay because we've told the client to expect it. Here's a quick summary: # If you send TimedOutException to a client, then you have to write a hint # If you send UAE, then you do not have to write a hint This is equivalent to # If you send the write to any replicas, you must send it to all (via hint if necessary) # If you send UAE, you must do so before sending to any replicas
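The two equivalent rule pairs above can be encoded as a toy contract check (names here are invented for the sketch, not Cassandra's): a write either reaches all replicas eventually (hinting after a timeout), or is rejected up front with UnavailableException before any replica sees it.

```java
public class HintContract {
    enum Response { TIMED_OUT_EXCEPTION, UNAVAILABLE_EXCEPTION }

    // Rules 1/2: only a timeout response obliges the coordinator to write a hint.
    static boolean mustWriteHint(Response r) {
        return r == Response.TIMED_OUT_EXCEPTION;
    }

    // Rules 3/4: UnavailableException is only legal if the write was not yet
    // sent to any replica; otherwise the coordinator is committed to delivery.
    static boolean legal(Response r, boolean sentToAnyReplica) {
        return r != Response.UNAVAILABLE_EXCEPTION || !sentToAnyReplica;
    }

    public static void main(String[] args) {
        System.out.println(mustWriteHint(Response.TIMED_OUT_EXCEPTION));
        System.out.println(legal(Response.UNAVAILABLE_EXCEPTION, true));
    }
}
```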
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604728#comment-14604728 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. The whole point is that coordinators are falling over from OOM.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604735#comment-14604735 ] Jonathan Ellis commented on CASSANDRA-9318: --- Let's pull optimizing hints out to a separate ticket. It is complementary to "don't accept more than you can handle."
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604746#comment-14604746 ] Benedict commented on CASSANDRA-9318: - bq. This in no way affects our contract or guarantees, since we don't do anything at all in the intervening period except consume memory. bq. The whole point is that coordinators are falling over from OOM. This isn't just something we can wave away as negligible. I was referring here to the status quo, FTR. Also FTR, we do clearly state that hints are best effort (they also aren't guaranteed to be persisted), so as far as contracts / guarantees are concerned, I don't know that we make any (and I wasn't aware of this one). It would be really helpful for these (and many other) discussions if all of the assumptions, contracts, and guarantees we make about correctness and delivery were made available in a single, clearly spelled out document.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604755#comment-14604755 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- bq. We do not drop hints on the floor. We don't explicitly, but they are still best effort - before CASSANDRA-6230, and after CASSANDRA-6230, too. Before, because of their brokenness, truncating the whole hints table at the first sign of trouble, or disabling hints entirely, is the norm. Even if this hadn't been the case, the way they are persisted - TTL'd by the lowest gc_grace_seconds of all tables in the mutation - means that there are many scenarios in which we wouldn't replay what we had persisted. We also don't preserve a hint at all if a node is past the hint window, and this only affects CL.ANY. CASSANDRA-6230 codifies this fact, and that makes having a separate file per host feasible at all - without an intermediate shared log. As soon as we write to a shared {{HintsBuffer}}, we consider a hint written.
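The persistence rule mentioned above (hints TTL'd by the lowest gc_grace_seconds of all tables in the mutation) amounts to a simple minimum. The sketch below uses hypothetical names, not Cassandra's actual implementation; the rationale is that a hint kept longer than any table's tombstone grace period risks resurrecting deleted data on replay.

```java
import java.util.Arrays;

// A hint is only safe to replay while every table in the mutation still
// retains its tombstones, so its TTL is the minimum gc_grace_seconds
// across those tables. Hypothetical sketch.
class HintTtl {
    static int ttlSeconds(int[] gcGraceSecondsOfTables) {
        return Arrays.stream(gcGraceSecondsOfTables).min().orElse(0);
    }
}
```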
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604765#comment-14604765 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- bq. On flushing hints, we can ignore any that have been delivered (which we would prefer to do anyway). We ideally only flush the hints buffer after the timeout interval has elapsed, or alternatively if we run out of a generous memory allowance. This we can/should do. bq. With some small tweaks we would only need to keep a minimal piece of identifying information to invalidate the hint record, even after it has been written to disk. I would prefer not to go after records that have already made it to disk - the former should be good enough. In general, let's keep hints chatter to the CASSANDRA-6230 ticket comments, please. As for the matter at hand - we should bound both write and read queues, but certainly not just by some fixed queue lengths. For reads we should be bounded by a fixed-size read buffer on one side, and an SLA on the other - which we could do once we have the ability to terminate queries in flight. For writes, I don't have an opinion formed yet.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604763#comment-14604763 ] Jonathan Ellis commented on CASSANDRA-9318: --- Hinting is better than leaving things in an unknown state but it's not something we should opt users into if we have a better option, since it basically turns the write into CL.ANY. I think you're overselling how scary it is to stop reading new requests until we can free up some memory from MS. We're not dropping connections. We're just imposing some flow control. Which is something that already happens at different levels anyway.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604769#comment-14604769 ] Jonathan Ellis commented on CASSANDRA-9318: --- (Re: hints still being best effort -- yes, there are reasons why hints might not be replayed, but that is not the same as not being written in the first place [when a timeout is returned], which is what I was talking about.)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604659#comment-14604659 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. How does load shedding (or immediately hinting) not prevent this scenario? I explained that: bq. if clients are sending more requests to a single replica than we can shed (rate * timeout capacity) Suppose we can hold 1GB of messages before we start having GC trouble, and the timeout is the default 2s. If we get more than 500MB/s of requests for a dead replica, we will be in trouble since they are arriving faster than they are shed. bq. The proposal you're making appreciably harms our availability guarantees. It does not, because we only throttle when shedding is insufficient to keep us in a happy place. bq. If we pause accepting requests from a single client Then we don't solve the larger problem, which is multiple clients teaming up to overwhelm a coordinator.
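The back-of-envelope bound in that comment follows from the fact that messages are only shed once they time out, so the steady-state shed rate cannot exceed budget / timeout. A minimal sketch of the arithmetic (hypothetical helper, illustrative only):

```java
// Max sustainable arrival rate given an in-flight memory budget and a
// request timeout: anything above budget/timeout grows the backlog
// faster than timeouts can drain it. With a 1GB budget and the default
// 2s timeout, that threshold is 500MB/s, as in the comment above.
class ShedBound {
    static double maxRateBytesPerSec(long budgetBytes, double timeoutSeconds) {
        return budgetBytes / timeoutSeconds;
    }
}
```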
[jira] [Created] (CASSANDRA-9672) Provide a per-table param that would force default ttl on all updates
Aleksey Yeschenko created CASSANDRA-9672: Summary: Provide a per-table param that would force default ttl on all updates Key: CASSANDRA-9672 URL: https://issues.apache.org/jira/browse/CASSANDRA-9672 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Priority: Minor Many users have tables that rely on TTL entirely - no deletes, and only a fixed TTL value. The way that default ttl works now, we only apply it if none is specified. We should provide an option that would *enforce* the specified TTL: not allowing ttl-less {{INSERT}} or {{UPDATE}}, not allowing a ttl that's lower or higher than the default ttl, and not allowing deletes. That option, when enabled ({{force_default_ttl}}), should allow us to drop more sstables during compaction and do so more cheaply. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
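The proposed constraint can be summarized as a small validation predicate. The option name `force_default_ttl` is from the ticket; the `TtlValidator` class and its logic below are a hypothetical illustration of the three rules (no deletes, no ttl-less writes, no ttl other than the default), not Cassandra's implementation.

```java
// Hypothetical check for the proposed force_default_ttl table option:
// a write is allowed only if it is not a delete and carries a TTL equal
// to the table's default_time_to_live.
class TtlValidator {
    static boolean allowed(Integer requestTtl, int defaultTtl, boolean isDelete) {
        if (isDelete) return false;            // deletes are not allowed
        if (requestTtl == null) return false;  // ttl-less INSERT/UPDATE not allowed
        return requestTtl.intValue() == defaultTtl; // ttl must match the default
    }
}
```

With every cell guaranteed to expire at the same fixed TTL and no tombstones from deletes, compaction can drop whole expired sstables without inspecting their contents, which is where the cheapness comes from.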
[jira] [Created] (CASSANDRA-9671) sum() and avg() functions missing for smallint and tinyint types
Aleksey Yeschenko created CASSANDRA-9671: Summary: sum() and avg() functions missing for smallint and tinyint types Key: CASSANDRA-9671 URL: https://issues.apache.org/jira/browse/CASSANDRA-9671 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Robert Stupp Fix For: 2.2.x {{AggregateFcts}} does not define {{sum()}} and {{avg()}} aggregates for the new {{tinyint}} and {{smallint}} types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9672) Provide a per-table param that would force default ttl on all updates
[ https://issues.apache.org/jira/browse/CASSANDRA-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-9672: - Description: Many users have tables that rely on TTL entirely - no deletes, and only a fixed TTL value. The way that default ttl works now, we only apply it if none is specified. We should provide an option that would *enforce* the specified TTL: not allowing ttl-less {{INSERT}} or {{UPDATE}}, not allowing a ttl that's lower or higher than the default ttl, and not allowing deletes. That option, when enabled ({{force_default_ttl}}), should allow us to drop more sstables during compaction and do so more cheaply. Would also allow DBAs to enforce the constraint in a guaranteed manner. was: Many users have tables that don't rely on TTL entirely - no deletes, and only fixed TTL value. The way that default ttl works now, we only apply it if none is specified. We should provide an option that would *enforce* the specified TTL. Not allowing ttl-less {{INSERT}} or {{UPDATE}}, not allowing ttl that's lower or higher than the default ttl, and not allowing deletes. That option when enabled ({{force_default_ttl}}) should allow us to drop more tables during compaction and do so cheaper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604790#comment-14604790 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- bq. Yes. My point is that if we start by not accepting more than we can handle coordinator-side we (a) improve things immediately by a nontrivial amount and (b) we will have more clarity on what needs to be done replica-side. Right. I'm all for flow control, in principle, and I'm not insisting on doing it comprehensively in one ticket, or even one version. Sorry if I was unclear. Not sure if it can be meaningfully done in 2.1, or whether any of the suggested options are workable - will reply later.
[jira] [Updated] (CASSANDRA-9670) Cannot run CQL scripts on Windows AND having error Ubuntu Linux
[ https://issues.apache.org/jira/browse/CASSANDRA-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Patel updated CASSANDRA-9670: Environment: DataStax Community Edition on Windows 7, 64 Bit and Ubuntu was:Windows 7, 64 Bit and Ubuntu Cannot run CQL scripts on Windows AND having error Ubuntu Linux --- Key: CASSANDRA-9670 URL: https://issues.apache.org/jira/browse/CASSANDRA-9670 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax Community Edition on Windows 7, 64 Bit and Ubuntu Reporter: Sanjay Patel Fix For: 2.1.7 Attachments: cities.cql After installation of 2.1.6 and 2.1.7 it is not possible to execute cql scripts, which were earlier executed on windows + Linux environment successfully. I have tried to install Python 2 latest version and try to execute, but having same error. Attaching cities.cql for reference. --- cqlsh source 'shoppoint_setup.cql' ; shoppoint_setup.cql:16:InvalidRequest: code=2200 [Invalid query] message=Keyspace 'shopping' does not exist shoppoint_setup.cql:647:'ascii' codec can't decode byte 0xc3 in position 57: ordinal not in range(128) cities.cql:9:'ascii' codec can't decode byte 0xc3 in position 51: ordinal not in range(128) cities.cql:14: Error starting import process: cities.cql:14:Can't pickle type 'thread.lock': it's not found as thread.lock cities.cql:14:can only join a started process cities.cql:16: Error starting import process: cities.cql:16:Can't pickle type 'thread.lock': it's not found as thread.lock cities.cql:16:can only join a started process Traceback (most recent call last): File string, line 1, in module File I:\programm\python2710\lib\multiprocessing\forking.py, line 380, in main prepare(preparation_data) File I:\programm\python2710\lib\multiprocessing\forking.py, line 489, in prepare Traceback (most recent call last): File string, line 1, in module file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named cqlsh File 
I:\programm\python2710\lib\multiprocessing\forking.py, line 380, in main prepare(preparation_data) File I:\programm\python2710\lib\multiprocessing\forking.py, line 489, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named cqlsh shoppoint_setup.cql:663:'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128) ipcache.cql:28:ServerError: ErrorMessage code= [Server error] message=java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: I:\var\lib\cassandra\data\syste m\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-300-Data.db (The process cannot access the file because it is being used by another process) ccavn_bulkupdate.cql:75:ServerError: ErrorMessage code= [Server error] message=java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: I:\var\lib\cassandra\d ata\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-tmplink-ka-339-Data.db (The process cannot access the file because it is being used by another process) shoppoint_setup.cql:680:'ascii' codec can't decode byte 0xe2 in position 14: ordinal not in range(128) - In one of Ubuntu development environment we have similar errors. - shoppoint_setup.cql:647:'ascii' codec can't decode byte 0xc3 in position 57: ordinal not in range(128) cities.cql:9:'ascii' codec can't decode byte 0xc3 in position 51: ordinal not in range(128) (corresponding line) COPY cities (city,country_code,state,isactive) FROM 'testdata/india_cities.csv' ; [19:53:18] j.basu: shoppoint_setup.cql:663:'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604775#comment-14604775 ] Jonathan Ellis edited comment on CASSANDRA-9318 at 6/28/15 4:52 PM: Replica and coordinator are only identical on writes when RF=1, hardly the most common case. Nor is it a good idea to try to allow extra reads when write capacity is full or vice versa. They both ultimately use the same resources (cpu, heap, disk i/o). was (Author: jbellis): Replica and coordinator are only identical on writes when RF=1. Nor is it a good idea to try to allow extra reads when write capacity is full or vice versa. They both ultimately use the same resources (cpu, heap, disk i/o).
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604775#comment-14604775 ] Jonathan Ellis commented on CASSANDRA-9318: --- Replica and coordinator are only identical on writes when RF=1. Nor is it a good idea to try to allow extra reads when write capacity is full or vice versa. They both ultimately use the same resources (cpu, heap, disk i/o).
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604778#comment-14604778 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- bq. Nor is it a good idea to try to allow extra reads when write capacity is full or vice versa. Oh, I'm most definitely not suggesting that. I want very tight bounds on reads and writes, separately. Some extra breathing room on writes should not allow for more reads (or vice versa). I care more about tail latencies than raw throughput, and the former is more of an issue for us than the latter.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604779#comment-14604779 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- bq. Replica and coordinator are only identical on writes when RF=1, hardly the most common case. In non-toy clusters, with tons of clients, even at RF=1 they aren't, no disagreement here. What I meant was that with the roles collocated on the same machines (one request's replica is another node's coordinator), it's insufficient to only handle protecting 'the coordinator' - an OOMd node is an OOMd node. Eventually it has to be full-node.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604783#comment-14604783 ] Jonathan Ellis commented on CASSANDRA-9318: --- Yes. My point is that if we start by not accepting more than we can handle coordinator-side we (a) improve things immediately by a nontrivial amount and (b) we will have more clarity on what needs to be done replica-side. (I don't think it's clear at all whether we will need better load shedding, actual cross-network backpressure, or something else.)
[jira] [Commented] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604787#comment-14604787 ] Anuj Wadehra commented on CASSANDRA-8479: - @Sam I think it's an issue. As you mentioned in your comment: "At least one of the digests doesn't match, triggering a blocking full read against all the replicas that were sent digest requests - which includes the down node in the remote DC." The blocking full read must be triggered ONLY against the replicas that were sent digest requests. Why were digest requests sent to the remote DC when the read CL was LOCAL_QUORUM? This seems to be a major problem. Should I reopen the JIRA? Timeout Exception on Node Failure in Remote Data Center --- Key: CASSANDRA-8479 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479 Project: Cassandra Issue Type: Bug Components: API, Core, Tools Environment: Unix, Cassandra 2.0.11 Reporter: Amit Singh Chowdhery Assignee: Sam Tunnicliffe Priority: Minor Attachments: TRACE_LOGS.zip Issue faced: We have a geo-redundant setup with 2 data centers having 3 nodes each. When we bring a single Cassandra node down in DC2 by kill -9 Cassandra-pid, reads fail on DC1 with TimedOutException for a brief amount of time (~15-20 sec). Reference: a ticket has already been opened/resolved; the link is provided below: https://issues.apache.org/jira/browse/CASSANDRA-8352 Activity done as per the resolution provided: upgraded to Cassandra 2.0.11. We have two 3-node clusters in two different DCs, and if one or more of the nodes go down in one data center, ~5-10% traffic failure is observed on the other. CL: LOCAL_QUORUM RF=3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9670) Cannot run CQL scripts on Windows AND having error Ubuntu Linux
Sanjay Patel created CASSANDRA-9670:
---
Summary: Cannot run CQL scripts on Windows AND having error Ubuntu Linux
Key: CASSANDRA-9670
URL: https://issues.apache.org/jira/browse/CASSANDRA-9670
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Windows 7, 64 Bit and Ubuntu
Reporter: Sanjay Patel
Fix For: 2.1.7
Attachments: cities.cql

After installing 2.1.6 and 2.1.7 it is no longer possible to execute CQL scripts that were earlier executed successfully on Windows and Linux environments. I have tried installing the latest Python 2 version and executing again, but I get the same errors. Attaching cities.cql for reference.

{noformat}
cqlsh> source 'shoppoint_setup.cql' ;
shoppoint_setup.cql:16:InvalidRequest: code=2200 [Invalid query] message=Keyspace 'shopping' does not exist
shoppoint_setup.cql:647:'ascii' codec can't decode byte 0xc3 in position 57: ordinal not in range(128)
cities.cql:9:'ascii' codec can't decode byte 0xc3 in position 51: ordinal not in range(128)
cities.cql:14: Error starting import process:
cities.cql:14:Can't pickle <type 'thread.lock'>: it's not found as thread.lock
cities.cql:14:can only join a started process
cities.cql:16: Error starting import process:
cities.cql:16:Can't pickle <type 'thread.lock'>: it's not found as thread.lock
cities.cql:16:can only join a started process
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "I:\programm\python2710\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "I:\programm\python2710\lib\multiprocessing\forking.py", line 489, in prepare
    file, path_name, etc = imp.find_module(main_name, dirs)
ImportError: No module named cqlsh
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "I:\programm\python2710\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "I:\programm\python2710\lib\multiprocessing\forking.py", line 489, in prepare
    file, path_name, etc = imp.find_module(main_name, dirs)
ImportError: No module named cqlsh
shoppoint_setup.cql:663:'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
ipcache.cql:28:ServerError: ErrorMessage code= [Server error] message=java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: I:\var\lib\cassandra\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-300-Data.db (The process cannot access the file because it is being used by another process)
ccavn_bulkupdate.cql:75:ServerError: ErrorMessage code= [Server error] message=java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: I:\var\lib\cassandra\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-tmplink-ka-339-Data.db (The process cannot access the file because it is being used by another process)
shoppoint_setup.cql:680:'ascii' codec can't decode byte 0xe2 in position 14: ordinal not in range(128)
{noformat}

In one of our Ubuntu development environments we have similar errors:

{noformat}
shoppoint_setup.cql:647:'ascii' codec can't decode byte 0xc3 in position 57: ordinal not in range(128)
cities.cql:9:'ascii' codec can't decode byte 0xc3 in position 51: ordinal not in range(128)
{noformat}

(corresponding line)

{noformat}
COPY cities (city,country_code,state,isactive) FROM 'testdata/india_cities.csv' ;
{noformat}

[19:53:18] j.basu: shoppoint_setup.cql:663:'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
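The {{'ascii' codec can't decode byte 0xc3}} errors above are what Python 2 reports when UTF-8 encoded bytes (0xc3 opens a two-byte UTF-8 sequence, typical of accented characters in city names) are decoded with the default ASCII codec. A minimal sketch of the failure mode, not cqlsh's actual code (the sample city name is hypothetical):

```python
# A name containing U+00E9 (e with acute accent), encoded as UTF-8.
# The encoded bytes start the accented character with 0xc3.
raw = "Pondich\u00e9ry".encode("utf-8")
assert 0xC3 in raw

try:
    raw.decode("ascii")  # what the reported failure effectively did
except UnicodeDecodeError as exc:
    print(exc)           # 'ascii' codec can't decode byte 0xc3 ...

# Decoding with the file's real encoding succeeds.
print(raw.decode("utf-8"))
```

The fix on the cqlsh side is to read script and CSV input with an explicit (or detected) encoding rather than the ASCII default.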
[jira] [Commented] (CASSANDRA-9649) Paxos ballot in StorageProxy could clash
[ https://issues.apache.org/jira/browse/CASSANDRA-9649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605000#comment-14605000 ] Stefania commented on CASSANDRA-9649:
-

I've run the tests one more time. Here is my analysis: all tests are either flaky, having failed at least once in the past on the unpatched branches, or quite simply fail all the time. For some we definitely have JIRAs, but I have not mentioned them here as I don't have the numbers at hand.

* 2.0 testall:
** no failures
* 2.0 dtest:
** jmxmetrics_test.TestJMXMetrics.begin_test: failed on unpatched 2.0, build #80
** counter_tests.TestCounters.upgrade_test: failed on unpatched 2.0, build #78
** compaction_test.TestCompaction_with_DateTieredCompactionStrategy.sstable_deletion_test: failed on unpatched 2.0, build #80
** compaction_test.TestCompaction_with_SizeTieredCompactionStrategy.sstable_deletion_test: failed on unpatched 2.0, build #80
** thrift_hsha_test.ThriftHSHATest.test_closing_connections: failed on unpatched 2.0, build #76
** repair_test.TestRepair.dc_repair_test: failed on unpatched 2.0, build #69
** paging_test.TestPagingWithDeletions.test_single_partition_deletions: failed on unpatched 2.0, build #72
** paging_test.TestPagingWithDeletions.test_single_row_deletions: failed on unpatched 2.0, build #62
* 2.1 testall:
** org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution: failed on unpatched 2.1, build #106
* 2.1 dtest:
** jmxmetrics_test.TestJMXMetrics.begin_test: failed on unpatched 2.1, build #154
** repair_test.TestRepair.dc_repair_test: failed on unpatched 2.1, build #154
** compaction_test.TestCompaction_with_DateTieredCompactionStrategy.sstable_deletion_test: failed on unpatched 2.1, build #154
** compaction_test.TestCompaction_with_SizeTieredCompactionStrategy.sstable_deletion_test: failed on unpatched 2.1, build #154
** thrift_hsha_test.ThriftHSHATest.test_closing_connections: failed on unpatched 2.1, build #102
** upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_counters_test: failed on unpatched 2.1, build #122
** cql_tests.MiscellaneousCQLTester.prepared_statement_invalidation_test: failed on unpatched 2.1, build #149
* 2.2 testall:
** org.apache.cassandra.db.lifecycle.ViewTest.testSSTablesInBounds: failed on unpatched 2.2, build #88
** org.apache.cassandra.db.lifecycle.ViewTest.testSSTablesInBounds-compression: failed on unpatched 2.2, build #88
* 2.2 dtest:
** consistency_test.TestConsistency.short_read_test: failed on unpatched 2.2, build #72
** jmx_test.TestJMX.cfhistograms_test: failed on unpatched 2.2, build #115
** upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test: failed on unpatched 2.2, build #120
** compaction_test.TestCompaction_with_DateTieredCompactionStrategy.sstable_deletion_test: failed on unpatched 2.2, build #120
** compaction_test.TestCompaction_with_LeveledCompactionStrategy.sstable_deletion_test: failed on unpatched 2.2, build #120

Paxos ballot in StorageProxy could clash
Key: CASSANDRA-9649
URL: https://issues.apache.org/jira/browse/CASSANDRA-9649
Project: Cassandra
Issue Type: Bug
Reporter: Stefania
Assignee: Stefania
Priority: Minor

This code in {{StorageProxy.beginAndRepairPaxos()}} takes a timestamp in microseconds but divides it by 1000 before adding one. So if the summary is null, ballotMillis would be the same for up to 1000 possible state timestamp values:

{code}
long currentTime = (state.getTimestamp() / 1000) + 1;
long ballotMillis = summary == null
                  ? currentTime
                  : Math.max(currentTime, 1 + UUIDGen.unixTimestamp(summary.mostRecentInProgressCommit.ballot));
UUID ballot = UUIDGen.getTimeUUID(ballotMillis);
{code}

{{state.getTimestamp()}} returns the time in microseconds and ensures that one microsecond is added to any previously used timestamp if the client sends the same or an older timestamp.
Initially I used this code in {{ModificationStatement.casInternal()}}, introduced by CASSANDRA-9160 to support cas unit tests, but occasionally these tests were failing. It was only when I ensured uniqueness of the ballot that the tests started to pass reliably. I wonder if we could ever have the same issue in StorageProxy? cc [~jbellis] and [~slebresne] for CASSANDRA-7801 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
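The truncation described above is easy to see in isolation: any 1000 consecutive microsecond timestamps collapse to the same millisecond ballot value once integer-divided by 1000. A small Python sketch mirroring the Java expression (the starting timestamp is arbitrary, chosen for illustration):

```python
def ballot_millis(state_timestamp_micros: int) -> int:
    # Mirrors (state.getTimestamp() / 1000) + 1 from beginAndRepairPaxos()
    # for the summary == null case.
    return (state_timestamp_micros // 1000) + 1

# 1000 distinct microsecond timestamps, all inside the same millisecond...
start = 1_420_070_400_000_000
ballots = {ballot_millis(t) for t in range(start, start + 1000)}

# ...yield a single ballotMillis value, so every caller would mint a
# time-UUID from the same millisecond and risk a clash.
print(len(ballots))  # 1
```

This is exactly why ensuring ballot uniqueness (rather than relying on the timestamp alone) made the CAS unit tests pass reliably.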
[jira] [Commented] (CASSANDRA-9673) Improve batchlog write path
[ https://issues.apache.org/jira/browse/CASSANDRA-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605020#comment-14605020 ] Aleksey Yeschenko commented on CASSANDRA-9673:
--

Marking as 3.X because it's not blocking 3.0, but it would be nice to have it in before the RC happens.

Improve batchlog write path
---
Key: CASSANDRA-9673
URL: https://issues.apache.org/jira/browse/CASSANDRA-9673
Project: Cassandra
Issue Type: Improvement
Reporter: Aleksey Yeschenko
Fix For: 3.x

Currently we allocate an on-heap {{ByteBuffer}} to serialize the batched mutations into, before sending it to a distant node, generating unnecessary garbage (potentially a lot of it). With materialized views using the batchlog, it would be nice to optimise the write path:
- introduce a new verb ({{Batch}})
- introduce a new message ({{BatchMessage}}) that would encapsulate the mutations, expiration, and creation time (similar to {{HintMessage}} in CASSANDRA-6230)
- have MS serialize it directly instead of relying on an intermediate buffer

To avoid merely shifting the temp buffer to the receiving side(s) we should change the structure of the batchlog table to use a list or a map of individual mutations.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
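The garbage-saving idea — serializing mutations straight to the outgoing stream instead of through a temporary buffer — can be sketched as follows. This is a Python stand-in with an invented length-prefixed framing, not Cassandra's wire format or API:

```python
import io
import struct

def serialize_via_buffer(mutations):
    """Current shape: build an intermediate buffer, then ship it whole."""
    buf = io.BytesIO()
    buf.write(struct.pack(">i", len(mutations)))
    for m in mutations:
        buf.write(struct.pack(">i", len(m)))
        buf.write(m)
    return buf.getvalue()  # this extra copy is the garbage being complained about

def serialize_direct(mutations, out):
    """Proposed shape: write each mutation straight to the output stream."""
    out.write(struct.pack(">i", len(mutations)))
    for m in mutations:
        out.write(struct.pack(">i", len(m)))
        out.write(m)

# Both produce identical bytes on the wire; the second never materialises
# the whole frame in memory at once.
```

The same reasoning applies on the receiving side, which is why the description also proposes restructuring the batchlog table into individual mutations rather than one serialized blob.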
[jira] [Created] (CASSANDRA-9673) Improve batchlog write path
Aleksey Yeschenko created CASSANDRA-9673:
---
Summary: Improve batchlog write path
Key: CASSANDRA-9673
URL: https://issues.apache.org/jira/browse/CASSANDRA-9673
Project: Cassandra
Issue Type: Improvement
Reporter: Aleksey Yeschenko
Fix For: 3.x

Currently we allocate an on-heap {{ByteBuffer}} to serialize the batched mutations into, before sending it to a distant node, generating unnecessary garbage (potentially a lot of it). With materialized views using the batchlog, it would be nice to optimise the write path:
- introduce a new verb ({{Batch}})
- introduce a new message ({{BatchMessage}}) that would encapsulate the mutations, expiration, and creation time (similar to {{HintMessage}} in CASSANDRA-6230)
- have MS serialize it directly instead of relying on an intermediate buffer

To avoid merely shifting the temp buffer to the receiving side(s) we should change the structure of the batchlog table to use a list or a map of individual mutations.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9448) Metrics should use up to date nomenclature
[ https://issues.apache.org/jira/browse/CASSANDRA-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605014#comment-14605014 ] Stefania commented on CASSANDRA-9448:
-

bq. First of all, isn't it better to keep old names as deprecated until we stop supporting 2.2?

Agreed, but how? I could not find a way to add a deprecated tag or alias name to metrics or JMX beans. Do you know a way to do this, or did you simply mean to keep duplicated metrics and just deprecate them in the documentation?

bq. In StorageService, there are some operations referring ColumnFamily. We should rename these also.

Could you be more specific and list the methods that feed the metrics? I merely renamed the methods feeding the metrics, or their parents, for consistency, as otherwise it would be very confusing. I did not mean to change every single instance of 'ColumnFamily' in the code. That would be best done in a dedicated ticket after CASSANDRA-8099 has been merged.

Metrics should use up to date nomenclature
--
Key: CASSANDRA-9448
URL: https://issues.apache.org/jira/browse/CASSANDRA-9448
Project: Cassandra
Issue Type: Improvement
Components: Tools
Reporter: Sam Tunnicliffe
Assignee: Stefania
Labels: docs-impacting, jmx
Fix For: 3.0 beta 1

There are a number of exposed metrics that are currently named using the old nomenclature of columnfamily and rows (meaning partitions). It would be good to audit all metrics and update any names to match what they actually represent; we should probably do that in a single sweep to avoid a confusing mixture of old and new terminology. As we'd need to do this in a major release, I've initially set the fixver for 3.0 beta1.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9232) timestamp is considered as a reserved keyword in cqlsh completion
[ https://issues.apache.org/jira/browse/CASSANDRA-9232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605053#comment-14605053 ] Stefania commented on CASSANDRA-9232:
-

Unfortunately the CQL keywords exported by the new version of the driver are too many. We now have 123 keywords, but Python regular expressions only support a maximum of 100 named groups:

{code}
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra/bin/cqlsh", line 2459, in <module>
    main(*read_options(sys.argv[1:], os.environ))
  File "/home/stefania/git/cstar/cassandra/bin/cqlsh", line 2451, in main
    shell.cmdloop()
  File "/home/stefania/git/cstar/cassandra/bin/cqlsh", line 942, in cmdloop
    if self.onecmd(self.statement.getvalue()):
  File "/home/stefania/git/cstar/cassandra/bin/cqlsh", line 959, in onecmd
    statements, in_batch = cqlruleset.cql_split_statements(statementtext)
  File "/home/stefania/git/cstar/cassandra/bin/../pylib/cqlshlib/cqlhandling.py", line 143, in cql_split_statements
    tokens = self.lex(text)
  File "/home/stefania/git/cstar/cassandra/bin/../pylib/cqlshlib/pylexotron.py", line 447, in lex
    self.scanner = self.make_lexer()
  File "/home/stefania/git/cstar/cassandra/bin/../pylib/cqlshlib/pylexotron.py", line 443, in make_lexer
    return SaferScanner(regexes, re.I | re.S).scan
  File "/home/stefania/git/cstar/cassandra/bin/../pylib/cqlshlib/saferscanner.py", line 37, in __init__
    self.scanner = re.sre_compile.compile(p)
  File "/usr/lib/python2.7/sre_compile.py", line 509, in compile
    "sorry, but this version only supports 100 named groups"
AssertionError: sorry, but this version only supports 100 named groups
{code}

There is a third-party module that might do the job, [regex|https://pypi.python.org/pypi/regex], but it would require changing saferscanner.py and adding one more dependency. Can we do without named groups and add a separate map instead?
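The "separate map" idea floated above can be sketched like this: instead of one named group per keyword (which is what hits the 100-named-group cap), use a single generic word group and classify the matched text through a dict lookup. This is an illustrative toy lexer, not cqlsh's pylexotron; the keyword set and token classes are invented for the example:

```python
import re

# Hypothetical keyword set; the real driver exports 123 of these.
KEYWORDS = {"select", "insert", "timestamp", "ascii", "bigint"}

# Three fixed groups, regardless of how many keywords exist.
TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z_][A-Za-z0-9_]*)|(?P<num>\d+)|(?P<ws>\s+)",
    re.I | re.S,
)

def lex(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "word":
            # The per-keyword decision moves out of the regex into a map lookup,
            # so the group count no longer grows with the keyword list.
            kind = "keyword" if value.lower() in KEYWORDS else "identifier"
        if kind != "ws":
            tokens.append((kind, value))
    return tokens

print(lex("SELECT 42 ts"))
```

The pattern size stays constant as keywords are added, sidestepping the `sre_compile` limit without a new dependency.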
timestamp is considered as a reserved keyword in cqlsh completion
---
Key: CASSANDRA-9232
URL: https://issues.apache.org/jira/browse/CASSANDRA-9232
Project: Cassandra
Issue Type: Bug
Reporter: Michaël Figuière
Assignee: Stefania
Priority: Trivial
Labels: cqlsh
Fix For: 3.x, 2.1.x

cqlsh seems to treat timestamp as a reserved keyword when used as an identifier:

{code}
cqlsh:ks1> create table t1 (int int primary key, ascii ascii, bigint bigint, blob blob, boolean boolean, date date, decimal decimal, double double, float float, inet inet, text text, time time, timestamp timestamp, timeuuid timeuuid, uuid uuid, varchar varchar, varint varint);
{code}

Leads to the following completion when building an {{INSERT}} statement:

{code}
cqlsh:ks1> insert into t1 (int,
"timestamp"  ascii  bigint  blob  boolean  date  decimal  double  float  inet  text  time  timeuuid  uuid  varchar  varint
{code}

timestamp is a keyword but not a reserved one and should therefore not be proposed as a quoted string. It looks like this error happens only for timestamp. Not a big deal of course, but it might be worth reviewing the keywords treated as reserved in cqlsh, especially with the many changes introduced in 3.0.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9619) Read performance regression in tables with many columns on trunk and 2.2 vs. 2.1
[ https://issues.apache.org/jira/browse/CASSANDRA-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9619:
--
Assignee: Benedict
Fix Version/s: (was: 2.2.0 rc2) 2.2.x, 2.1.x

I think we should figure out why preemptive open causes a slowdown for this workload, but I don't think it should block 2.2.0.

Read performance regression in tables with many columns on trunk and 2.2 vs. 2.1
Key: CASSANDRA-9619
URL: https://issues.apache.org/jira/browse/CASSANDRA-9619
Project: Cassandra
Issue Type: Bug
Reporter: Jim Witschey
Assignee: Benedict
Labels: perfomance
Fix For: 2.1.x, 2.2.x

There seems to be a regression in reads in 2.2 and trunk, as compared to 2.1 and 2.0. I found it running cstar_perf jobs with 50-column tables. 2.2 may be worse than trunk, though my results on that aren't consistent. The relevant cstar_perf jobs are here:

http://cstar.datastax.com/tests/id/273e2ea8-0fc8-11e5-816c-42010af0688f
http://cstar.datastax.com/tests/id/3a8002d6-1480-11e5-97ff-42010af0688f
http://cstar.datastax.com/tests/id/40ff2766-1248-11e5-bac8-42010af0688f

The sequence of commands for these jobs is:

{code}
stress write n=6500 -rate threads=300 -col n=FIXED\(50\)
stress read n=6500 -rate threads=300
stress read n=6500 -rate threads=300
{code}

Have a look at the operations per second going from [the first read operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7] to [the second read operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7]. They've fallen from ~135K to ~100K comparing trunk to 2.1 and 2.0. It's slightly worse for 2.2, and 2.2 operations per second fall continuously from the first to the second read operation.

There's a corresponding increase in read latency -- it's noticeable on trunk and pretty bad on 2.2. Again, the latency gets higher and higher on 2.2 as the read operations progress (see the graphs [here|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=17.27] and [here|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=928.62&ymin=0&ymax=14.52]).

I see a similar regression in a [more recent test|http://cstar.datastax.com/graph?stats=40ff2766-1248-11e5-bac8-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=752.62&ymin=0&ymax=171799.1], though in this one trunk performed worse than 2.2. This run also didn't display the increasing latency in 2.2.

This regression may show for smaller numbers of columns, but not as prominently, as shown [in the results of this test with the stress default of 5 columns|http://cstar.datastax.com/graph?stats=227cb89e-0fc8-11e5-9f14-42010af0688f&metric=99.9th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=498.19&ymin=0&ymax=334.29]. There's an increase in latency variability on trunk and 2.2, but I don't see a regression in summary statistics.

My measurements aren't confounded by [the recent regression in cassandra-stress|https://issues.apache.org/jira/browse/CASSANDRA-9558]; cstar_perf uses the same stress program (from trunk) on all versions on the cluster.

I'm currently working to:
- reproduce with a smaller workload so this is easier to bisect and debug.
- get results with larger numbers of columns, since we've seen the regression on 50 columns but not the stress default of 5.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9064) [LeveledCompactionStrategy] cqlsh can't run cql produced by its own describe table statement
[ https://issues.apache.org/jira/browse/CASSANDRA-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605076#comment-14605076 ] Jonathan Ellis commented on CASSANDRA-9064:
---

Did the python driver ship as planned? Can you integrate it, Benjamin?

[LeveledCompactionStrategy] cqlsh can't run cql produced by its own describe table statement
Key: CASSANDRA-9064
URL: https://issues.apache.org/jira/browse/CASSANDRA-9064
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: cassandra 2.1.3 on mac os x
Reporter: Sujeet Gholap
Assignee: Adam Holmberg
Labels: cqlsh
Fix For: 2.2.0 rc2, 2.1.8

Here's how to reproduce:

1) Create a table with LeveledCompactionStrategy:
{code}
CREATE KEYSPACE foo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor' : 3};
CREATE TABLE foo.bar (
    spam text PRIMARY KEY
) WITH compaction = {'class': 'LeveledCompactionStrategy'};
{code}

2) Describe the table and save the output:
{code}
cqlsh -e "describe table foo.bar"
{code}
Output should be something like:
{code}
CREATE TABLE foo.bar (
    spam text PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
{code}

3) Save the output to repro.cql

4) Drop the table foo.bar:
{code}
cqlsh -e "drop table foo.bar"
{code}

5) Run the create table statement we saved:
{code}
cqlsh -f repro.cql
{code}

6) Expected: normal execution without an error

7) Reality:
{code}
ConfigurationException: ErrorMessage code=2300 [Query invalid because of configuration issue] message=Properties specified [min_threshold, max_threshold] are not understood by LeveledCompactionStrategy
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
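The round-trip failure comes down to DESCRIBE emitting compaction options that LeveledCompactionStrategy refuses to accept back. Until the server-side fix, a script replaying DESCRIBE output could strip those options first. A hedged sketch working on an already-parsed options dict (the parsing itself is out of scope; the rejected option names are taken from the error message above):

```python
# Options LCS rejects when echoed back from DESCRIBE, per the reported error.
UNSUPPORTED_FOR_LCS = {"min_threshold", "max_threshold"}

def clean_compaction_options(options):
    """Drop options LeveledCompactionStrategy refuses; pass others through."""
    if options.get("class", "").endswith("LeveledCompactionStrategy"):
        return {k: v for k, v in options.items() if k not in UNSUPPORTED_FOR_LCS}
    return dict(options)

described = {
    "min_threshold": "4",
    "class": "org.apache.cassandra.db.compaction.LeveledCompactionStrategy",
    "max_threshold": "32",
}
print(clean_compaction_options(described))
# Only the 'class' entry survives, and the CREATE TABLE replays cleanly.
```

For size-tiered tables the thresholds are legitimate and are left untouched.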
[jira] [Updated] (CASSANDRA-9636) Duplicate columns in selection causes AssertionError
[ https://issues.apache.org/jira/browse/CASSANDRA-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9636:
--
Fix Version/s: (was: 2.2.0 rc2)

Duplicate columns in selection causes AssertionError
Key: CASSANDRA-9636
URL: https://issues.apache.org/jira/browse/CASSANDRA-9636
Project: Cassandra
Issue Type: Bug
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Fix For: 2.1.x, 2.0.x

Prior to CASSANDRA-9532, unaliased duplicate fields in a selection would be silently ignored. Now they trigger a server-side exception and an unfriendly error response, which we should clean up. Duplicate columns *with* aliases are not affected.

{code}
CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE ks.t2 (k int PRIMARY KEY, v int);
INSERT INTO ks.t2 (k, v) VALUES (0, 0);
SELECT k, v FROM ks.t2;
SELECT k, v, v AS other_v FROM ks.t2;
SELECT k, v, v FROM ks.t2;
{code}

The final statement results in this error response; server-side stacktrace:

{code}
ServerError: ErrorMessage code= [Server error] message=java.lang.AssertionError
ERROR 13:01:30 Unexpected exception during request; channel = [id: 0x44d22e61, /127.0.0.1:39463 => /127.0.0.1:9042]
java.lang.AssertionError: null
    at org.apache.cassandra.cql3.ResultSet.addRow(ResultSet.java:63) ~[main/:na]
    at org.apache.cassandra.cql3.statements.Selection$ResultSetBuilder.build(Selection.java:355) ~[main/:na]
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1226) ~[main/:na]
    at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:299) ~[main/:na]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:238) ~[main/:na]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:67) ~[main/:na]
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238) ~[main/:na]
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:260) ~[main/:na]
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[main/:na]
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [main/:na]
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [main/:na]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [main/:na]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
{code}

This issue also presents on the head of the 2.2 branch and on 2.0.16. However, the prior behaviour is different on both of those branches. In the 2.0 line prior to CASSANDRA-9532, duplicate columns would actually be included in the results, as opposed to being silently dropped as per 2.1.x. In 2.2, the assertion error seen above predates CASSANDRA-9532 and is also triggered for both aliased and unaliased duplicate columns.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
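The pre-9532 behaviour the ticket describes (silently ignoring unaliased duplicates, while aliased duplicates stay distinct) amounts to a simple deduplication rule over the selection. A sketch of that rule, with (column, alias) tuples standing in for the parsed selectors — not Cassandra's `Selection` code:

```python
def dedupe_selection(selectors):
    """Keep the first occurrence of each unaliased column; keep every
    aliased selector, since the alias makes it a distinct result column."""
    seen = set()
    result = []
    for column, alias in selectors:
        if alias is None:
            if column in seen:
                continue  # unaliased duplicate: silently dropped (2.1.x behaviour)
            seen.add(column)
        result.append((column, alias))
    return result

# SELECT k, v, v            -> the duplicate v is dropped
print(dedupe_selection([("k", None), ("v", None), ("v", None)]))
# SELECT k, v, v AS other_v -> all three kept
print(dedupe_selection([("k", None), ("v", None), ("v", "other_v")]))
```

Whether 3.0 should drop duplicates like this or reject them with a clear error is exactly the cleanup the ticket asks for; either is friendlier than an {{AssertionError}}.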
[jira] [Updated] (CASSANDRA-9528) Improve log output from unit tests
[ https://issues.apache.org/jira/browse/CASSANDRA-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9528:
--
Fix Version/s: (was: 3.0 beta 1) 3.0.x

Improve log output from unit tests
--
Key: CASSANDRA-9528
URL: https://issues.apache.org/jira/browse/CASSANDRA-9528
Project: Cassandra
Issue Type: Test
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
Fix For: 3.0.x

* Single log output file per suite
* stdout/stderr to the same log file with proper interleaving
* Don't interleave interactive output from unit tests run concurrently to the console. Print everything about the test once the test has completed.
* Fetch and compress log files as part of artifacts collected by cassci

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6237) Allow range deletions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6237:
--
Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1

Allow range deletions in CQL
Key: CASSANDRA-6237
URL: https://issues.apache.org/jira/browse/CASSANDRA-6237
Project: Cassandra
Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Benjamin Lerer
Priority: Minor
Labels: cql, docs
Fix For: 3.0.0 rc1
Attachments: CASSANDRA-6237.txt

We use RangeTombstones internally in a number of places, but we could expose them more directly too. Typically, given a table like:

{noformat}
CREATE TABLE events (
    id text,
    created_at timestamp,
    content text,
    PRIMARY KEY (id, created_at)
)
{noformat}

we could allow queries like:

{noformat}
DELETE FROM events WHERE id='someEvent' AND created_at <= 'Jan 3, 2013';
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9505) Expose sparse formatting via JMX and/or sstablemetadata
[ https://issues.apache.org/jira/browse/CASSANDRA-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9505: -- Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1 Expose sparse formatting via JMX and/or sstablemetadata --- Key: CASSANDRA-9505 URL: https://issues.apache.org/jira/browse/CASSANDRA-9505 Project: Cassandra Issue Type: Improvement Reporter: Jim Witschey Assignee: Sylvain Lebresne Fix For: 3.0.0 rc1 It'd be helpful for us in TE if we could differentiate between data written in the sparse and dense formats as described [here|https://github.com/pcmanus/cassandra/blob/8099/guide_8099.md#storage-format-on-disk-and-on-wire]. It'd help us to measure speed and space performance and to make sure the format is chosen correctly and consistently. I don't know if this would be best exposed through a JMX endpoint, {{sstablemetadata}}, or both, but those seem like the most obvious exposure points. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9554) Avoid digest mismatch storm on upgrade to 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-9554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9554:
--
Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1

Avoid digest mismatch storm on upgrade to 3.0
Key: CASSANDRA-9554
URL: https://issues.apache.org/jira/browse/CASSANDRA-9554
Project: Cassandra
Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Tyler Hobbs
Fix For: 3.0.0 rc1

CASSANDRA-8099, in {{UnfilteredRowIterators.digest()}}:

{code}
// TODO: we're not computing digest the same way that old nodes. This
// means we'll have digest mismatches during upgrade. We should pass the messaging version of
// the node this is for (which might mean computing the digest last, and won't work
// for schema (where we announce the version through gossip to everyone))
{code}

In a mixed 2.1 (2.2) - 3.0 cluster, we need to calculate both digests at the same time, keep both results, and send the appropriate one depending on the receiving node's messaging version. Do that until {{MessagingService.allNodesAtLeast30()}} is true (this is not unprecedented).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
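The "compute both, send the right one" proposal can be sketched as follows. This is a Python illustration only: the two hash functions merely stand in for the pre-3.0 and 3.0 digest algorithms, and `MS_VERSION_30` is an invented constant, not Cassandra's real messaging-version id:

```python
import hashlib

MS_VERSION_30 = 10  # illustrative threshold for "node speaks 3.0"

def compute_digests(partition_bytes):
    # One pass over the data, both digest forms retained, so a read
    # coordinator can answer old and new replicas without re-reading.
    return {
        "legacy": hashlib.md5(partition_bytes).digest(),
        "current": hashlib.sha256(partition_bytes).digest(),
    }

def digest_for(digests, receiver_version):
    """Pick the digest matching the receiving node's messaging version."""
    if receiver_version >= MS_VERSION_30:
        return digests["current"]
    return digests["legacy"]
```

Once every node reports at least the 3.0 version (the {{allNodesAtLeast30()}} check), the legacy computation can be dropped entirely.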
[jira] [Updated] (CASSANDRA-9650) CRC32Factory hack can be removed in trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9650:
--
Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1

CRC32Factory hack can be removed in trunk
-
Key: CASSANDRA-9650
URL: https://issues.apache.org/jira/browse/CASSANDRA-9650
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Priority: Minor
Fix For: 3.0.0 rc1

Since we now require Java 8, we can remove the hack for compiling on earlier VMs, and in fact remove PureJavaCRC32 altogether.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605077#comment-14605077 ] Jonathan Ellis commented on CASSANDRA-9522:
---

/cc [~mambocab]

Specify unset column ratios in cassandra-stress write
-
Key: CASSANDRA-9522
URL: https://issues.apache.org/jira/browse/CASSANDRA-9522
Project: Cassandra
Issue Type: Improvement
Components: Tools
Reporter: Jim Witschey
Assignee: T Jake Luciani
Fix For: 3.0 beta 1

I'd like to be able to use stress to generate workloads with different distributions of unset columns -- so, for instance, you could specify that rows will have 70% unset columns, and on average a 100-column row would contain only 30 values. This would help us test the new row formats introduced in 8099. There are 2 different row formats, used depending on the ratio of set to unset columns, and this feature would let us generate workloads that would be stored in each of those formats.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
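The requested knob — e.g. 70% unset columns — can be modeled as an independent Bernoulli draw per column, which is one plausible way a stress generator could implement it. A sketch (not cassandra-stress code; `None` marks an unset column, and the value range is arbitrary):

```python
import random

def generate_row(n_columns, unset_ratio, rng):
    """Each column is independently unset with probability unset_ratio."""
    return [None if rng.random() < unset_ratio else rng.randint(0, 999)
            for _ in range(n_columns)]

rng = random.Random(42)  # fixed seed so the generated workload is reproducible
rows = [generate_row(100, 0.70, rng) for _ in range(1000)]

set_cols = sum(v is not None for row in rows for v in row)
# With unset_ratio=0.70, roughly 30 of every 100 columns carry a value.
print(set_cols / (1000 * 100))
```

Sweeping `unset_ratio` across runs would exercise both of the 8099 row formats, since the format choice depends on the set/unset ratio.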
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605085#comment-14605085 ] Stefania commented on CASSANDRA-9658:
-

I've run an additional test on cperf and here are [the results|http://cstar.datastax.com/graph?stats=9de58a92-1e0a-11e5-bede-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=198.11&ymin=0&ymax=270914.6] on [blade_11_b|http://cstar.datastax.com/cluster/specs]. The difference between standard and mmap on trunk is about 55k (229,213 vs 175,327), confirming what was already observed in the previous tests. However 8894 reduces the difference somewhat (230,555 vs 207,208). What was the difference when you last tested?

The 8894 branch is based on trunk but has the latest page alignment optimizations, CASSANDRA-8894, which are dependent on the page-aligned buffers, CASSANDRA-8897, already on trunk but not in 2.2. I'm happy to spend more time to see if there are further optimizations to reduce this difference, or to fix any regressions that contributed to increasing it in the first place.

The cleanup ticket that removes temporary descriptors, CASSANDRA-7066, is actually targeted to trunk only, not 2.2. Is this the ticket we need to re-enable mmap on Windows (I seem to recall this is the case from a comment posted there) or are CASSANDRA-8893 and CASSANDRA-8984 sufficient?

Re-enable memory-mapped index file reads on Windows
---
Key: CASSANDRA-9658
URL: https://issues.apache.org/jira/browse/CASSANDRA-9658
Project: Cassandra
Issue Type: Improvement
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
Labels: Windows, performance
Fix For: 2.2.x

It appears that the impact of buffered vs. memory-mapped index file reads has changed dramatically since last I tested. [Here's some results on various platforms we pulled together yesterday w/2.2-HEAD|https://docs.google.com/spreadsheets/d/1JaO2x7NsK4SSg_ZBqlfH0AwspGgIgFZ9wZ12fC4VZb0/edit#gid=0].

TL;DR: On Linux we see a 40% hit in performance from 108k ops/sec on reads to 64.8k ops/sec. While surprising in itself, the really unexpected result (to me) is on Windows - with standard access we're getting 16.8k ops/second on our bare-metal perf boxes vs. 184.7k ops/sec with memory-mapped index files, an over 10-fold increase in throughput. While testing w/standard access, CPUs on the stress machine and C* node are both sitting at around 4%, network doesn't appear bottlenecked, resource monitor doesn't show anything interesting, and performance counters in the kernel show very little. Changes in thread count simply serve to increase median latency w/out impacting any other visible metric that we're measuring, so I'm at a loss as to why the disparity is so huge on the platform.

The combination of my changes to get the 2.1 branch to behave on Windows, along with [~benedict] and [~Stefania]'s changes in lifecycle and cleanup patterns on 2.2, should hopefully have us in a state where transitioning back to using memory-mapped I/O on Windows will only cause trouble on snapshot deletion. Fairly simple runs of stress w/compaction aren't popping up any obvious errors on file access or renaming - I'm going to do some much heavier testing (ccm multi-node clusters, long stress w/repair and compaction, etc.) and see if there are any outstanding issues that need to be stamped out to call mmap'ed index files on Windows safe.

The one thing we'll never be able to support is deletion of snapshots while a node is running and sstables are mapped, but for a 10x throughput increase I think users would be willing to make that sacrifice.

The combination of the powercfg profile change, the kernel timer resolution, and memory-mapped index files is giving some pretty interesting performance numbers on EC2.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)