[jira] [Updated] (CASSANDRA-10056) Fix AggregationTest post-test error messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10056: --- Reviewer: Benjamin Lerer [~blerer] to review Fix AggregationTest post-test error messages Key: CASSANDRA-10056 URL: https://issues.apache.org/jira/browse/CASSANDRA-10056 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Priority: Trivial Fix For: 2.2.x AggregationTest prints error messages after test execution since some UDTs cannot be dropped. It's not critical to the tests themselves, but fixing it makes the log cleaner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9237) Gossip messages subject to head of line blocking by other intra-cluster traffic
[ https://issues.apache.org/jira/browse/CASSANDRA-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694318#comment-14694318 ] Jonathan Ellis commented on CASSANDRA-9237: --- WFM. Gossip messages subject to head of line blocking by other intra-cluster traffic --- Key: CASSANDRA-9237 URL: https://issues.apache.org/jira/browse/CASSANDRA-9237 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 3.0.0 rc1 Reported as an issue over less-than-perfect networks like VPNs between data centers. Gossip goes over the small message socket, where "small" is < 64k, which isn't particularly small. This is done for performance, to keep most traffic on one hot socket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
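One way to avoid the head-of-line blocking described above is to give gossip a dedicated connection instead of sharing the small-message socket. A minimal sketch of that routing decision (the `ConnectionRouter`, `route`, and the three-way connection split are illustrative assumptions, not Cassandra's actual OutboundTcpConnection code):

```java
public class ConnectionRouter {
    enum ConnectionType { GOSSIP, SMALL_MESSAGE, LARGE_MESSAGE }

    static final int SMALL_MESSAGE_LIMIT = 64 * 1024; // "small" is < 64k

    // Pick an outbound connection for a message. Gossip gets its own
    // connection so it can never queue behind other intra-cluster traffic,
    // which may itself carry payloads close to the 64k limit.
    static ConnectionType route(boolean isGossip, int payloadBytes) {
        if (isGossip)
            return ConnectionType.GOSSIP;
        return payloadBytes < SMALL_MESSAGE_LIMIT
             ? ConnectionType.SMALL_MESSAGE
             : ConnectionType.LARGE_MESSAGE;
    }

    public static void main(String[] args) {
        System.out.println(route(true, 100));           // GOSSIP
        System.out.println(route(false, 100));          // SMALL_MESSAGE
        System.out.println(route(false, 128 * 1024));   // LARGE_MESSAGE
    }
}
```

The point is that latency-critical gossip messages never share a send queue with bulk traffic, so a slow VPN link draining a large payload cannot delay failure-detector heartbeats.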
[jira] [Commented] (CASSANDRA-10060) Reuse TemporalRow when updating multiple MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694493#comment-14694493 ] Jonathan Ellis commented on CASSANDRA-10060: Does this combine the batchlogs generated at the replica too? Would that help? Reuse TemporalRow when updating multiple MaterializedViews -- Key: CASSANDRA-10060 URL: https://issues.apache.org/jira/browse/CASSANDRA-10060 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Fix For: 3.0.0 rc1 If a table has 5 associated MVs, the current logic reads the existing row for the incoming mutation 5 times. If we reuse the data from the first MV update, we can cut out any further reads. We know the existing data isn't changing because we are holding a lock on the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
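The read-once optimization described in the ticket amounts to memoizing the existing-row lookup across the N view updates. A toy sketch of the shape of the change (the `memoize` helper, the `Supplier<String>` stand-in for TemporalRow, and the read counter are all hypothetical; safe only because, as the ticket notes, the partition lock guarantees the row cannot change underneath us):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ReusedRowDemo {
    static AtomicInteger diskReads = new AtomicInteger();

    // Stand-in for the expensive read of the existing base-table row.
    static String readExistingRow() {
        diskReads.incrementAndGet();
        return "existing-row";
    }

    // Wrap the read so the first view update performs it and the
    // remaining view updates reuse the cached result.
    static Supplier<String> memoize(Supplier<String> read) {
        return new Supplier<>() {
            String cached;
            public String get() {
                if (cached == null) cached = read.get();
                return cached;
            }
        };
    }

    public static void main(String[] args) {
        Supplier<String> row = memoize(ReusedRowDemo::readExistingRow);
        for (int view = 0; view < 5; view++)   // 5 MVs, one base mutation
            row.get();
        System.out.println("reads=" + diskReads.get()); // reads=1, not 5
    }
}
```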
[jira] [Commented] (CASSANDRA-10052) Bring one node down, makes the whole cluster go down for a second
[ https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693539#comment-14693539 ] Jonathan Ellis commented on CASSANDRA-10052: How do you have clients connecting to non-localhost, if you've configured it to listen on localhost? Bring one node down, makes the whole cluster go down for a second - Key: CASSANDRA-10052 URL: https://issues.apache.org/jira/browse/CASSANDRA-10052 Project: Cassandra Issue Type: Bug Reporter: Sharvanath Pathak Priority: Critical When a node goes down, the other nodes learn that through gossip, and I do see the log from Gossiper.java: {code} private void markDead(InetAddress addr, EndpointState localState) { if (logger.isTraceEnabled()) logger.trace("marking as down {}", addr); localState.markDead(); liveEndpoints.remove(addr); unreachableEndpoints.put(addr, System.nanoTime()); logger.info("InetAddress {} is now DOWN", addr); for (IEndpointStateChangeSubscriber subscriber : subscribers) subscriber.onDead(addr, localState); if (logger.isTraceEnabled()) logger.trace("Notified " + subscribers); } {code} saying "InetAddress 192.168.101.1 is now DOWN" in Cassandra's system log. Now on all the other nodes the client side (Java driver) says "Cannot connect to any host, scheduling retry in 1000 milliseconds". They eventually do reconnect, but some queries fail during this intermediate period. To me it seems like when the server pushes the nodeDown event, it calls getRpcAddress(endpoint), and thus sends localhost as the argument in the nodeDown event. As in org.apache.cassandra.transport.Server.java: {code} public void onDown(InetAddress endpoint) { server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort())); } {code} getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost as the rpc_address configuration (which, by the way, is the default). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9917: -- Assignee: Paulo Motta (was: Carl Yeksigian) MVs should validate gc grace seconds on the tables involved --- Key: CASSANDRA-9917 URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Paulo Motta Labels: materializedviews Fix For: 3.0 beta 1 For correctness reasons (potential resurrection of dropped values), batchlog entries are TTL'd with the lowest gc grace seconds of all the tables involved in a batch. This means that if gc_grace_seconds is set to 0 in one of the tables, the batchlog entry will be dead on arrival, and never replayed. We should probably warn against such LOGGED writes taking place in general, but for MVs we must validate that gc_grace_seconds on the base table (and on the MV table, if we should allow altering it there at all) is never set too low. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
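The rule the ticket wants enforced can be stated in a few lines: the batchlog entry's TTL is the minimum gc_grace_seconds over the tables in the batch, so a zero anywhere makes the entry dead on arrival. A hedged sketch of that check (the `GcGraceValidator` class and `validateForView` method are hypothetical names, not the actual patch):

```java
public class GcGraceValidator {
    // The batchlog entry is TTL'd with the lowest gc_grace_seconds of all
    // tables in the batch; if any table has gc_grace_seconds = 0 the entry
    // expires immediately and is never replayed.
    static int batchlogTtl(int... gcGraceSeconds) {
        int min = Integer.MAX_VALUE;
        for (int g : gcGraceSeconds)
            min = Math.min(min, g);
        return min;
    }

    // Reject MV creation/alteration that would break batchlog replay.
    static void validateForView(int baseTableGcGrace) {
        if (batchlogTtl(baseTableGcGrace) == 0)
            throw new IllegalArgumentException(
                "gc_grace_seconds = 0 would make MV batchlog entries dead on arrival");
    }

    public static void main(String[] args) {
        System.out.println("ttl=" + batchlogTtl(864000, 3600)); // ttl=3600
        try {
            validateForView(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```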
[jira] [Commented] (CASSANDRA-10045) Sparse/Dense decision should be made per-row, not per-file
[ https://issues.apache.org/jira/browse/CASSANDRA-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693545#comment-14693545 ] Jonathan Ellis commented on CASSANDRA-10045: Okay, but let's keep fixver targeted at the must-have release. Sparse/Dense decision should be made per-row, not per-file -- Key: CASSANDRA-10045 URL: https://issues.apache.org/jira/browse/CASSANDRA-10045 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 3.0.0 rc1 Marking this as beta 1 in the hope I have time to rustle it up and get it reviewed beforehand. If I do not, I will let it slide, but our behaviour right now is not brilliant for workloads with a variance in density, and it should not be challenging to make a more targeted decision. We can also make use of CASSANDRA-9894 to make column encoding more efficient in many, even dense, cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10045) Sparse/Dense decision should be made per-row, not per-file
[ https://issues.apache.org/jira/browse/CASSANDRA-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10045: --- Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1 Sparse/Dense decision should be made per-row, not per-file -- Key: CASSANDRA-10045 URL: https://issues.apache.org/jira/browse/CASSANDRA-10045 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 3.0.0 rc1 Marking this as beta 1 in the hope I have time to rustle it up and get it reviewed beforehand. If I do not, I will let it slide, but our behaviour right now is not brilliant for workloads with a variance in density, and it should not be challenging to make a more targeted decision. We can also make use of CASSANDRA-9894 to make column encoding more efficient in many, even dense, cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second
[ https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10052: --- Assignee: Stefania I see. Sounds like we should just special-case it and not send anything from onDown if a peer listening on localhost goes down. Bringing one node down, makes the whole cluster go down for a second Key: CASSANDRA-10052 URL: https://issues.apache.org/jira/browse/CASSANDRA-10052 Project: Cassandra Issue Type: Bug Reporter: Sharvanath Pathak Assignee: Stefania Priority: Critical When a node goes down, the other nodes learn that through gossip, and I do see the log from Gossiper.java: {code} private void markDead(InetAddress addr, EndpointState localState) { if (logger.isTraceEnabled()) logger.trace("marking as down {}", addr); localState.markDead(); liveEndpoints.remove(addr); unreachableEndpoints.put(addr, System.nanoTime()); logger.info("InetAddress {} is now DOWN", addr); for (IEndpointStateChangeSubscriber subscriber : subscribers) subscriber.onDead(addr, localState); if (logger.isTraceEnabled()) logger.trace("Notified " + subscribers); } {code} saying "InetAddress 192.168.101.1 is now DOWN" in Cassandra's system log. Now on all the other nodes the client side (Java driver) says "Cannot connect to any host, scheduling retry in 1000 milliseconds". They eventually do reconnect, but some queries fail during this intermediate period. To me it seems like when the server pushes the nodeDown event, it calls getRpcAddress(endpoint), and thus sends localhost as the argument in the nodeDown event. As in org.apache.cassandra.transport.Server.java: {code} public void onDown(InetAddress endpoint) { server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort())); } {code} getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost as the rpc_address configuration (which, by the way, is the default). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
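The special-case suggested in the update above boils down to suppressing the push when the peer's rpc_address resolves to loopback, since telling every client "localhost is down" makes each of them think its own coordinator died. A hedged sketch of that filter (the `DownEventFilter` class and `shouldSendNodeDown` method are hypothetical, not the actual patch):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DownEventFilter {
    // Hypothetical guard for Server.onDown: when rpc_address is the
    // localhost default, the address is meaningless to remote clients,
    // so no nodeDown event should be broadcast for it.
    static boolean shouldSendNodeDown(InetAddress rpcAddress) {
        return !rpcAddress.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // 127.0.0.1 is suppressed; a routable peer address is not.
        System.out.println(shouldSendNodeDown(InetAddress.getByName("127.0.0.1")));     // false
        System.out.println(shouldSendNodeDown(InetAddress.getByName("192.168.101.1"))); // true
    }
}
```

`InetAddress.getByName` does no DNS lookup for literal IP addresses, so the example is self-contained.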
[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second
[ https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10052: --- Reviewer: Olivier Michallat (was: Sylvain Lebresne) Bringing one node down, makes the whole cluster go down for a second Key: CASSANDRA-10052 URL: https://issues.apache.org/jira/browse/CASSANDRA-10052 Project: Cassandra Issue Type: Bug Reporter: Sharvanath Pathak Assignee: Stefania Priority: Critical When a node goes down, the other nodes learn that through gossip, and I do see the log from Gossiper.java: {code} private void markDead(InetAddress addr, EndpointState localState) { if (logger.isTraceEnabled()) logger.trace("marking as down {}", addr); localState.markDead(); liveEndpoints.remove(addr); unreachableEndpoints.put(addr, System.nanoTime()); logger.info("InetAddress {} is now DOWN", addr); for (IEndpointStateChangeSubscriber subscriber : subscribers) subscriber.onDead(addr, localState); if (logger.isTraceEnabled()) logger.trace("Notified " + subscribers); } {code} saying "InetAddress 192.168.101.1 is now DOWN" in Cassandra's system log. Now on all the other nodes the client side (Java driver) says "Cannot connect to any host, scheduling retry in 1000 milliseconds". They eventually do reconnect, but some queries fail during this intermediate period. To me it seems like when the server pushes the nodeDown event, it calls getRpcAddress(endpoint), and thus sends localhost as the argument in the nodeDown event. As in org.apache.cassandra.transport.Server.java: {code} public void onDown(InetAddress endpoint) { server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort())); } {code} getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost as the rpc_address configuration (which, by the way, is the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10049) Commitlog initialization failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10049: --- Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1 Commitlog initialization failure Key: CASSANDRA-10049 URL: https://issues.apache.org/jira/browse/CASSANDRA-10049 Project: Cassandra Issue Type: Bug Reporter: T Jake Luciani Assignee: Branimir Lambov Fix For: 3.0.0 rc1 I've encountered this error locally during some dtests. It looks like a race condition in the commit log code. http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/consistency_test/TestAccuracy/test_network_topology_strategy_users_2/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10045) Sparse/Dense decision should be made per-row, not per-file
[ https://issues.apache.org/jira/browse/CASSANDRA-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692307#comment-14692307 ] Jonathan Ellis commented on CASSANDRA-10045: Will this change the sstable format? If not, there is no rush to get it in before b1. Sparse/Dense decision should be made per-row, not per-file -- Key: CASSANDRA-10045 URL: https://issues.apache.org/jira/browse/CASSANDRA-10045 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 3.0 beta 1 Marking this as beta 1 in the hope I have time to rustle it up and get it reviewed beforehand. If I do not, I will let it slide, but our behaviour right now is not brilliant for workloads with a variance in density, and it should not be challenging to make a more targeted decision. We can also make use of CASSANDRA-9894 to make column encoding more efficient in many, even dense, cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8887) Direct (de)compression of internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8887: -- Fix Version/s: 3.x Direct (de)compression of internode communication - Key: CASSANDRA-8887 URL: https://issues.apache.org/jira/browse/CASSANDRA-8887 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Matt Stump Assignee: Ariel Weisberg Fix For: 3.x Internode compression is on by default. Currently we allocate one set of buffers for the raw data, and then compress which results in another set of buffers. This greatly increases the GC load. We can decrease the GC load by doing direct compression/decompression of the communication buffers. This is the same work as done in CASSANDRA-8464 but applied to internode communication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
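The double-buffering the ticket complains about can be avoided by compressing straight between direct buffers. A minimal sketch of the idea, assuming Java 11+ (the `Deflater`/`Inflater` ByteBuffer overloads were added in Java 11; the `DirectCompressDemo` class and `roundTrip` helper are illustrative, not Cassandra's internode code, which uses its own compressors):

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DirectCompressDemo {
    // Compress and decompress directly between off-heap ByteBuffers, so the
    // bytes never detour through extra on-heap byte[] copies. Returns the
    // compressed size after verifying a lossless round trip.
    static int roundTrip(byte[] payload) {
        ByteBuffer raw = ByteBuffer.allocateDirect(payload.length);
        raw.put(payload).flip();

        ByteBuffer compressed = ByteBuffer.allocateDirect(payload.length + 64);
        Deflater deflater = new Deflater();
        deflater.setInput(raw);                 // direct buffer in
        deflater.finish();
        deflater.deflate(compressed);           // direct buffer out
        deflater.end();
        compressed.flip();
        int compressedSize = compressed.remaining();

        ByteBuffer restored = ByteBuffer.allocateDirect(payload.length);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        try {
            inflater.inflate(restored);
        } catch (DataFormatException e) {
            throw new AssertionError("round trip failed", e);
        }
        inflater.end();
        restored.flip();

        byte[] out = new byte[restored.remaining()];
        restored.get(out);
        if (!java.util.Arrays.equals(out, payload))
            throw new AssertionError("decompressed bytes differ");
        return compressedSize;
    }

    public static void main(String[] args) {
        byte[] payload = "internode message payload ".repeat(100).getBytes();
        System.out.println("compressed " + payload.length + " -> "
                           + roundTrip(payload) + " bytes");
    }
}
```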
[jira] [Resolved] (CASSANDRA-4175) Reduce memory, disk space, and cpu usage with a column name/id map
[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-4175. --- Resolution: Duplicate Assignee: (was: Jason Brown) Fix Version/s: (was: 3.x) Column name duplication is removed in CASSANDRA-8099. (See https://github.com/pcmanus/cassandra/blob/8099_engine_refactor/guide_8099.md.) (We can do slightly better by encoding column ids in the schema, but doing it on a per-sstable basis is almost as good from a disk space perspective.) IMO we should leave dealing with highly duplicated column *values* to the compression layer. Reduce memory, disk space, and cpu usage with a column name/id map -- Key: CASSANDRA-4175 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Labels: performance We spend a lot of memory on column names, both transiently (during reads) and more permanently (in the row cache). Compression mitigates this on disk but not on the heap. The overhead is significant for typical small column values, e.g., ints. Even though we intern once we get to the memtable, this affects writes too via very high allocation rates in the young generation, hence more GC activity. Now that CQL3 provides us some guarantees that column names must be defined before they are inserted, we could create a map of (say) 32-bit int column ids to names, and use that internally right up until we return a resultset to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
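The name/id map proposed in the description is a straightforward interning structure. A minimal sketch, with the `ColumnIdMap` class name and its methods invented for illustration (CASSANDRA-8099 ultimately solved the duplication differently, as the resolution comment notes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnIdMap {
    // CQL3 guarantees column names are declared before use, so each name
    // can be replaced internally by a compact int id and only translated
    // back when building the client result set.
    private final Map<String, Integer> nameToId = new HashMap<>();
    private final List<String> idToName = new ArrayList<>();

    int idFor(String name) {
        return nameToId.computeIfAbsent(name, n -> {
            idToName.add(n);
            return idToName.size() - 1;
        });
    }

    String nameFor(int id) {
        return idToName.get(id);
    }

    public static void main(String[] args) {
        ColumnIdMap map = new ColumnIdMap();
        System.out.println(map.idFor("user_id"));   // 0
        System.out.println(map.idFor("email"));     // 1
        System.out.println(map.idFor("user_id"));   // 0 (interned, not re-allocated)
        System.out.println(map.nameFor(1));         // email
    }
}
```

Every in-flight cell then carries a 4-byte id instead of a heap String, which is where the memory and young-gen allocation savings come from.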
[jira] [Updated] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9749: -- Reviewer: Ariel Weisberg [~aweisberg] to review CommitLogReplayer continues startup after encountering errors - Key: CASSANDRA-9749 URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 Project: Cassandra Issue Type: Bug Reporter: Blake Eggleston Assignee: Branimir Lambov Fix For: 2.2.x There are a few places where the commit log recovery method either skips sections or just returns when it encounters errors. Specifically if it can't read the header here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 Or if there are compressor problems here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 and here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 Whether these are user-fixable or not, I think we should require more direct user intervention (ie: fix what's wrong, or remove the bad file and restart) since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9945) Add transparent data encryption core classes
[ https://issues.apache.org/jira/browse/CASSANDRA-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9945: -- Fix Version/s: (was: 3.x) 3.2 Add transparent data encryption core classes Key: CASSANDRA-9945 URL: https://issues.apache.org/jira/browse/CASSANDRA-9945 Project: Cassandra Issue Type: Improvement Reporter: Jason Brown Assignee: Jason Brown Labels: encryption Fix For: 3.2 This patch will add the core infrastructure classes necessary for transparent data encryption (file-level encryption), as required for CASSANDRA-6018 and CASSANDRA-9633. The phrase "transparent data encryption", while not the most aesthetically pleasing, seems to be used throughout the database industry (Oracle, SQL Server, DataStax Enterprise) to describe file-level encryption, so we'll go with that as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682256#comment-14682256 ] Jonathan Ellis commented on CASSANDRA-9882: --- I don't see any value in making this configurable. DTCS (maybe other strategies) can block flushing when there are lots of sstables Key: CASSANDRA-9882 URL: https://issues.apache.org/jira/browse/CASSANDRA-9882 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jeremiah Jordan Assignee: Marcus Eriksson Labels: dtcs Fix For: 2.1.x, 2.2.x MemtableFlushWriter tasks can get blocked by Compaction getNextBackgroundTask. This is in a wonky cluster with 200k sstables in the CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we are trying to make these new smart strategies that may take some time to calculate what to do. {noformat} MemtableFlushWriter:21 daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 waiting for monitor entry [0x7ff78a667000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237) - waiting to lock 0x0006fcdbbf60 (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518) at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475) at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - 0x000743b3ac38 (a java.util.concurrent.ThreadPoolExecutor$Worker) MemtableFlushWriter:19 daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b waiting for monitor entry [0x7ff78b8ee000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237) - waiting to lock 0x0006fcdbbf60 (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518) at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475) at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) CompactionExecutor:14 daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 runnable [0x7fecce3ea000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628) at com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206) at com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220) at 
com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74) at com.google.common.collect.ImmutableSet$Builder.build(ImmutableSet.java:531) at com.google.common.collect.Sets$1.immutableCopy(Sets.java:606) at org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables(ColumnFamilyStore.java:1352) at
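The thread dump above shows MemtableFlushWriter threads blocked on the WrappingCompactionStrategy monitor while a CompactionExecutor thread holds it inside getNextBackgroundTask. One hedged sketch of a fix shape (not the actual Cassandra patch; `AsyncNotifyDemo` and `notifyAddedAsync` are invented names) is to hand the notification to a dedicated executor so the flush path returns immediately:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncNotifyDemo {
    // Instead of the flush thread calling handleNotification synchronously
    // (and blocking on the strategy monitor), submit the notification to a
    // single-thread executor and return at once. Returns true when the
    // flush path provably did not wait for the handler.
    static boolean demo() {
        ExecutorService notifier = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch handled = new CountDownLatch(1);
        try {
            notifier.submit(() -> {
                try { release.await(); } catch (InterruptedException ignored) {}
                handled.countDown();     // slow strategy bookkeeping runs here
            });
            // We reach this line while the notification is still pending:
            // proof the "flush" caller did not block on the handler.
            boolean flushNotBlocked = handled.getCount() == 1;
            release.countDown();
            return flushNotBlocked && handled.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        } finally {
            notifier.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() ? "flush path unblocked" : "unexpected blocking");
    }
}
```

The ordering guarantee of a single-thread executor keeps notifications applied in submission order, which the strategy bookkeeping would require.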
[jira] [Updated] (CASSANDRA-9487) CommitLogTest hangs intermittently in 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9487: -- Reviewer: Ariel Weisberg Not merged yet AFAIK. Assigning aweisberg to review. CommitLogTest hangs intermittently in 2.0 - Key: CASSANDRA-9487 URL: https://issues.apache.org/jira/browse/CASSANDRA-9487 Project: Cassandra Issue Type: Bug Components: Tests Reporter: Michael Shuler Assignee: Branimir Lambov Fix For: 2.0.x Attachments: system.log Possibly related to CASSANDRA-8992 ? 2.0 unit tests are hanging periodically in the same way (I have not gone through all the branches, so can't say we're in the clear everywhere - marking for just 2.x at the moment). CommitLogTest hung system.log attached from local reproduction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682261#comment-14682261 ] Jonathan Ellis commented on CASSANDRA-4338: --- Is this obsoleted by CASSANDRA-9500? Experiment with direct buffer in SequentialWriter - Key: CASSANDRA-4338 URL: https://issues.apache.org/jira/browse/CASSANDRA-4338 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Branimir Lambov Priority: Minor Labels: performance Fix For: 2.1.x Attachments: 4338-gc.tar.gz, 4338.benchmark.png, 4338.benchmark.snappycompressor.png, 4338.single_node.read.png, 4338.single_node.write.png, gc-4338-patched.png, gc-trunk-me.png, gc-trunk.png, gc-with-patch-me.png Using a direct buffer instead of a heap-based byte[] should let us avoid a copy into native memory when we flush the buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10047) nodetool aborts when attempting to cleanup a keyspace with no ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10047: --- Priority: Minor (was: Critical) nodetool aborts when attempting to cleanup a keyspace with no ranges Key: CASSANDRA-10047 URL: https://issues.apache.org/jira/browse/CASSANDRA-10047 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.1.8 Reporter: Russell Bradberry Priority: Minor When running nodetool cleanup in a DC that has no ranges for a keyspace, nodetool will abort with the following message when attempting to cleanup that keyspace: {code} Aborted cleaning up atleast one column family in keyspace ks, check server logs for more information. error: nodetool failed, check server logs -- StackTrace -- java.lang.RuntimeException: nodetool failed, check server logs at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) {code} The error messages in the logs are : {code} CompactionManager.java:370 - Cleanup cannot run before a node has joined the ring {code} This behavior prevents subsequent keyspaces from getting cleaned up. The error message is also misleading as it suggests that the only reason a node may not have ranges for a keyspace is because it has yet to join the ring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8887) Direct (de)compression of internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8887: -- Priority: Minor (was: Major) Direct (de)compression of internode communication - Key: CASSANDRA-8887 URL: https://issues.apache.org/jira/browse/CASSANDRA-8887 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Matt Stump Assignee: Ariel Weisberg Priority: Minor Fix For: 3.x Internode compression is on by default. Currently we allocate one set of buffers for the raw data, and then compress which results in another set of buffers. This greatly increases the GC load. We can decrease the GC load by doing direct compression/decompression of the communication buffers. This is the same work as done in CASSANDRA-8464 but applied to internode communication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8457: -- Priority: Minor (was: Major) nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Priority: Minor Labels: performance Fix For: 3.x Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9259) Bulk Reading from Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9259: -- Priority: Critical (was: Major) Bulk Reading from Cassandra --- Key: CASSANDRA-9259 URL: https://issues.apache.org/jira/browse/CASSANDRA-9259 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Brian Hess Assignee: Ariel Weisberg Priority: Critical Fix For: 3.x This ticket is following on from the 2015 NGCC. This ticket is designed to be a place for discussing and designing an approach to bulk reading. The goal is to have a bulk reading path for Cassandra. That is, a path optimized to grab a large portion of the data for a table (potentially all of it). This is a core element in the Spark integration with Cassandra, and the speed at which Cassandra can deliver bulk data to Spark is limiting the performance of Spark-plus-Cassandra operations. This is especially of importance as Cassandra will (likely) leverage Spark for internal operations (for example CASSANDRA-8234). The core CQL to consider is the following: SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND Token(partitionKey) <= Y Here, we choose X and Y to be contained within one token range (perhaps considering the primary range of a node without vnodes, for example). This query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk operations via Spark (or other processing frameworks - ETL, etc). There are a few causes (e.g., inefficient paging). There are a few approaches that could be considered. First, we consider a new Streaming Compaction approach. The key observation here is that a bulk read from Cassandra is a lot like a major compaction, though instead of outputting a new SSTable we would output CQL rows to a stream/socket/etc. This would be similar to a CompactionTask, but would strip out some unnecessary things in there (e.g., some of the indexing, etc). 
Predicates and projections could also be encapsulated in this new StreamingCompactionTask, for example. Another approach would be an alternate storage format. For example, we might employ Parquet (just as an example) to store the same data as in the primary Cassandra storage (aka SSTables). This is akin to Global Indexes (an alternate storage of the same data optimized for a particular query). Then, Cassandra can choose to leverage this alternate storage for particular CQL queries (e.g., range scans). These are just 2 suggestions to get the conversation going. One thing to note is that it will be useful to have this storage segregated by token range so that when you extract via these mechanisms you do not get replications-factor numbers of copies of the data. That will certainly be an issue for some Spark operations (e.g., counting). Thus, we will want per-token-range storage (even for single disks), so this will likely leverage CASSANDRA-6696 (though, we'll want to also consider the single disk case). It is also worth discussing what the success criteria is here. It is unlikely to be as fast as EDW or HDFS performance (though, that is still a good goal), but being within some percentage of that performance should be set as success. For example, 2x as long as doing bulk operations on HDFS with similar node count/size/etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
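The per-token-range extraction described above (each `Token(pk) > X AND Token(pk) <= Y` slice read exactly once, no replication-factor duplicates) reduces to covering the ring with contiguous half-open ranges. A hedged sketch of that bookkeeping (the `TokenRangeSplitter` class is a hypothetical helper, not driver or server code):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplitter {
    // Cover the token ring [min, max] with N contiguous half-open
    // (start, end] ranges, each of which maps to one
    // "SELECT ... WHERE Token(pk) > start AND Token(pk) <= end" query.
    static List<long[]> split(long min, long max, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long span = Long.divideUnsigned(max - min, parts); // overflow-safe width
        long start = min;
        for (int i = 0; i < parts; i++) {
            long end = (i == parts - 1) ? max : start + span;
            ranges.add(new long[]{ start, end });          // (start, end]
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Full Murmur3-style long token ring split into 4 slices.
        for (long[] r : split(Long.MIN_VALUE, Long.MAX_VALUE, 4))
            System.out.println("(" + r[0] + ", " + r[1] + "]");
    }
}
```

Because the ranges are contiguous and non-overlapping, summing per-range results (e.g., a Spark count) touches each row exactly once.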
[jira] [Updated] (CASSANDRA-8906) Experiment with optimizing partition merging when we can prove that some sources don't overlap
[ https://issues.apache.org/jira/browse/CASSANDRA-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8906: -- Priority: Minor (was: Major) Experiment with optimizing partition merging when we can prove that some sources don't overlap -- Key: CASSANDRA-8906 URL: https://issues.apache.org/jira/browse/CASSANDRA-8906 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Ariel Weisberg Priority: Minor Labels: compaction, performance Fix For: 3.x When we merge a partition from two sources and it turns out that those 2 sources don't overlap for that partition, we still end up doing one comparison per row in the first source. However, if we can prove that the 2 sources don't overlap, for example by using the sstable min/max clustering values that we store, we could speed this up. Note that in practice it's a little bit more hairy because we need to deal with N sources, but that's probably not too hard either. I'll note that using the sstable min/max clustering values is not terribly precise. We could do better if we were to push the same reasoning inside the merge iterator, for instance by using the sstable per-partition index, which can in theory tell us things like "don't bother comparing rows until the end of this row block". This is quite a bit more involved though, so maybe not worth the complexity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
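The proposed fast path can be sketched in isolation (hypothetical names, not Cassandra's actual merge iterator): if one source's last row sorts before the other's first row, the sources provably don't overlap and can be concatenated in O(1) comparisons instead of comparing row by row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NonOverlapMerge {
    /**
     * Merge two sorted lists. If the sources provably don't overlap
     * (the max of one sorts before the min of the other), concatenate
     * them instead of doing a per-element comparison.
     */
    static <T> List<T> merge(List<T> a, List<T> b, Comparator<T> cmp) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        if (a.isEmpty() || b.isEmpty() || cmp.compare(a.get(a.size() - 1), b.get(0)) < 0) {
            out.addAll(a); out.addAll(b);          // fast path: no overlap
        } else if (cmp.compare(b.get(b.size() - 1), a.get(0)) < 0) {
            out.addAll(b); out.addAll(a);          // fast path, reversed order
        } else {
            int i = 0, j = 0;                      // slow path: standard merge
            while (i < a.size() && j < b.size())
                out.add(cmp.compare(a.get(i), b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
            while (i < a.size()) out.add(a.get(i++));
            while (j < b.size()) out.add(b.get(j++));
        }
        return out;
    }
}
```

Generalizing to N sources, as the ticket notes, means checking pairwise min/max bounds before falling back to the full merge.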
[jira] [Updated] (CASSANDRA-9259) Bulk Reading from Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9259: -- Issue Type: New Feature (was: Improvement) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9237) Gossip messages subject to head of line blocking by other intra-cluster traffic
[ https://issues.apache.org/jira/browse/CASSANDRA-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9237: -- Fix Version/s: 3.0.0 rc1 Gossip messages subject to head of line blocking by other intra-cluster traffic --- Key: CASSANDRA-9237 URL: https://issues.apache.org/jira/browse/CASSANDRA-9237 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 3.0.0 rc1 Reported as an issue over less-than-perfect networks, like VPNs between data centers. Gossip goes over the small-message socket, where "small" means up to 64k, which isn't particularly small. This is done for performance, to keep most traffic on one hot socket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9259) Bulk Reading from Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9259: -- Fix Version/s: 3.x -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9237) Gossip messages subject to head of line blocking by other intra-cluster traffic
[ https://issues.apache.org/jira/browse/CASSANDRA-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692509#comment-14692509 ] Jonathan Ellis commented on CASSANDRA-9237: --- (IMO either of those would also be appropriate for 2.2.x.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9023) 2.0.13 write timeouts on driver
[ https://issues.apache.org/jira/browse/CASSANDRA-9023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-9023. --- Resolution: Cannot Reproduce Fix Version/s: (was: 2.0.x) 2.0.13 write timeouts on driver --- Key: CASSANDRA-9023 URL: https://issues.apache.org/jira/browse/CASSANDRA-9023 Project: Cassandra Issue Type: Bug Environment: For testing, using only a single node; hardware configuration as follows: CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU MHz: 2000.174 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 20480K NUMA node0 CPU(s): 0-15 OS: Linux version 2.6.32-504.8.1.el6.x86_64 (mockbu...@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) Disk: there is only a single disk in RAID; I think space is 500 GB, of which 5 GB is used. Reporter: anishek Assignee: Ariel Weisberg Attachments: out_system.log Initially asked @ http://www.mail-archive.com/user@cassandra.apache.org/msg41621.html and was suggested to post here. If any more details are required, please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9237) Gossip messages subject to head of line blocking by other intra-cluster traffic
[ https://issues.apache.org/jira/browse/CASSANDRA-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692554#comment-14692554 ] Jonathan Ellis commented on CASSANDRA-9237: --- Why not just switch it back to GOSSIP + INTERNAL if we're going to consider that? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9940) ReadResponse serializes and then deserializes local responses
[ https://issues.apache.org/jira/browse/CASSANDRA-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9940: -- Fix Version/s: 3.x ReadResponse serializes and then deserializes local responses - Key: CASSANDRA-9940 URL: https://issues.apache.org/jira/browse/CASSANDRA-9940 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 3.x Noticed this reviewing CASSANDRA-9894. It would be nice to not have to do this busy work. Benedict said it wasn't straightforward to avoid because it's being done to allow the read op order group to close. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7061) High accuracy, low overhead local read/write tracing
[ https://issues.apache.org/jira/browse/CASSANDRA-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-7061: -- Assignee: (was: Ariel Weisberg) Fix Version/s: (was: 3.x) High accuracy, low overhead local read/write tracing Key: CASSANDRA-7061 URL: https://issues.apache.org/jira/browse/CASSANDRA-7061 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict External profilers are pretty inadequate for getting accurate information at the granularity we're working at: tracing is too high overhead, so measures something completely different, and sampling suffers from bias of attribution due to the way the stack traces are retrieved. Hyperthreading can make this even worse. I propose to introduce an extremely low overhead tracing feature that must be enabled with a system property that will trace operations within the node only, so that we can perform various accurate low level analyses of performance. This information will include threading info, so that we can trace hand off delays and actual active time spent processing an operation. With the property disabled there will be no increased burden of tracing, however I hope to keep the total trace burden to less than one microsecond, and any single trace command to a few tens of nanos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8906) Experiment with optimizing partition merging when we can prove that some sources don't overlap
[ https://issues.apache.org/jira/browse/CASSANDRA-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8906: -- Assignee: (was: Ariel Weisberg) Fix Version/s: (was: 3.x) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9241) ByteBuffer.array() without ByteBuffer.arrayOffset() + ByteBuffer.position() is a bug
[ https://issues.apache.org/jira/browse/CASSANDRA-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9241: -- Reviewer: Stefania [~Stefania] to review ByteBuffer.array() without ByteBuffer.arrayOffset() + ByteBuffer.position() is a bug Key: CASSANDRA-9241 URL: https://issues.apache.org/jira/browse/CASSANDRA-9241 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Assignee: Ariel Weisberg Priority: Minor Fix For: 3.0.x I found one instance of this on OHCProvider, so it makes sense to review all usages since there aren't that many. Some suspect things: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/FastByteOperations.java#L197 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1877 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/TokenSerializer.java#L40 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java#L178 https://github.com/apache/cassandra/blob/trunk/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java#L104 https://github.com/apache/cassandra/blob/trunk/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java#L543 https://github.com/apache/cassandra/blob/trunk/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java#L563 I made this list off of 8099, so I might have missed some instances on trunk. FastByteOperations makes me cross-eyed, so it is worth a second pass to make sure offsets in byte buffers are handled correctly. Generally I like to use the full incantation even when I have done things like allocate the buffer on the stack locally, for copy-paste/refactoring reasons and to make clear to new users how the API is supposed to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
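The bug pattern is easy to demonstrate in isolation: for a sliced or duplicated buffer, `array()` exposes the entire backing array, so reads must start at `arrayOffset() + position()`. A self-contained illustration (the helper name is made up for the example):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ArrayOffsetDemo {
    /** Correct: honour arrayOffset() and position() when reading the backing array. */
    static byte[] remainingBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        System.arraycopy(bb.array(), bb.arrayOffset() + bb.position(), out, 0, bb.remaining());
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap("hello world".getBytes(StandardCharsets.UTF_8));
        whole.position(6);
        ByteBuffer slice = whole.slice();   // arrayOffset() == 6, position() == 0
        // Buggy pattern: new String(slice.array()) yields "hello world",
        // because array() alone returns the whole backing array.
        System.out.println(new String(remainingBytes(slice), StandardCharsets.UTF_8));
    }
}
```

This is the "full incantation" the comment refers to: it stays correct whether the buffer was freshly allocated, wrapped, sliced, or duplicated.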
[jira] [Commented] (CASSANDRA-9237) Gossip messages subject to head of line blocking by other intra-cluster traffic
[ https://issues.apache.org/jira/browse/CASSANDRA-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692505#comment-14692505 ] Jonathan Ellis commented on CASSANDRA-9237: --- I see two options for 3.0, both of which are better than the status quo: # Reduce the small-message threshold # Go back to the old heuristic of putting gossip and internal responses on a separate socket The problem with #2 in the past was that read responses, which are quite large, got jumbled in too. (REQUEST_RESPONSE is too large an umbrella.) We could split those out to their own verb, but it's not clear to me that putting write acks on the low traffic socket is a win. Any redefinition of liveness or heartbeat generation belongs in a new ticket and is something of an open-ended research project with no clear answers imo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6434) Repair-aware gc grace period
[ https://issues.apache.org/jira/browse/CASSANDRA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6434: -- Reviewer: Yuki Morishita (was: sankalp kohli) Repair-aware gc grace period - Key: CASSANDRA-6434 URL: https://issues.apache.org/jira/browse/CASSANDRA-6434 Project: Cassandra Issue Type: New Feature Components: Core Reporter: sankalp kohli Assignee: Marcus Eriksson Fix For: 3.0 beta 1 Since the reason for gcgs is to ensure that we don't purge tombstones until every replica has been notified, it's redundant in a world where we're tracking repair times per sstable (and repairing frequently), i.e., a world where we default to incremental repair a la CASSANDRA-5351. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-6434) Repair-aware gc grace period
[ https://issues.apache.org/jira/browse/CASSANDRA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680212#comment-14680212 ] Jonathan Ellis edited comment on CASSANDRA-6434 at 8/10/15 2:48 PM: Sylvain is out for another week. Can you review [~kohlisankalp]? Edit: turns out Yuki is already working on it, assigning to him. was (Author: jbellis): Sylvain is out for another week. Can you review [~kohlisankalp]? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6434) Repair-aware gc grace period
[ https://issues.apache.org/jira/browse/CASSANDRA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6434: -- Reviewer: sankalp kohli (was: Sylvain Lebresne) Sylvain is out for another week. Can you review [~kohlisankalp]? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10006) 2.1 format sstable filenames with tmp are not handled by 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10006: --- Reviewer: Yuki Morishita [~yukim] to review 2.1 format sstable filenames with tmp are not handled by 3.0 -- Key: CASSANDRA-10006 URL: https://issues.apache.org/jira/browse/CASSANDRA-10006 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Stefania Fix For: 3.0 beta 1 In 3.0, {{Descriptor.fromFilename()}} doesn't handle tmp in sstable filenames in the 2.1 (ka) format. If you start 3.0 with one of these filenames, you'll see an exception like the following: {noformat} ERROR [main] 2015-08-05 10:15:57,872 CassandraDaemon.java:623 - Exception encountered during startup java.lang.AssertionError: Invalid file name system-schema_columns-tmp-ka-5-Filter.db in /tmp/dtest-Jstsy2/test/node1/data/system/schema_columns-296e9c049bec3085827dc17d3df2122a at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:291) ~[main/:na] at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:190) ~[main/:na] at org.apache.cassandra.service.StartupChecks$7$1.visitFile(StartupChecks.java:226) ~[main/:na] at org.apache.cassandra.service.StartupChecks$7$1.visitFile(StartupChecks.java:218) ~[main/:na] at java.nio.file.Files.walkFileTree(Files.java:2670) ~[na:1.8.0_45] at java.nio.file.Files.walkFileTree(Files.java:2742) ~[na:1.8.0_45] at org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:251) ~[main/:na] at org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) ~[main/:na] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:163) [main/:na] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:504) [main/:na] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:610) [main/:na] {noformat} I can reliably reproduce this with an [upgrade 
dtest|https://github.com/thobbs/cassandra-dtest/blob/8099-backwards-compat/upgrade_tests/cql_tests.py#L5126-L5162] from CASSANDRA-9704, but it should also be reproducible by simply starting 3.0 with a filename like the one from the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7423) Allow updating individual subfields of UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-7423: -- Assignee: Benjamin Lerer Allow updating individual subfields of UDT -- Key: CASSANDRA-7423 URL: https://issues.apache.org/jira/browse/CASSANDRA-7423 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Tupshin Harper Assignee: Benjamin Lerer Labels: cql Fix For: 3.x Since user defined types were implemented in CASSANDRA-5590 as blobs (you have to rewrite the entire type in order to make any modifications), they can't be safely used without LWT for any operation that wants to modify a subset of the UDT's fields by any client process that is not authoritative for the entire blob. When trying to use UDTs to model complex records (particularly with nesting), this is not an exceptional circumstance; it is the totally expected normal situation. The use of UDTs for anything non-trivial is harmful to either performance or consistency or both. Edit: to clarify, I believe that most potential uses of UDTs should be considered anti-patterns until/unless we have field-level r/w access to individual elements of the UDT, with individual timestamps and standard LWW semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10020) Support eager retries for range queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10020: --- Priority: Minor (was: Critical) Issue Type: New Feature (was: Bug) Summary: Support eager retries for range queries (was: Range queries don't go on all replicas. ) Support eager retries for range queries --- Key: CASSANDRA-10020 URL: https://issues.apache.org/jira/browse/CASSANDRA-10020 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Gautam Kumar Priority: Minor A simple query like `select * from table` may time out if one of the nodes fails. We had a 4-node cassandra cluster with RF=3 and CL=LOCAL_QUORUM. The range query is issued to only two replicas, as per ConsistencyLevel.java: liveEndpoints.subList(0, Math.min(liveEndpoints.size(), blockFor(keyspace))); If a node in this sublist fails, the query will time out. Why don't you issue range queries to all replicas? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
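The quoted selection logic can be reproduced in isolation to see why a single node failure causes a timeout; `blockFor` is hardcoded here to LOCAL_QUORUM with RF=3, and the class and node names are hypothetical, for illustration only.

```java
import java.util.List;

public class RangeEndpointSelection {
    /** Mirrors the quoted line: contact only the first blockFor live endpoints. */
    static List<String> filterForQuery(List<String> liveEndpoints, int blockFor) {
        return liveEndpoints.subList(0, Math.min(liveEndpoints.size(), blockFor));
    }

    public static void main(String[] args) {
        int blockFor = 3 / 2 + 1;  // LOCAL_QUORUM with RF=3 -> 2
        // Three replicas are alive, but only the first two are contacted;
        // if either fails mid-query, the range query times out even though
        // a third live replica also holds the data.
        System.out.println(filterForQuery(List.of("n1", "n2", "n3"), blockFor));
    }
}
```

An eager retry, as the new summary suggests, would fall back to the remaining live replica instead of waiting for the full timeout.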
[jira] [Updated] (CASSANDRA-10014) Deletions using clustering keys not reflected in MV
[ https://issues.apache.org/jira/browse/CASSANDRA-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10014: --- Since Version: 3.0 alpha 1 (was: 3.0.x) Deletions using clustering keys not reflected in MV --- Key: CASSANDRA-10014 URL: https://issues.apache.org/jira/browse/CASSANDRA-10014 Project: Cassandra Issue Type: Bug Reporter: Stefan Podkowinski Assignee: Carl Yeksigian Fix For: 3.0.0 rc1 I wrote a test to reproduce an [issue|http://stackoverflow.com/questions/31810841/cassandra-materialized-view-shows-stale-data/31860487] reported on SO and turns out this is easily reproducible. There seems to be a bug preventing deletes to be propagated to MVs in case a clustering key is used. See [here|https://github.com/spodkowinski/cassandra/commit/1c064523c8d8dbee30d46a03a0f58d3be97800dc] for test case (testClusteringKeyTombstone should fail). It seems {{MaterializedView.updateAffectsView()}} will not consider the delete relevant for the view as {{partition.deletionInfo().isLive()}} will be true during the test. In other test cases isLive will return false, which seems to be the actual problem here. I'm not even sure the root cause is MV specific, but wasn't able to dig much deeper as I'm not familiar with the slightly confusing semantics around DeletionInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10015) Create tool to debug why expired sstables are not getting dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10015: --- Reviewer: Stefania Create tool to debug why expired sstables are not getting dropped - Key: CASSANDRA-10015 URL: https://issues.apache.org/jira/browse/CASSANDRA-10015 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x Sometimes fully expired sstables are not getting dropped, and it is a real pain to manually find out why. A tool that outputs which sstables block expired ones (by having older data than the newest tombstone in an expired sstable) would save a lot of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
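One plausible shape for such a check, sketched with hypothetical names (the real rule also involves partition overlap, which is elided here): an sstable blocks an expired one when it contains data older than the newest tombstone in the expired sstable.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiredBlockerCheck {
    /** Minimal stand-in for per-sstable metadata (hypothetical, for illustration). */
    record SSTable(String name, long minTimestamp, long maxTombstoneTimestamp) {}

    /**
     * A candidate blocks the expired sstable if it holds data older than the
     * newest tombstone in the expired sstable: dropping the expired sstable
     * would lose the tombstone shadowing that data.
     */
    static List<String> blockers(SSTable expired, List<SSTable> candidates) {
        List<String> out = new ArrayList<>();
        for (SSTable s : candidates)
            if (s != expired && s.minTimestamp() < expired.maxTombstoneTimestamp())
                out.add(s.name());
        return out;
    }
}
```

A tool built around this predicate would simply print the offending sstable names, which is exactly the manual hunt the ticket wants to automate.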
[jira] [Updated] (CASSANDRA-10006) 2.1 format sstable filenames with tmp are not handled by 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10006: --- Assignee: Stefania (was: Yuki Morishita) 2.1 format sstable filenames with tmp are not handled by 3.0 -- Key: CASSANDRA-10006 URL: https://issues.apache.org/jira/browse/CASSANDRA-10006 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Stefania Fix For: 3.0 beta 1 In 3.0, {{Descriptor.fromFilename()}} doesn't handle tmp in sstable filenames in the 2.1 (ka) format. If you start 3.0 with one of these filenames, you'll see an exception like the following: {noformat} ERROR [main] 2015-08-05 10:15:57,872 CassandraDaemon.java:623 - Exception encountered during startup java.lang.AssertionError: Invalid file name system-schema_columns-tmp-ka-5-Filter.db in /tmp/dtest-Jstsy2/test/node1/data/system/schema_columns-296e9c049bec3085827dc17d3df2122a at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:291) ~[main/:na] at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:190) ~[main/:na] at org.apache.cassandra.service.StartupChecks$7$1.visitFile(StartupChecks.java:226) ~[main/:na] at org.apache.cassandra.service.StartupChecks$7$1.visitFile(StartupChecks.java:218) ~[main/:na] at java.nio.file.Files.walkFileTree(Files.java:2670) ~[na:1.8.0_45] at java.nio.file.Files.walkFileTree(Files.java:2742) ~[na:1.8.0_45] at org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:251) ~[main/:na] at org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) ~[main/:na] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:163) [main/:na] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:504) [main/:na] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:610) [main/:na] {noformat} I can reliably reproduce this with an [upgrade 
dtest|https://github.com/thobbs/cassandra-dtest/blob/8099-backwards-compat/upgrade_tests/cql_tests.py#L5126-L5162] from CASSANDRA-9704, but it should also be reproducible by simply starting 3.0 with a filename like the one from the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
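The startup check walks the data directory and hands every filename to {{Descriptor.fromFilename()}}, which asserts on the legacy tmp marker. A minimal sketch of the parsing problem — a hypothetical parser, not Cassandra's Descriptor code; the names and splitting rules here are illustrative only:

```python
# Hypothetical sketch: the 2.1 (ka) format can insert a "tmp" token into the
# filename, so a parser that expects exactly <cf>-<version>-<generation>-
# <component> fields must strip it before the remaining fields line up.
# Not Cassandra's actual parsing logic.

def parse_legacy_sstable_name(filename):
    """Parse '<cf>[-tmp]-<version>-<generation>-<component>' style names."""
    parts = filename.rsplit(".", 1)[0].split("-")
    # Drop an optional legacy "tmp" marker so the field positions line up.
    if "tmp" in parts:
        parts.remove("tmp")
    component = parts[-1]
    generation = int(parts[-2])
    version = parts[-3]
    cf = "-".join(parts[:-3])
    return cf, version, generation, component

# The filename from the error message above parses once "tmp" is dropped.
print(parse_legacy_sstable_name("system-schema_columns-tmp-ka-5-Filter.db"))
```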
[jira] [Updated] (CASSANDRA-10008) Upgrading SSTables fails on 2.2.0 (after upgrade from 2.1.2)
[ https://issues.apache.org/jira/browse/CASSANDRA-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10008: --- Assignee: Chris Moos Upgrading SSTables fails on 2.2.0 (after upgrade from 2.1.2) Key: CASSANDRA-10008 URL: https://issues.apache.org/jira/browse/CASSANDRA-10008 Project: Cassandra Issue Type: Bug Reporter: Chris Moos Assignee: Chris Moos Fix For: 2.2.x Attachments: CASSANDRA-10008.patch Running *nodetool upgradesstables* fails with the following after upgrading to 2.2.0 from 2.1.2: {code} error: null -- StackTrace -- java.lang.AssertionError at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkUnused(LifecycleTransaction.java:428) at org.apache.cassandra.db.lifecycle.LifecycleTransaction.split(LifecycleTransaction.java:408) at org.apache.cassandra.db.compaction.CompactionManager.parallelAllSSTableOperation(CompactionManager.java:268) at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:373) at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:1524) at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2521) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10008) Upgrading SSTables fails on 2.2.0 (after upgrade from 2.1.2)
[ https://issues.apache.org/jira/browse/CASSANDRA-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10008: --- Reviewer: Joshua McKenzie [~JoshuaMcKenzie] to review Upgrading SSTables fails on 2.2.0 (after upgrade from 2.1.2) Key: CASSANDRA-10008 URL: https://issues.apache.org/jira/browse/CASSANDRA-10008 Project: Cassandra Issue Type: Bug Reporter: Chris Moos Assignee: Chris Moos Fix For: 2.2.x Attachments: CASSANDRA-10008.patch Running *nodetool upgradesstables* fails with the following after upgrading to 2.2.0 from 2.1.2: {code} error: null -- StackTrace -- java.lang.AssertionError at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkUnused(LifecycleTransaction.java:428) at org.apache.cassandra.db.lifecycle.LifecycleTransaction.split(LifecycleTransaction.java:408) at org.apache.cassandra.db.compaction.CompactionManager.parallelAllSSTableOperation(CompactionManager.java:268) at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:373) at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:1524) at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2521) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10014) Deletions using clustering keys not reflected in MV
[ https://issues.apache.org/jira/browse/CASSANDRA-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10014: --- Assignee: Carl Yeksigian Deletions using clustering keys not reflected in MV --- Key: CASSANDRA-10014 URL: https://issues.apache.org/jira/browse/CASSANDRA-10014 Project: Cassandra Issue Type: Bug Reporter: Stefan Podkowinski Assignee: Carl Yeksigian Fix For: 3.0.0 rc1 I wrote a test to reproduce an [issue|http://stackoverflow.com/questions/31810841/cassandra-materialized-view-shows-stale-data/31860487] reported on SO, and it turns out this is easily reproducible. There seems to be a bug preventing deletes from being propagated to MVs when a clustering key is used. See [here|https://github.com/spodkowinski/cassandra/commit/1c064523c8d8dbee30d46a03a0f58d3be97800dc] for the test case (testClusteringKeyTombstone should fail). It seems {{MaterializedView.updateAffectsView()}} will not consider the delete relevant for the view, as {{partition.deletionInfo().isLive()}} will be true during the test. In other test cases isLive will return false, which seems to be the actual problem here. I'm not even sure the root cause is MV specific, but I wasn't able to dig much deeper as I'm not familiar with the slightly confusing semantics around DeletionInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10014) Deletions using clustering keys not reflected in MV
[ https://issues.apache.org/jira/browse/CASSANDRA-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10014: --- Fix Version/s: 3.0.0 rc1 Deletions using clustering keys not reflected in MV --- Key: CASSANDRA-10014 URL: https://issues.apache.org/jira/browse/CASSANDRA-10014 Project: Cassandra Issue Type: Bug Reporter: Stefan Podkowinski Assignee: Carl Yeksigian Fix For: 3.0.0 rc1 I wrote a test to reproduce an [issue|http://stackoverflow.com/questions/31810841/cassandra-materialized-view-shows-stale-data/31860487] reported on SO, and it turns out this is easily reproducible. There seems to be a bug preventing deletes from being propagated to MVs when a clustering key is used. See [here|https://github.com/spodkowinski/cassandra/commit/1c064523c8d8dbee30d46a03a0f58d3be97800dc] for the test case (testClusteringKeyTombstone should fail). It seems {{MaterializedView.updateAffectsView()}} will not consider the delete relevant for the view, as {{partition.deletionInfo().isLive()}} will be true during the test. In other test cases isLive will return false, which seems to be the actual problem here. I'm not even sure the root cause is MV specific, but I wasn't able to dig much deeper as I'm not familiar with the slightly confusing semantics around DeletionInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9916) batch_mutate failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9916: -- Assignee: Paulo Motta (was: Benjamin Lerer) batch_mutate failing on trunk - Key: CASSANDRA-9916 URL: https://issues.apache.org/jira/browse/CASSANDRA-9916 Project: Cassandra Issue Type: Bug Components: Core Reporter: Mike Adamson Assignee: Paulo Motta Fix For: 3.0.0 rc1 {{batch_mutate}} is failing on trunk with the following error: {noformat} java.lang.AssertionError: current = ColumnDefinition{name=b@706172656e745f70617468, type=org.apache.cassandra.db.marshal.BytesType, kind=STATIC, componentIndex=null, indexName=cfs_parent_path, indexType=KEYS}, new = ColumnDefinition{name=b@70617468, type=org.apache.cassandra.db.marshal.BytesType, kind=STATIC, componentIndex=null, indexName=cfs_path, indexType=KEYS} at org.apache.cassandra.db.rows.ArrayBackedRow$SortedBuilder.setColumn(ArrayBackedRow.java:617) at org.apache.cassandra.db.rows.ArrayBackedRow$SortedBuilder.addCell(ArrayBackedRow.java:630) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:891) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:843) at org.apache.cassandra.db.LegacyLayout.getNextRow(LegacyLayout.java:390) at org.apache.cassandra.db.LegacyLayout.toUnfilteredRowIterator(LegacyLayout.java:326) at org.apache.cassandra.db.LegacyLayout.toUnfilteredRowIterator(LegacyLayout.java:288) at org.apache.cassandra.thrift.CassandraServer.createMutationList(CassandraServer.java:1110) at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:1249) {noformat} The following mutations was passed to {{batch_mutate}} to get this error {noformat} mutationMap = {java.nio.HeapByteBuffer[pos=0 lim=32 cap=32]= {inode=[ Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:80 62 00 04 70 61 74 68 00, value:2F, timestamp:1438165021749))), 
Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:80 62 00 0B 70 61 72 65 6E 74 5F 70 61 74 68 00, value:6E 75 6C 6C, timestamp:1438165021749))), Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:80 62 00 08 73 65 6E 74 69 6E 65 6C 00, value:78, timestamp:1438165021749))), Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:80 62 00 04 64 61 74 61 00, value:00 00 00 08 64 61 74 61 73 74 61 78 00 00 00 05 75 73 65 72 73 01 FF 00 00 00 00 00 04 00 00 00 01, timestamp:1438165021749))) ]}} {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine
[ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660772#comment-14660772 ] Jonathan Ellis commented on CASSANDRA-8099: --- That's on Sylvain's list when he gets back in two weeks. Refactor and modernize the storage engine - Key: CASSANDRA-8099 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 3.0 beta 1 Attachments: 8099-nit The current storage engine (which for this ticket I'll loosely define as the code implementing the read/write path) is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows. This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL resultset) because we forget about the grouping right away each time (so lots of useless cell name comparisons in particular). But beyond inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be. Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone. Or the overly complex ways {{AbstractQueryPager}} has to go through simply to remove the last query result. So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things: # Make the storage engine more aware of the CQL structure.
In practice, instead of having partitions be a simple iterable map of cells, it should be an iterable list of rows (each row itself composed of per-column cells, though obviously not exactly the same kind of cell we have today). # Make the engine more iterative. What I mean here is that in the read path, we end up reading all cells into memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially. Please note that such a refactor should provide some performance improvements right off the bat, but that's not its primary goal either. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
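The two proposals above can be sketched with a toy generator pipeline — plain illustrative Python, not Cassandra code — showing how grouping a cell stream into CQL-style rows lazily keeps only one row resident at a time:

```python
# Toy illustration (NOT Cassandra code) of the "more iterative" read path:
# instead of materializing every cell into one big in-memory object before
# answering, a generator pipeline groups cells into rows lazily, so data can
# flow from "disk" to "network" with only one row resident at a time.

from itertools import groupby

def cells_from_disk():
    # Stand-in for a disk scan: (row_key, column, value) triples in row order.
    for row_key in range(3):
        for col in ("a", "b"):
            yield (row_key, col, row_key * 10)

def rows_iterative(cells):
    """Group a cell stream into rows lazily, one row at a time."""
    for row_key, group in groupby(cells, key=lambda c: c[0]):
        yield {"key": row_key, "cells": {col: val for _, col, val in group}}

# Pulling the first row materializes only that row, not the whole partition.
first_row = next(rows_iterative(cells_from_disk()))
print(first_row)
```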
[jira] [Commented] (CASSANDRA-9966) batched CAS statements are not serializable
[ https://issues.apache.org/jira/browse/CASSANDRA-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660777#comment-14660777 ] Jonathan Ellis commented on CASSANDRA-9966: --- [~kohlisankalp] may also be able to help. batched CAS statements are not serializable --- Key: CASSANDRA-9966 URL: https://issues.apache.org/jira/browse/CASSANDRA-9966 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sam Overton Assignee: Sylvain Lebresne Priority: Critical Fix For: 3.x, 2.2.x It is possible to batch CAS statements such that their outcome is different from the outcome had they been executed sequentially outside of a batch. E.g., starting from
a | b | c
a | 1 | 1

BEGIN BATCH
UPDATE foo SET b=2 WHERE a='a' IF c=1
UPDATE foo SET c=2 WHERE a='a' IF b=1
APPLY BATCH

results in
a | b | c
a | 2 | 2

If these statements were not batched, the outcome would be
UPDATE foo SET b=2 WHERE a='a' IF c=1
a | b | c
a | 2 | 1

UPDATE foo SET c=2 WHERE a='a' IF b=1
applied=false (pre-condition b=1 not met)

Cassandra already checks for incompatible preconditions within a batch (e.g. one statement with IF c=1 and another statement with IF c=2). It should also check for mutations to columns in one statement that affect the pre-conditions of another statement, or it should evaluate the statement pre-conditions sequentially after applying the mutations of the previous statement to an in-memory model of the partition. For backwards compatibility this would have to be a new strict batch mode, e.g. BEGIN STRICT BATCH -- This message was sent by Atlassian JIRA (v6.3.4#6332)
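The anomaly in the ticket can be reproduced with a minimal executable model — illustrative Python, not Cassandra's Paxos implementation: a batched CAS checks every IF pre-condition against the pre-batch snapshot before applying anything, while sequential execution re-reads the row between statements, so the second pre-condition can fail.

```python
# Minimal model (NOT Cassandra's implementation) of why the batch is not
# serializable: batched CAS evaluates all IF conditions against the pre-batch
# snapshot, then applies all mutations; sequential execution lets each IF see
# the mutations of the statements before it.

def apply_batched(row, statements):
    """All conditions checked against a snapshot; mutations applied together."""
    snapshot = dict(row)
    if all(cond(snapshot) for cond, _ in statements):
        for _, mutate in statements:
            mutate(row)
    return row

def apply_sequential(row, statements):
    """Each condition sees the mutations of the statements before it."""
    for cond, mutate in statements:
        if cond(row):
            mutate(row)
    return row

# UPDATE foo SET b=2 WHERE a='a' IF c=1 / UPDATE foo SET c=2 WHERE a='a' IF b=1
stmts = [
    (lambda r: r["c"] == 1, lambda r: r.update(b=2)),
    (lambda r: r["b"] == 1, lambda r: r.update(c=2)),
]

print(apply_batched({"a": "a", "b": 1, "c": 1}, stmts))    # both applied
print(apply_sequential({"a": "a", "b": 1, "c": 1}, stmts)) # second IF fails
```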
[jira] [Commented] (CASSANDRA-10007) Repeated rows in paged result
[ https://issues.apache.org/jira/browse/CASSANDRA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660775#comment-14660775 ] Jonathan Ellis commented on CASSANDRA-10007: Is this the same as CASSANDRA-10010? Repeated rows in paged result - Key: CASSANDRA-10007 URL: https://issues.apache.org/jira/browse/CASSANDRA-10007 Project: Cassandra Issue Type: Bug Components: Core Reporter: Adam Holmberg Assignee: Benjamin Lerer Labels: client-impacting Fix For: 3.x Attachments: paging-test.py We noticed an anomaly in paged results while testing against 3.0.0-alpha1. It seems that unbounded selects can return rows repeated at page boundaries. Furthermore, the number of repeated rows seems to dither in count across consecutive runs of the same query. Does not reproduce on 2.2.0 and earlier. I also noted that this behavior only manifests on multi-node clusters. The attached script shows this behavior when run against 3.0.0-alpha1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
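The symptom — the same row re-emitted at each page boundary, with a count that varies run to run — is what one would expect if a pager resumes from the last returned key with an inclusive rather than exclusive comparison. A hypothetical sketch of that mechanism (not the 3.0 code or the driver, whose actual root cause the ticket leaves open):

```python
# Illustrative pager (hypothetical, not Cassandra or driver code): resuming
# from the last returned key with an INCLUSIVE comparison re-emits that key at
# the start of every following page, duplicating rows at page boundaries.

def fetch_all(rows, size, inclusive):
    """Page through sorted `rows`, resuming from the last key of each page."""
    out, cursor = [], None
    while True:
        if cursor is None:
            page = rows[:size]
        else:
            later = [r for r in rows if (r >= cursor if inclusive else r > cursor)]
            page = later[:size]
        out.extend(page)
        if len(page) < size:  # a short page means we are done
            return out
        cursor = page[-1]

print(fetch_all(list(range(10)), 4, inclusive=False))  # correct: 10 rows
print(fetch_all(list(range(10)), 4, inclusive=True))   # rows 3, 6, 9 repeat
```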
[jira] [Updated] (CASSANDRA-9966) batched CAS statements are not serializable
[ https://issues.apache.org/jira/browse/CASSANDRA-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9966: -- Fix Version/s: 2.2.x batched CAS statements are not serializable --- Key: CASSANDRA-9966 URL: https://issues.apache.org/jira/browse/CASSANDRA-9966 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sam Overton Assignee: Sylvain Lebresne Priority: Critical Fix For: 3.x, 2.2.x It is possible to batch CAS statements such that their outcome is different from the outcome had they been executed sequentially outside of a batch. E.g., starting from
a | b | c
a | 1 | 1

BEGIN BATCH
UPDATE foo SET b=2 WHERE a='a' IF c=1
UPDATE foo SET c=2 WHERE a='a' IF b=1
APPLY BATCH

results in
a | b | c
a | 2 | 2

If these statements were not batched, the outcome would be
UPDATE foo SET b=2 WHERE a='a' IF c=1
a | b | c
a | 2 | 1

UPDATE foo SET c=2 WHERE a='a' IF b=1
applied=false (pre-condition b=1 not met)

Cassandra already checks for incompatible preconditions within a batch (e.g. one statement with IF c=1 and another statement with IF c=2). It should also check for mutations to columns in one statement that affect the pre-conditions of another statement, or it should evaluate the statement pre-conditions sequentially after applying the mutations of the previous statement to an in-memory model of the partition. For backwards compatibility this would have to be a new strict batch mode, e.g. BEGIN STRICT BATCH -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10001) Bug in encoding of sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10001: --- Assignee: Stefania Bug in encoding of sstables --- Key: CASSANDRA-10001 URL: https://issues.apache.org/jira/browse/CASSANDRA-10001 Project: Cassandra Issue Type: Bug Reporter: T Jake Luciani Assignee: Stefania Priority: Blocker Fix For: 3.0 beta 1 Fixing the compaction dtest I noticed we aren't encoding map data correctly in sstables. The following code fails from the newly committed {{compaction_test.py:TestCompaction_with_SizeTieredCompactionStrategy.large_compaction_warning_test}} {code}
session.execute("CREATE TABLE large(userid text PRIMARY KEY, properties map<int, text>) with compression = {}")
for i in range(200):  # ensures partition size larger than compaction_large_partition_warning_threshold_mb
    session.execute("UPDATE ks.large SET properties[%i] = '%s' WHERE userid = 'user'" % (i, get_random_word(strlen)))
ret = session.execute("SELECT properties from ks.large where userid = 'user'")
assert len(ret) == 1
self.assertEqual(200, len(ret[0][0].keys()))
{code} The last assert is failing with only 91 keys. The large values are causing flushes vs staying in the memtable, so the issue is somewhere in the serialization of collections in sstables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9966) batched CAS statements are not serializable
[ https://issues.apache.org/jira/browse/CASSANDRA-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9966: -- Fix Version/s: 3.x batched CAS statements are not serializable --- Key: CASSANDRA-9966 URL: https://issues.apache.org/jira/browse/CASSANDRA-9966 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sam Overton Assignee: Sylvain Lebresne Priority: Critical Fix For: 3.x It is possible to batch CAS statements such that their outcome is different from the outcome had they been executed sequentially outside of a batch. E.g., starting from
a | b | c
a | 1 | 1

BEGIN BATCH
UPDATE foo SET b=2 WHERE a='a' IF c=1
UPDATE foo SET c=2 WHERE a='a' IF b=1
APPLY BATCH

results in
a | b | c
a | 2 | 2

If these statements were not batched, the outcome would be
UPDATE foo SET b=2 WHERE a='a' IF c=1
a | b | c
a | 2 | 1

UPDATE foo SET c=2 WHERE a='a' IF b=1
applied=false (pre-condition b=1 not met)

Cassandra already checks for incompatible preconditions within a batch (e.g. one statement with IF c=1 and another statement with IF c=2). It should also check for mutations to columns in one statement that affect the pre-conditions of another statement, or it should evaluate the statement pre-conditions sequentially after applying the mutations of the previous statement to an in-memory model of the partition. For backwards compatibility this would have to be a new strict batch mode, e.g. BEGIN STRICT BATCH -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9927: -- Assignee: Paulo Motta Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Assignee: Paulo Motta Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9928: -- Fix Version/s: (was: 3.0 beta 1) 3.x Add Support for multiple non-primary key columns in Materialized View primary keys -- Key: CASSANDRA-9928 URL: https://issues.apache.org/jira/browse/CASSANDRA-9928 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.x Currently we don't allow >1 non-primary-key column from the base table in a MV primary key. We should remove this restriction, assuming we continue filtering out nulls. With nulls allowed in the MV columns there are a lot of multiplicative implications we need to think through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660218#comment-14660218 ] Jonathan Ellis edited comment on CASSANDRA-9927 at 8/7/15 4:23 AM: --- I'm okay with either require explicit grants or always validate against base table for 3.0. So let's go with the latter. was (Author: jbellis): I'm okay with either require explicit grants or always validate against base table for 3.0. Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9967) Determine if a Materialized View is built (consistent with its base table after its creation)
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9967: -- Issue Type: New Feature (was: Improvement) Determine if a Materialized View is built (consistent with its base table after its creation) - Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: New Feature Reporter: Alan Boudreault Priority: Minor Fix For: 3.x Since MVs are eventually consistent with their base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9967) Determine if a Materialized View is built (consistent with its base table after its creation)
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9967: -- Fix Version/s: (was: 3.0 beta 1) 3.x Determine if a Materialized View is built (consistent with its base table after its creation) - Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: Improvement Reporter: Alan Boudreault Fix For: 3.x Since MVs are eventually consistent with their base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8684) Replace usage of Adler32 with CRC32
[ https://issues.apache.org/jira/browse/CASSANDRA-8684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8684: -- Reviewer: T Jake Luciani (was: Aleksey Yeschenko) Giving review to Jake since he did the original benchmarking back in CASSANDRA-5862. Replace usage of Adler32 with CRC32 --- Key: CASSANDRA-8684 URL: https://issues.apache.org/jira/browse/CASSANDRA-8684 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 3.0 beta 1 Attachments: CRCBenchmark.java, PureJavaCrc32.java, Sample.java I could not find a situation in which Adler32 outperformed PureJavaCrc32, much less the intrinsic from Java 8. For small allocations PureJavaCrc32 was much faster, probably due to the JNI overhead of invoking the native Adler32 implementation, where the array has to be allocated and copied. I tested on a 65w Sandy Bridge i5 running Ubuntu 14.04 with JDK 1.7.0_71 as well as a c3.8xlarge running Ubuntu 14.04. I think it makes sense to stop using Adler32 when generating new checksums. c3.8xlarge, results are time in milliseconds, lower is better:
||Allocation size||Adler32||CRC32||PureJavaCrc32||
|64|47636|46075|25782|
|128|36755|36712|23782|
|256|31194|32211|22731|
|1024|27194|28792|22010|
|1048576|25941|27807|21808|
|536870912|25957|27840|21836|

i5:
||Allocation size||Adler32||CRC32||PureJavaCrc32||
|64|50539|50466|26826|
|128|37092|38533|24553|
|256|30630|32938|23459|
|1024|26064|29079|22592|
|1048576|24357|27911|22481|
|536870912|24838|28360|22853|

Another fun fact: performance of the CRC32 intrinsic appears to double from Sandy Bridge to Haswell, unless I am measuring something different when going from Linux/Sandy to Haswell/OS X. The intrinsic/JDK 8 implementation also operates against DirectByteBuffers better, and coding against the wrapper will get that boost when run with Java 8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
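For a quick feel for the comparison, here is a self-contained sketch using Python's zlib bindings rather than the Java benchmark attached to the ticket; absolute numbers depend entirely on the platform and are not comparable to the tables above:

```python
# Sketch of a CRC32-vs-Adler32 micro-benchmark using Python's zlib bindings
# (NOT the ticket's Java/JMH benchmark; timings here are only indicative and
# platform-dependent). Also shows the two checksums produce different values.

import timeit
import zlib

payload = bytes(range(256)) * 4096  # 1 MiB buffer

crc = zlib.crc32(payload)
adler = zlib.adler32(payload)
print(f"crc32=0x{crc:08x} adler32=0x{adler:08x}")

for name, fn in (("crc32", zlib.crc32), ("adler32", zlib.adler32)):
    t = timeit.timeit(lambda: fn(payload), number=200)
    print(f"{name}: {t:.3f}s for 200 x 1 MiB")
```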
[jira] [Updated] (CASSANDRA-9967) Determine if a Materialized View is built (consistent with its base table after its creation)
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9967: -- Priority: Minor (was: Major) Determine if a Materialized View is built (consistent with its base table after its creation) - Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: Improvement Reporter: Alan Boudreault Priority: Minor Fix For: 3.x Since MVs are eventually consistent with their base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9967) Determine if a Materialized View is finished building, without having to query each node
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9967: -- Summary: Determine if a Materialized View is finished building, without having to query each node (was: Determine if a Materialized View is built (consistent with its base table after its creation)) Determine if a Materialized View is finished building, without having to query each node Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: New Feature Reporter: Alan Boudreault Priority: Minor Fix For: 3.x Since MVs are eventually consistent with their base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9967) Determine if a Materialized View is built (consistent with its base table after its creation)
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661287#comment-14661287 ] Jonathan Ellis commented on CASSANDRA-9967: --- For 3.0 you can query each node's local state. ([~carlyeks], can you explain how?) For 3.x I agree it would be useful to simplify this. Determine if a Materialized View is built (consistent with its base table after its creation) - Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: Improvement Reporter: Alan Boudreault Fix For: 3.x Since MVs are eventually consistent with their base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10001) Bug in encoding of sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10001: --- Reviewer: T Jake Luciani Bug in encoding of sstables --- Key: CASSANDRA-10001 URL: https://issues.apache.org/jira/browse/CASSANDRA-10001 Project: Cassandra Issue Type: Bug Reporter: T Jake Luciani Assignee: Stefania Priority: Blocker Fix For: 3.0 beta 1 Fixing the compaction dtest I noticed we aren't encoding map data correctly in sstables. The following code fails from the newly committed {{compaction_test.py:TestCompaction_with_SizeTieredCompactionStrategy.large_compaction_warning_test}} {code}
session.execute("CREATE TABLE large(userid text PRIMARY KEY, properties map<int, text>) with compression = {}")
for i in range(200):  # ensures partition size larger than compaction_large_partition_warning_threshold_mb
    session.execute("UPDATE ks.large SET properties[%i] = '%s' WHERE userid = 'user'" % (i, get_random_word(strlen)))
ret = session.execute("SELECT properties from ks.large where userid = 'user'")
assert len(ret) == 1
self.assertEqual(200, len(ret[0][0].keys()))
{code} The last assert is failing with only 91 keys. The large values are causing flushes vs staying in the memtable, so the issue is somewhere in the serialization of collections in sstables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10002) Repeated slices on RowSearchers are incorrect
[ https://issues.apache.org/jira/browse/CASSANDRA-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10002: --- Reviewer: Stefania (was: Aleksey Yeschenko) [~Stefania] to review Repeated slices on RowSearchers are incorrect - Key: CASSANDRA-10002 URL: https://issues.apache.org/jira/browse/CASSANDRA-10002 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Tyler Hobbs Fix For: 3.0 beta 1 In {{AbstractThreadUnsafePartition}}, repeated {{slice()}} calls on a {{RowSearcher}} can produce incorrect results. This is caused by only performing a binary search over a sublist (based on {{nextIdx}}), but not taking {{nextIdx}} into account when using the search result index. I made a quick fix in [this commit|https://github.com/thobbs/cassandra/commit/73725ea6825c9c0da1fa4986b01f39ae08130e10] on one of my branches, but the full fix also needs to cover {{ReverseRowSearcher}} and include a test to reproduce the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659996#comment-14659996 ] Jonathan Ellis commented on CASSANDRA-9738: --- What does code coverage show? Because I know that would be Ariel's first question. :) Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that a pure off-heap key cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8234) CTAS (CREATE TABLE AS SELECT)
[ https://issues.apache.org/jira/browse/CASSANDRA-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8234: -- Description: Continuous request from users is the ability to do CREATE TABLE AS SELECT. The simplest form would be copying the entire table. More advanced would allow specifying the columns and UDFs to call as well as filtering rows out in WHERE. More advanced still would be to get all the way to allowing JOIN, for which we probably want to integrate Spark. was: Continuous request from users is the ability to do CREATE TABLE AS SELECT... The COPY command can be enhanced to perform simple and customized copies of existing tables to satisfy the need. - Simple copy is COPY table a TO new table b. - Custom copy can mimic Postgres: (e.g. COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO …) Summary: CTAS (CREATE TABLE AS SELECT) (was: CTAS for COPY) CTAS (CREATE TABLE AS SELECT) - Key: CASSANDRA-8234 URL: https://issues.apache.org/jira/browse/CASSANDRA-8234 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: Robin Schumacher Fix For: 3.x Continuous request from users is the ability to do CREATE TABLE AS SELECT. The simplest form would be copying the entire table. More advanced would allow specifying the columns and UDFs to call as well as filtering rows out in WHERE. More advanced still would be to get all the way to allowing JOIN, for which we probably want to integrate Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660003#comment-14660003 ] Jonathan Ellis commented on CASSANDRA-5220: --- Committed. Thanks, Marcus and Stefania! Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Marcus Olsson Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, cassandra-3.0-5220.patch Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
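The inefficiency described in CASSANDRA-5220 above — one repair session per vnode range, processed sequentially — suggests the shape of the fix: coalesce contiguous ranges before creating sessions. A hedged sketch with hypothetical types (not the committed patch):

```java
import java.util.ArrayList;
import java.util.List;

public class RangeMerge {
    // Hypothetical sketch (not the committed CASSANDRA-5220 patch): instead of
    // starting one repair session per vnode range, contiguous ranges are
    // coalesced first so far fewer sessions (and far less log spam) result.
    record Range(long left, long right) {}

    static List<Range> coalesce(List<Range> sortedRanges) {
        List<Range> out = new ArrayList<>();
        for (Range r : sortedRanges) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).right() == r.left())
                // Extend the previous merged range instead of opening a new session.
                out.set(last, new Range(out.get(last).left(), r.right()));
            else
                out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three vnode ranges, two of them contiguous -> two sessions instead of three.
        List<Range> merged = coalesce(List.of(
                new Range(0, 10), new Range(10, 20), new Range(30, 40)));
        System.out.println(merged.size()); // 2
    }
}
```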
[jira] [Commented] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660201#comment-14660201 ] Jonathan Ellis commented on CASSANDRA-9927: --- I'm happy with leaving MV permissions to be set explicitly. Inheriting base permissions is definitely not the right thing in all situations. NB: Aleksey pointed out that we do need to require SELECT on the base table when CREATEing an MV. Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660201#comment-14660201 ] Jonathan Ellis edited comment on CASSANDRA-9927 at 8/6/15 3:58 PM: --- I'm happy with leaving MV permissions to be set explicitly. Inheriting base permissions is definitely not the right thing in all situations. NB: Aleksey pointed out that we do need to require SELECT on the base table when CREATEing an MV. was (Author: jbellis): I'm happy with leaving MV permissions explicity. Inheriting base permissions is definitely not the right thing in all situations. NB: Aleksey pointed out that we do to require SELECT on the base table when CREATEing an MV. Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9953) Snapshot file handlers are not released after snapshot deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-9953. --- Resolution: Duplicate Snapshot file handlers are not released after snapshot deleted -- Key: CASSANDRA-9953 URL: https://issues.apache.org/jira/browse/CASSANDRA-9953 Project: Cassandra Issue Type: Bug Components: Core Reporter: Imri Zvik We are seeing a lot of opened file descriptors to deleted snapshots (deleted using nodetool clearsnapshot):
{code}
java 128657 cassandra DEL REG 253,2  569272514 /var/lib/cassandra/data/accounts/account_store_data/snapshots/feb5f790-316e-11e5-aec0-472b0d6e3fd4/accounts-account_store_data-jb-264593-Index.db
java 128657 cassandra DEL REG 253,2 1610616657 /var/lib/cassandra/data/accounts/account_store_counters/snapshots/03aa4710-316f-11e5-aec0-472b0d6e3fd4/accounts-account_store_counters-jb-635527-Index.db
java 128657 cassandra DEL REG 253,2 1610613856 /var/lib/cassandra/data/accounts/account_store_counters/snapshots/43c17170-316f-11e5-aec0-472b0d6e3fd4/accounts-account_store_counters-jb-635675-Index.db
java 128657 cassandra DEL REG 253,2 1610613052 /var/lib/cassandra/data/accounts/account_store_counters/snapshots/18e001a0-3170-11e5-aec0-472b0d6e3fd4/accounts-account_store_counters-jb-636200-Index.db
[root@cassandra002 ~]# lsof -np 128657 | grep -c DEL
56682
{code}
They are probably created by the routine repair process, but they are never cleared (restarting the Cassandra process clears them, of course). We are seeing these also after all repair processes finished, and no repair process is running in the cluster. There are no errors or fatals in the system.log. We are using Datastax community edition 2.0.13, installed from RPMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660218#comment-14660218 ] Jonathan Ellis commented on CASSANDRA-9927: --- I'm okay with either requiring explicit grants or always validating against the base table for 3.0. Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9985) Introduce our own AbstractIterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9985: -- Reviewer: Ariel Weisberg Introduce our own AbstractIterator -- Key: CASSANDRA-9985 URL: https://issues.apache.org/jira/browse/CASSANDRA-9985 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Benedict Assignee: Benedict Priority: Trivial Fix For: 3.0.0 rc1 The Guava AbstractIterator not only has unnecessary method call depth, it is difficult to debug without attaching source. Since it's absolutely trivial to write our own, and it's used widely within the codebase, I think we should do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9533) Make batch commitlog mode easier to tune
[ https://issues.apache.org/jira/browse/CASSANDRA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9533: -- Reviewer: Ariel Weisberg [~aweisberg] to review Make batch commitlog mode easier to tune Key: CASSANDRA-9533 URL: https://issues.apache.org/jira/browse/CASSANDRA-9533 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Benedict Fix For: 3.x As discussed in CASSANDRA-9504, 2.1 changed commitlog_sync_batch_window_in_ms from the maximum time to wait between fsyncs to the minimum time, so one must be very careful to keep it small enough that most writers aren't kept waiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
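The max-vs-min distinction in CASSANDRA-9533 above is easy to miss. A toy model of the two semantics (a simplified sketch, not the actual commit log code): with a minimum window, the next fsync cannot happen until the window since the last sync has fully elapsed, so early writers wait; with a maximum window, a sync may run as soon as a write arrives, and the window only caps how long it can be deferred.

```java
public class BatchWindow {
    // Toy model of the semantic change described in CASSANDRA-9533.
    // Minimum-window semantics: every write waits until the full window since
    // the last sync has elapsed before the next fsync can run.
    static long nextSyncAtMinWindow(long lastSyncMs, long windowMs) {
        return lastSyncMs + windowMs; // writers arriving earlier must wait
    }

    // Maximum-window semantics: sync as soon as a write arrives, but never
    // later than the window cap.
    static long nextSyncAtMaxWindow(long lastSyncMs, long windowMs, long writeArrivedMs) {
        return Math.min(writeArrivedMs, lastSyncMs + windowMs);
    }

    public static void main(String[] args) {
        System.out.println(nextSyncAtMinWindow(0, 10));    // 10 -- the write at t=3 waits
        System.out.println(nextSyncAtMaxWindow(0, 10, 3)); // 3  -- the write syncs immediately
    }
}
```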
[jira] [Updated] (CASSANDRA-9992) Sending batchlog verb to previous versions
[ https://issues.apache.org/jira/browse/CASSANDRA-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9992: -- Fix Version/s: (was: 3.0 beta 1) 3.0.0 rc1 Sending batchlog verb to previous versions -- Key: CASSANDRA-9992 URL: https://issues.apache.org/jira/browse/CASSANDRA-9992 Project: Cassandra Issue Type: Bug Reporter: Carl Yeksigian Assignee: Carl Yeksigian Fix For: 3.0.0 rc1 We are currently sending {{Verb.BATCHLOG_MUTATION}} in {{StorageProxy.syncWriteToBatchlog}} and {{StorageProxy.asyncRemoveFromBatchlog}} to previous versions, which do not have that Verb. We should be sending them {{Verb.MUTATION}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
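The fix described in CASSANDRA-9992 above amounts to version-gating the verb choice per target node. A hedged sketch with hypothetical constants and types (the real code would consult the target's actual messaging version):

```java
// Hedged sketch (hypothetical enum and version constant, not the actual
// StorageProxy code): choose the verb based on the target node's messaging
// version, falling back to MUTATION for nodes that predate BATCHLOG_MUTATION.
enum Verb { MUTATION, BATCHLOG_MUTATION }

public class VerbSelector {
    static final int VERSION_30 = 10; // assumed version constant, for illustration only

    static Verb batchlogVerbFor(int targetMessagingVersion) {
        return targetMessagingVersion >= VERSION_30
                ? Verb.BATCHLOG_MUTATION   // 3.0+ node: understands the new verb
                : Verb.MUTATION;           // older node: send the verb it knows
    }

    public static void main(String[] args) {
        System.out.println(batchlogVerbFor(9));  // MUTATION
        System.out.println(batchlogVerbFor(10)); // BATCHLOG_MUTATION
    }
}
```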
[jira] [Commented] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658812#comment-14658812 ] Jonathan Ellis commented on CASSANDRA-9927: --- Why can't we just inherit base table permissions for 3.0? Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9927) Security for MaterializedViews
[ https://issues.apache.org/jira/browse/CASSANDRA-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659028#comment-14659028 ] Jonathan Ellis commented on CASSANDRA-9927: --- Long term, we do want to support this. But it's pretty late to start design for 3.0. Security for MaterializedViews -- Key: CASSANDRA-9927 URL: https://issues.apache.org/jira/browse/CASSANDRA-9927 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Labels: materializedviews Fix For: 3.0 beta 1 We need to think about how to handle security wrt materialized views. Since they are based on a source table we should possibly inherit the same security model as that table. However I can see cases where users would want to create different security auth for different views. esp once we have CASSANDRA-9664 and users can filter out sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
[ https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659423#comment-14659423 ] Jonathan Ellis commented on CASSANDRA-9302: --- There's no need to be passive aggressive. Here's the reason it was tagged Later, straight from the comments: bq. Whatever we end up with under the hood, I think that cqlsh and COPY are the right front end to present to users rather than a separate loader executable. Optimize cqlsh COPY FROM, part 3 Key: CASSANDRA-9302 URL: https://issues.apache.org/jira/browse/CASSANDRA-9302 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Jonathan Ellis Assignee: David Kua Fix For: 2.1.x We've had some discussion moving to Spark CSV import for bulk load in 3.x, but people need a good bulk load tool now. One option is to add a separate Java bulk load tool (CASSANDRA-9048), but if we can match that performance from cqlsh I would prefer to leave COPY FROM as the preferred option to which we point people, rather than adding more tools that need to be supported indefinitely. Previous work on COPY FROM optimization was done in CASSANDRA-7405 and CASSANDRA-8225. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659424#comment-14659424 ] Jonathan Ellis commented on CASSANDRA-5220: --- I'm okay with adding this to 3.0, since otherwise we'll need to wait for either 8110 or 4.0, and I don't think that's fair to Marcus since he had the first version written months ago. Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Marcus Olsson Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, cassandra-3.0-5220.patch Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654755#comment-14654755 ] Jonathan Ellis commented on CASSANDRA-5220: --- We can't support repair anyway with older-version nodes until we have CASSANDRA-8110, so don't worry about it here. Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Marcus Olsson Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, cassandra-3.0-5220.patch Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9945) Add transparent data encryption core classes
[ https://issues.apache.org/jira/browse/CASSANDRA-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654250#comment-14654250 ] Jonathan Ellis commented on CASSANDRA-9945: --- 3.2 actually. (We should branch 3.1 from 3.0 on release.) Add transparent data encryption core classes Key: CASSANDRA-9945 URL: https://issues.apache.org/jira/browse/CASSANDRA-9945 Project: Cassandra Issue Type: Improvement Reporter: Jason Brown Assignee: Jason Brown Labels: encryption Fix For: 3.x This patch will add the core infrastructure classes necessary for transparent data encryption (file-level encryption), as required for CASSANDRA-6018 and CASSANDRA-9633. The phrase transparent data encryption, while not the most aesthetically pleasing, seems to be used throughout the database industry (Oracle, SQL Server, Datastax Enterprise) to describe file level encryption, so we'll go with that, as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9129) HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12
[ https://issues.apache.org/jira/browse/CASSANDRA-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9129: -- Reviewer: Aleksey Yeschenko HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12 --- Key: CASSANDRA-9129 URL: https://issues.apache.org/jira/browse/CASSANDRA-9129 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04.5 LTS AWS (m3.xlarge) 15G RAM 4 core Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Cassandra 2.0.14 Reporter: Russ Lavoie Assignee: Sam Tunnicliffe Fix For: 2.0.x Attachments: 9129-2.0.txt Upgrading from Cassandra 2.0.11 or 2.0.12 to 2.0.14 I am seeing a pending hinted hand off that never clears. New hinted hand offs that go into pending waiting for a node to come up clear as expected. But 1 always remains. I went through the following steps. 1) stop cassandra 2) Upgrade cassandra to 2.0.14 3) Start cassandra 4) nodetool tpstats There are no errors in the logs to help with this issue. 
I ran a few nodetool commands to get some data and pasted them below. Below is what is shown after running nodetool status on each node in the ring:
{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns   Host ID  Rack
UN  NODE1    279.8 MB   256     34.9%  HOSTID   rack1
UN  NODE2    279.79 MB  256     33.0%  HOSTID   rack1
UN  NODE3    279.87 MB  256     32.1%  HOSTID   rack1
{code}
Below is what is shown after running nodetool tpstats on each node in the ring, showing a single HintedHandoff in pending status that never clears:
{code}
Pool Name               Active  Pending  Completed  Blocked  All time blocked
ReadStage                    0        0      14550        0                 0
RequestResponseStage         0        0     113040        0                 0
MutationStage                0        0     168873        0                 0
ReadRepairStage              0        0       1147        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
GossipStage                  0        0     232112        0                 0
CacheCleanupExecutor         0        0          0        0                 0
MigrationStage               0        0          0        0                 0
MemoryMeter                  0        0          6        0                 0
FlushWriter                  0        0         38        0                 0
ValidationExecutor           0        0          0        0                 0
InternalResponseStage        0        0          0        0                 0
AntiEntropyStage             0        0          0        0                 0
MemtablePostFlusher          0        0       1333        0                 0
MiscStage                    0        0          0        0                 0
PendingRangeCalculator       0        0          6        0                 0
CompactionExecutor           0        0        178        0                 0
commitlog_archiver           0        0          0        0                 0
HintedHandoff                0        1        133        0                 0

Message type      Dropped
RANGE_SLICE             0
READ_REPAIR             0
PAGED_RANGE             0
BINARY                  0
READ                    0
MUTATION                0
_TRACE                  0
REQUEST_RESPONSE        0
COUNTER_MUTATION        0
{code}
Below is what is shown after running nodetool cfstats system.hints on all 3 nodes:
{code}
Keyspace: system
	Read Count: 0
	Read Latency: NaN ms.
	Write Count: 0
	Write Latency: NaN ms.
	Pending Tasks: 0
		Table: hints
		SSTable count: 0
		Space used (live), bytes: 0
		Space used (total), bytes: 0
		Off heap memory used (total), bytes: 0
		SSTable Compression Ratio: 0.0
		Number of keys (estimate): 0
		Memtable cell count: 0
		Memtable data size, bytes: 0
		Memtable switch count: 0
		Local read count: 0
		Local read latency: 0.000 ms
		Local write count: 0
		Local write latency: 0.000 ms
		Pending tasks: 0
		Bloom filter false positives: 0
		Bloom filter false ratio: 0.0
[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654147#comment-14654147 ] Jonathan Ellis commented on CASSANDRA-5220: --- Very substantial. Excited to get this in! Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Marcus Olsson Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, cassandra-3.0-5220.patch Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9932) Make all partitions btree backed
[ https://issues.apache.org/jira/browse/CASSANDRA-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9932: -- Reviewer: Ariel Weisberg Make all partitions btree backed Key: CASSANDRA-9932 URL: https://issues.apache.org/jira/browse/CASSANDRA-9932 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0.0 rc1 Following on from the other btree related refactors, this patch makes all partition (and partition-like) objects backed by the same basic structure: {{AbstractBTreePartition}}. With two main offshoots: {{ImmutableBTreePartition}} and {{AtomicBTreePartition}} The main upshot is a 30% net code reduction, meaning better exercise of btree code paths and fewer new code paths to go wrong. A secondary upshort is that, by funnelling all our comparisons through a btree, there is a higher likelihood of icache occupancy and we have only one area to focus delivery of improvements for their enjoyment by all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign
[ https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653870#comment-14653870 ] Jonathan Ellis commented on CASSANDRA-9459: --- First reaction: I'd rather use some kind of function call syntax so that it's distinct from normal columns. Second reaction: Not sure conflating with UDF is much better. Maybe need to think on this some more. SecondaryIndex API redesign --- Key: CASSANDRA-9459 URL: https://issues.apache.org/jira/browse/CASSANDRA-9459 Project: Cassandra Issue Type: Improvement Reporter: Sam Tunnicliffe Assignee: Sam Tunnicliffe Fix For: 3.0 beta 1 For some time now the index subsystem has been a pain point and in large part this is due to the way that the APIs and principal classes have grown organically over the years. It would be a good idea to conduct a wholesale review of the area and see if we can come up with something a bit more coherent. A few starting points: * There's a lot in AbstractPerColumnSecondaryIndex and its subclasses which could be pulled up into SecondaryIndexSearcher (note that to an extent, this is done in CASSANDRA-8099). * SecondaryIndexManager is overly complex and several of its functions should be simplified/re-examined. The handling of which columns are indexed and index selection on both the read and write paths are somewhat dense and unintuitive. * The SecondaryIndex class hierarchy is rather convoluted and could use some serious rework. There are a number of outstanding tickets which we should be able to roll into this higher level one as subtasks (but I'll defer doing that until getting into the details of the redesign): * CASSANDRA-7771 * CASSANDRA-8103 * CASSANDRA-9041 * CASSANDRA-4458 * CASSANDRA-8505 Whilst they're not hard dependencies, I propose that this be done on top of both CASSANDRA-8099 and CASSANDRA-6717. 
The former largely because the storage engine changes may facilitate a friendlier index API, but also because of the changes to SIS mentioned above. As for 6717, the changes to schema tables there will help facilitate CASSANDRA-7771. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9917: -- Reviewer: Marcus Eriksson MVs should validate gc grace seconds on the tables involved --- Key: CASSANDRA-9917 URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Carl Yeksigian Fix For: 3.0 beta 1 For correctness reasons (potential resurrection of dropped values), batchlog entries are TTL'd with the lowest gc_grace_seconds of all the tables involved in a batch. It means that if gc gs is set to 0 in one of the tables, the batchlog entry will be dead on arrival, and never replayed. We should probably warn against such LOGGED writes taking place, in general, but for MVs, we must validate that gc gs on the base table (and on the MV table, if we should allow altering gc gs there at all), is never set too low, or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
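The TTL rule described in CASSANDRA-9917 above is simple to state in code. A sketch (illustrative only, not the actual batchlog code): the entry's TTL is the minimum gc_grace_seconds across the batch's tables, so a single table with gc_grace_seconds of 0 makes the entry expire immediately and never replay.

```java
import java.util.List;

public class BatchlogTtl {
    // Illustrative sketch of the rule in CASSANDRA-9917: a batchlog entry is
    // TTL'd with the smallest gc_grace_seconds of the tables in the batch.
    // A table with gc_grace_seconds == 0 therefore makes the entry dead on
    // arrival, which is why MV base tables must validate against low values.
    static int batchlogTtl(List<Integer> gcGraceSeconds) {
        return gcGraceSeconds.stream().min(Integer::compare).orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(batchlogTtl(List.of(864000, 864000))); // 864000 -- healthy batch
        System.out.println(batchlogTtl(List.of(864000, 0)));      // 0 -- dead on arrival
    }
}
```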
[jira] [Updated] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9917: -- Assignee: Carl Yeksigian MVs should validate gc grace seconds on the tables involved --- Key: CASSANDRA-9917 URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Carl Yeksigian Fix For: 3.0 beta 1 For correctness reasons (potential resurrection of dropped values), batchlog entries are TTL'd with the lowest gc_grace_seconds of all the tables involved in a batch. It means that if gc gs is set to 0 in one of the tables, the batchlog entry will be dead on arrival, and never replayed. We should probably warn against such LOGGED writes taking place, in general, but for MVs, we must validate that gc gs on the base table (and on the MV table, if we should allow altering gc gs there at all), is never set too low, or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9961) cqlsh should have DESCRIBE MATERIALIZED VIEW
[ https://issues.apache.org/jira/browse/CASSANDRA-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9961: -- Assignee: Stefania cqlsh should have DESCRIBE MATERIALIZED VIEW Key: CASSANDRA-9961 URL: https://issues.apache.org/jira/browse/CASSANDRA-9961 Project: Cassandra Issue Type: Improvement Reporter: Carl Yeksigian Assignee: Stefania Labels: materializedviews Fix For: 3.0 beta 1 cqlsh doesn't currently produce describe output that can be used to recreate a MV. Needs to add a new {{DESCRIBE MATERIALIZED VIEW}} command, and also add to {{DESCRIBE KEYSPACE}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9961) cqlsh should have DESCRIBE MATERIALIZED VIEW
[ https://issues.apache.org/jira/browse/CASSANDRA-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9961: -- Reviewer: Benjamin Lerer cqlsh should have DESCRIBE MATERIALIZED VIEW Key: CASSANDRA-9961 URL: https://issues.apache.org/jira/browse/CASSANDRA-9961 Project: Cassandra Issue Type: Improvement Reporter: Carl Yeksigian Assignee: Stefania Labels: materializedviews Fix For: 3.0 beta 1 cqlsh doesn't currently produce describe output that can be used to recreate a MV. Needs to add a new {{DESCRIBE MATERIALIZED VIEW}} command, and also add to {{DESCRIBE KEYSPACE}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9963) Compaction not starting for new tables
[ https://issues.apache.org/jira/browse/CASSANDRA-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652030#comment-14652030 ] Jonathan Ellis commented on CASSANDRA-9963: --- is this something we could catch with a utest? Compaction not starting for new tables -- Key: CASSANDRA-9963 URL: https://issues.apache.org/jira/browse/CASSANDRA-9963 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jeremiah Jordan Assignee: Marcus Eriksson Fix For: 2.1.x Attachments: 0001-dont-use-isEnabled-since-that-checks-isActive.patch Something committed since 2.1.8 broke cassandra-2.1 HEAD {noformat} create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; create table test.stcs ( a int PRIMARY KEY , b int); {noformat} repeat more than 4 times: {noformat} insert into test.stcs (a, b) VALUES ( 1, 1); nodetool flush test stcs ls data dir/test/stcs-* {noformat} See a bunch of sstables where STCS should have kicked in and compacted them down some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9971) Static variables with small page sizes
[ https://issues.apache.org/jira/browse/CASSANDRA-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9971: -- Assignee: Benjamin Lerer Static variables with small page sizes -- Key: CASSANDRA-9971 URL: https://issues.apache.org/jira/browse/CASSANDRA-9971 Project: Cassandra Issue Type: Bug Components: Tests Environment: Local Reporter: Steve Wang Assignee: Benjamin Lerer Fix For: 3.x Attachments: static_paging_test.py Selecting static variables with small page sizes causes them to display as None. With large page sizes and non-static variables, tests still pass. Works fine in 2.1.x. Not sure if it runs in 2.2.x (I can't seem to run C* version 2.2.x). Run the test below to see the error. Remove the list on line 21 to see what's actually erroring. Related to 8502. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9945) Add transparent data encryption core classes
[ https://issues.apache.org/jira/browse/CASSANDRA-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9945: -- Fix Version/s: (was: 3.0 beta 1) 3.x Add transparent data encryption core classes Key: CASSANDRA-9945 URL: https://issues.apache.org/jira/browse/CASSANDRA-9945 Project: Cassandra Issue Type: Improvement Reporter: Jason Brown Assignee: Jason Brown Labels: encryption Fix For: 3.x This patch will add the core infrastructure classes necessary for transparent data encryption (file-level encryption), as required for CASSANDRA-6018 and CASSANDRA-9633. The phrase transparent data encryption, while not the most aesthetically pleasing, seems to be used throughout the database industry (Oracle, SQL Server, Datastax Enterprise) to describe file level encryption, so we'll go with that, as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652574#comment-14652574 ] Jonathan Ellis commented on CASSANDRA-9889: --- Very well, I do not throw a binding -1. Disable scripted UDFs by default Key: CASSANDRA-9889 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Priority: Minor Fix For: 3.0.0 rc1 (Follow-up to CASSANDRA-9402) TL;DR this ticket adds another config option to enable scripted UDFs. Securing Java UDFs is much easier than securing scripted UDFs. The secure execution of scripted UDFs heavily relies on how secure a particular script provider implementation is. Nashorn is probably pretty good at this - but (as discussed offline with [~iamaleksey]) we are not certain. This becomes worse with other JSR-223 providers (which need to be installed by the user anyway). E.g.: {noformat} # Enables use of scripted UDFs. # Java UDFs are always enabled, if enable_user_defined_functions is true. # Enable this option to be able to use UDFs with language javascript or any custom JSR-223 provider. enable_scripted_user_defined_functions: false {noformat} TBH: I would feel more comfortable having this one. But we should review this along with enable_user_defined_functions for 4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
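The proposed yaml options gate script-language UDFs separately from Java UDFs. A hypothetical sketch of that gating decision, written here only to make the two-switch semantics concrete (the function and its signature are illustrative, not Cassandra code):

```python
def udf_allowed(language, enable_udfs, enable_scripted_udfs):
    """Hypothetical gate mirroring the proposed yaml switches.

    enable_udfs          -> enable_user_defined_functions
    enable_scripted_udfs -> enable_scripted_user_defined_functions
    """
    if not enable_udfs:
        return False             # no UDFs of any kind
    if language == "java":
        return True              # Java UDFs need only the main switch
    return enable_scripted_udfs  # javascript / other JSR-223 need the extra switch
```

With the proposed default of enable_scripted_user_defined_functions: false, javascript UDFs are rejected while Java UDFs remain usable.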
[jira] [Updated] (CASSANDRA-6018) Add option to encrypt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6018: -- Reviewer: Branimir Lambov Add option to encrypt commitlog Key: CASSANDRA-6018 URL: https://issues.apache.org/jira/browse/CASSANDRA-6018 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jason Brown Assignee: Jason Brown Labels: commit_log, encryption, security Fix For: 3.x We are going to start using cassandra for a billing system, and while I can encrypt sstables at rest (via Datastax Enterprise), commit logs are more or less plain text. Thus, an attacker would be able to easily read, for example, credit card numbers in the clear text commit log (if the calling app does not encrypt the data itself before sending it to cassandra). I want to allow the option of encrypting the commit logs, most likely controlled by a property in the yaml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6018) Add option to encrypt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6018: -- Fix Version/s: (was: 3.0 beta 1) 3.x Add option to encrypt commitlog Key: CASSANDRA-6018 URL: https://issues.apache.org/jira/browse/CASSANDRA-6018 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jason Brown Assignee: Jason Brown Labels: commit_log, encryption, security Fix For: 3.x We are going to start using cassandra for a billing system, and while I can encrypt sstables at rest (via Datastax Enterprise), commit logs are more or less plain text. Thus, an attacker would be able to easily read, for example, credit card numbers in the clear text commit log (if the calling app does not encrypt the data itself before sending it to cassandra). I want to allow the option of encrypting the commit logs, most likely controlled by a property in the yaml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
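The idea in CASSANDRA-6018 is to encrypt commitlog data before it reaches disk so mutations (e.g. credit card numbers) are not readable in cleartext. A toy sketch of the write/replay framing only, with an XOR stand-in where a real cipher such as AES would go; all names are hypothetical and XOR is NOT secure:

```python
import struct

def toy_cipher(data: bytes, key: bytes) -> bytes:
    # placeholder for a real cipher (e.g. AES); XOR is symmetric, so the
    # same call both "encrypts" and "decrypts" in this sketch
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def frame_encrypted(mutation: bytes, key: bytes) -> bytes:
    """Length-prefix the ciphertext so replay knows how many bytes to read."""
    ct = toy_cipher(mutation, key)
    return struct.pack(">I", len(ct)) + ct

def read_frame(segment: bytes, key: bytes) -> bytes:
    """Inverse of frame_encrypted: read one framed mutation back out."""
    (n,) = struct.unpack(">I", segment[:4])
    return toy_cipher(segment[4:4 + n], key)
```

The point of the framing is that an attacker reading the raw segment sees only length prefixes and ciphertext, while replay can still walk the log mutation by mutation.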
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652564#comment-14652564 ] Jonathan Ellis commented on CASSANDRA-9889: --- I could be missing something, but I'm not a huge fan of adding config switches that replicate limited pieces of authz functionality. Isn't this config switch the equivalent of "don't grant EXECUTE TRUSTED to anyone"? Disable scripted UDFs by default Key: CASSANDRA-9889 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Priority: Minor Fix For: 3.0.0 rc1 (Follow-up to CASSANDRA-9402) TL;DR this ticket adds another config option to enable scripted UDFs. Securing Java UDFs is much easier than securing scripted UDFs. The secure execution of scripted UDFs heavily relies on how secure a particular script provider implementation is. Nashorn is probably pretty good at this - but (as discussed offline with [~iamaleksey]) we are not certain. This becomes worse with other JSR-223 providers (which need to be installed by the user anyway). E.g.: {noformat} # Enables use of scripted UDFs. # Java UDFs are always enabled, if enable_user_defined_functions is true. # Enable this option to be able to use UDFs with language javascript or any custom JSR-223 provider. enable_scripted_user_defined_functions: false {noformat} TBH: I would feel more comfortable having this one. But we should review this along with enable_user_defined_functions for 4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9967) Determine if a Materialized View is built (consistent with its base table after its creation)
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9967: -- Fix Version/s: (was: 3.0.0 rc1) 3.0 beta 1 Determine if a Materialized View is built (consistent with its base table after its creation) - Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: Improvement Reporter: Alan Boudreault Fix For: 3.0 beta 1 Since an MV is eventually consistent with its base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9967) Determine if a Materialized View is built (consistent with its base table after its creation)
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9967: -- Priority: Major (was: Minor) Determine if a Materialized View is built (consistent with its base table after its creation) - Key: CASSANDRA-9967 URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 Project: Cassandra Issue Type: Improvement Reporter: Alan Boudreault Fix For: 3.0 beta 1 Since an MV is eventually consistent with its base table, it would be nice if we could easily know the state of the MV after its creation, so we could wait until the MV is built before doing some operations. // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
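The ask above amounts to "poll until the view build is complete before proceeding." A generic sketch of such a wait loop with an injected is_built() predicate; the helper and its parameters are hypothetical, and in practice the predicate would query the server for the view's build status rather than be a local function:

```python
import time

def wait_until_built(is_built, timeout=60.0, interval=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll the injected is_built() predicate until it returns True or the
    timeout elapses. clock/sleep are injectable to keep the sketch testable."""
    deadline = clock() + timeout
    while clock() < deadline:
        if is_built():
            return True
        sleep(interval)
    return is_built()  # one final check at the deadline
```

A caller would create the MV, then call wait_until_built(...) before issuing reads that assume the view is consistent with its base table.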
[jira] [Commented] (CASSANDRA-9955) In 3 node Cluster, when 1 node was forced down, data failures are observed in other 2 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651868#comment-14651868 ] Jonathan Ellis commented on CASSANDRA-9955: --- I'm not sure what failures you're referring to. Nothing in the log you posted looks unexpected. In 3 node Cluster, when 1 node was forced down, data failures are observed in other 2 nodes. Key: CASSANDRA-9955 URL: https://issues.apache.org/jira/browse/CASSANDRA-9955 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.14, Hector Client (1.0.1), Red Hat Linux OS, Reporter: Amit Singh Chowdhery Issue: On a 3 node cluster, inserts are happening normally, but when 1 node is pulled down, after a few minutes the application stops and then failures start appearing on both remaining nodes. Hector exception logs: hector: ERROR m.p.c.c.ConcurrentHClientPool - Transport exception in re-opening client in release on ConcurrentCassandraClientPoolByHost. Cassandra Debug Logs : DEBUG [OptionalTasks:1] 2015-07-31 11:57:37,698 ColumnFamilyStore.java (line 300) retryPolicy for local is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:38,969 ColumnFamilyStore.java (line 300) retryPolicy for encryptionKey is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:39,492 ColumnFamilyStore.java (line 300) retryPolicy for vouchers.c_per__batchIdIdx is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:39,504 ColumnFamilyStore.java (line 300) retryPolicy for vouchers.TX_STATEIdx is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:39,824 ColumnFamilyStore.java (line 300) retryPolicy for vouchers.c_per__serialNumberIdx is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:39,824 ColumnFamilyStore.java (line 300) retryPolicy for vouchers.c_per__subStateIdx is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:39,828 ColumnFamilyStore.java (line 300) retryPolicy for vouchers is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:40,011 ColumnFamilyStore.java (line 300) retryPolicy for voucherHistory is 0.99 
DEBUG [OptionalTasks:1] 2015-07-31 11:57:40,021 ColumnFamilyStore.java (line 300) retryPolicy for vshash is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:40,180 ColumnFamilyStore.java (line 300) retryPolicy for vouchersByPurgeDate is 0.99 DEBUG [OptionalTasks:1] 2015-07-31 11:57:40,395 ColumnFamilyStore.java (line 300) retryPolicy for serialNums is 0.99 DEBUG [Thrift:7] 2015-07-31 11:57:40,452 CassandraServer.java (line 311) get_slice DEBUG [Thrift:35] 2015-07-31 11:57:40,452 CassandraServer.java (line 943) batch_mutate DEBUG [MutationStage:56] 2015-07-31 11:57:40,453 StorageProxy.java (line 928) Adding hint for /192.168.5.65 DEBUG [Thrift:35] 2015-07-31 11:57:40,453 Tracing.java (line 159) request complete DEBUG [Thrift:7] 2015-07-31 11:57:40,453 RowDigestResolver.java (line 62) resolving 2 responses DEBUG [Thrift:7] 2015-07-31 11:57:40,453 RowDigestResolver.java (line 94) resolve: 0 ms. DEBUG [Thrift:7] 2015-07-31 11:57:40,454 StorageProxy.java (line 1275) Read: 1 ms. Steps to reproduce: Step 1: In a 3 node cluster, start inserting records on any 2 nodes. Step 2: Take down the node on which data insertion was not happening (init 0). Step 3: Failures can be seen on the other 2 nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
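The "Adding hint for /192.168.5.65" line in the log above is expected behavior: with one replica down, the coordinator stores the mutation locally as a hint and replays it when the replica returns. A toy sketch of that hinted-handoff mechanism, with names that are purely illustrative and no relation to Cassandra's actual classes:

```python
from collections import defaultdict

class HintStore:
    """Toy hinted-handoff buffer: queue mutations for a down replica and
    replay them when the replica comes back."""

    def __init__(self):
        self.hints = defaultdict(list)

    def write(self, replica, mutation, replica_up):
        if replica_up:
            return ("applied", mutation)
        # replica is down: store a hint instead, as in "Adding hint for ..."
        self.hints[replica].append(mutation)
        return ("hinted", mutation)

    def replay(self, replica):
        """Drain and return all hints queued for a recovered replica."""
        pending, self.hints[replica] = self.hints[replica], []
        return pending
```

So the coordinator keeps accepting writes while the node is down; whether the client sees failures depends on the consistency level, not on hinting itself.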
[jira] [Resolved] (CASSANDRA-9957) Unable to build Apache Cassandra Under Debian 8 OS with the provided ant script
[ https://issues.apache.org/jira/browse/CASSANDRA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-9957. --- Resolution: Not A Problem Something is broken in your environment, but this is not a C* bug. Unable to build Apache Cassandra Under Debian 8 OS with the provided ant script --- Key: CASSANDRA-9957 URL: https://issues.apache.org/jira/browse/CASSANDRA-9957 Project: Cassandra Issue Type: Bug Environment: PRETTY_NAME=Debian GNU/Linux 8 (jessie) NAME=Debian GNU/Linux VERSION_ID=8 VERSION=8 (jessie) ID=debian HOME_URL=http://www.debian.org/; SUPPORT_URL=http://www.debian.org/support/; BUG_REPORT_URL=https://bugs.debian.org/; java version 1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Apache Ant(TM) version 1.9.5 compiled on May 31 2015 Reporter: Adelin M.Ghanayem Labels: Cassandra, ant, build, build.xml Trying to use the tool CCM (Cassandra Cluster Manager), I was blocked by an issue related to compiling the Cassandra source. CCM installs Cassandra and builds its source before anything else. However, CCM threw an error: https://gist.github.com/AdelinGhanaem/593d1c8a63857113d0a7 (here you can find all the info you need). I then tried to download the source and compile it using ant jar, but got the same error. Basically, the jars that are installed when running ant jar are corrupted! Extracting them with jar xf threw an error. The only way I could build the source was by downloading the jars by hand from Maven. I've described the error and the process in this post: http://mradelin.blogspot.com/2015/07/error-packaging-cassandra-220-db-source_31.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)