[jira] [Commented] (CASSANDRA-6506) counters++ split counter context shards into separate cells
[ https://issues.apache.org/jira/browse/CASSANDRA-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944813#comment-13944813 ]

Sylvain Lebresne commented on CASSANDRA-6506:
---------------------------------------------

Are we talking about "6506: Clean up CFMetaData" and "6506: Clean up Cell/OnDiskAtom"? I'm asking because I can't match the commit hash in your comment to the second one in particular (nor was git able to find that commit hash on your branch from above).

Anyway, if those are the ones, definitely no objections on the first one. On the second one, shouldn't we continue passing the allocator down to CounterContext until we do this properly? (But I'm good with the min/maxTimestamp refactoring part.)

counters++ split counter context shards into separate cells
-----------------------------------------------------------

Key: CASSANDRA-6506
URL: https://issues.apache.org/jira/browse/CASSANDRA-6506
Project: Cassandra
Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
Fix For: 2.1 beta2

This change is related to, but somewhat orthogonal to, CASSANDRA-6504.

Currently all the shard tuples for a given counter cell are packed, in sorted order, in one binary blob. Thus reconciling N counter cells requires allocating a new byte buffer capable of holding the union of the two contexts' shards N-1 times.

For writes, in a post-CASSANDRA-6504 world, it also means reading more data than we have to (the complete context, when all we need is the local node's global shard).

Splitting the context into separate cells, one cell per shard, will help to improve this. We did a similar thing with super columns for CASSANDRA-3237. Incidentally, doing this split is now possible thanks to CASSANDRA-3237.

Doing this would also simplify counter reconciliation logic. Getting rid of old contexts altogether can be done trivially with upgradesstables.

In fact, we should be able to put the logical clock into the cell's timestamp, and use regular Cell-s and regular Cell reconcile() logic for the shards, especially once we get rid of the local/remote shards some time in the future (until then we still have to differentiate between global/remote/local shards and their priority rules).

--
This message was sent by Atlassian JIRA (v6.2#6252)
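As a rough illustration of the one-cell-per-shard idea in the description above: if every shard is addressable on its own, reconciling two versions of a shard reduces to keeping the one with the higher logical clock, and no merged blob has to be allocated. The sketch below is only that, a sketch. `Shard`, `ShardReconcileSketch` and the string counter ids are hypothetical and are not Cassandra classes, and the global/remote/local priority rules mentioned in the ticket are deliberately ignored.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a single counter shard (not Cassandra's real classes).
final class Shard {
    final String counterId; // assumed representation of the owning node's id
    final long clock;       // logical clock; the higher clock wins
    final long count;       // value carried by this shard

    Shard(String counterId, long clock, long count) {
        this.counterId = counterId;
        this.clock = clock;
        this.count = count;
    }
}

final class ShardReconcileSketch {
    // Two versions of the same shard: keep the one with the higher clock
    // (ties keep the first argument).
    static Shard reconcile(Shard a, Shard b) {
        return b.clock > a.clock ? b : a;
    }

    // Each map entry stands in for one cell; shards are reconciled one by one.
    static Map<String, Shard> merge(Map<String, Shard> left, Map<String, Shard> right) {
        Map<String, Shard> out = new TreeMap<>(left);
        for (Shard s : right.values())
            out.merge(s.counterId, s, ShardReconcileSketch::reconcile);
        return out;
    }

    // The counter's value is the sum of the counts of its shards.
    static long total(Map<String, Shard> shards) {
        long sum = 0;
        for (Shard s : shards.values())
            sum += s.count;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Shard> left = new TreeMap<>();
        left.put("A", new Shard("A", 1, 5));
        left.put("B", new Shard("B", 2, 10));
        Map<String, Shard> right = new TreeMap<>();
        right.put("A", new Shard("A", 3, 7));
        right.put("C", new Shard("C", 1, 1));
        System.out.println(total(merge(left, right)));
    }
}
```

With the clock stored in the cell timestamp, `reconcile` above would be the ordinary "newest timestamp wins" rule, which is the simplification the last paragraph of the description hints at.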
[jira] [Commented] (CASSANDRA-6575) By default, Cassandra should refuse to start if JNA can't be initialized properly
[ https://issues.apache.org/jira/browse/CASSANDRA-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944817#comment-13944817 ]

Benedict commented on CASSANDRA-6575:
-------------------------------------

Note that this dependency is not on a *functioning* JNA, but only on the JNA jar itself, for Java internal functionality. This dependency is removed anyway once we get CASSANDRA-6694, so I don't think it is an issue, and if it is we will hopefully fix it shortly regardless.

By default, Cassandra should refuse to start if JNA can't be initialized properly
---------------------------------------------------------------------------------

Key: CASSANDRA-6575
URL: https://issues.apache.org/jira/browse/CASSANDRA-6575
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Tupshin Harper
Assignee: Clément Lardeur
Priority: Minor
Labels: lhf
Fix For: 2.1 beta1
Attachments: trunk-6575-v2.patch, trunk-6575-v3.patch, trunk-6575-v4.patch, trunk-6575.patch

Failure to have JNA working properly is such a common undetected problem that it would be far preferable to have Cassandra refuse to start up unless JNA is initialized. In theory, this should be much less of a problem with Cassandra 2.1 due to CASSANDRA-5872, but even there, it might fail due to native lib problems, or might otherwise be misconfigured.

A yaml override, such as boot_without_jna, would allow the deliberate overriding of this policy.

--
This message was sent by Atlassian JIRA (v6.2#6252)
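The policy requested in the description above (refuse to start unless JNA initialized, with an explicit override) boils down to a two-flag decision. A minimal sketch follows, assuming the result of JNA initialization is already known as a boolean. The class and method names are hypothetical, and `boot_without_jna` is only the example option name suggested in the ticket, not a confirmed setting.

```java
// Hypothetical startup guard; not the code from the attached patches.
final class JnaStartupPolicySketch {
    // Startup may continue if JNA initialized, or if the operator opted out explicitly.
    static boolean mayStart(boolean jnaInitialized, boolean bootWithoutJna) {
        return jnaInitialized || bootWithoutJna;
    }

    // Fail fast with a message that names the override.
    static void checkOrRefuse(boolean jnaInitialized, boolean bootWithoutJna) {
        if (!mayStart(jnaInitialized, bootWithoutJna))
            throw new IllegalStateException(
                "JNA could not be initialized; set boot_without_jna (assumed option name) to start anyway");
    }

    public static void main(String[] args) {
        checkOrRefuse(true, false);   // JNA works: start normally
        checkOrRefuse(false, true);   // JNA broken, override set: start anyway
        try {
            checkOrRefuse(false, false);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```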
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944821#comment-13944821 ]

Benedict commented on CASSANDRA-6746:
-------------------------------------

@enigmacurry looks kosher. It won't be eliminating any drop at the start; it is just moving the timing of the drops (and making them shorter).

I think we should rename this ticket to "compaction destroys page cache" and split out a new ticket for [~xedin]'s changes ("page cache population is suboptimal"), which may be sensible in principle. In practice, moving the WILLNEED into the getSegment() call is dangerous, as the segment is used past the initial 64Kb, and if we rely only on ourselves for read-ahead this could result in very substandard performance for larger rows. We also probably want to WILLNEED only the actual size of the buffer we expect to read for compressed files. But this was only a proof-of-concept, and in principle the idea is probably sound.

Reads have a slow ramp up in speed
----------------------------------

Key: CASSANDRA-6746
URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Ryan McGuire
Assignee: Benedict
Labels: performance
Fix For: 2.1 beta2
Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 6746.buffered_io_tweaks.write-flush-compact-mixed.png, 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2

On a physical four node cluster I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds.

!2.1_vs_2.0_read.png!

[See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6912) SSTableReader.isReplaced does not allow for safe resource cleanup
Benedict created CASSANDRA-6912: --- Summary: SSTableReader.isReplaced does not allow for safe resource cleanup Key: CASSANDRA-6912 URL: https://issues.apache.org/jira/browse/CASSANDRA-6912 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Fix For: 2.1 beta2 There are a number of possible race conditions on resource cleanup from the use of cloneWithNewSummarySamplingLevel, because the replacement sstable can be itself replaced/obsoleted while the prior sstable is still referenced (this is actually quite easy with compaction, but can happen in other circumstances less commonly). -- This message was sent by Atlassian JIRA (v6.2#6252)
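One common way to make cleanup safe when several objects may still refer to the same underlying resource is reference counting: the resource is released only when the last holder lets go, regardless of the order in which holders are replaced or obsoleted. The sketch below shows that general technique only. It is not taken from the patch for this ticket, and `SharedResourceSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted handle; not SSTableReader's actual mechanism.
final class SharedResourceSketch {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds one reference
    private volatile boolean cleanedUp = false;

    // Take an extra reference; fails once the count has already reached zero.
    boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0)
                return false;
            if (refs.compareAndSet(n, n + 1))
                return true;
        }
    }

    // Drop a reference; whoever drops the last one performs the cleanup, exactly once.
    void release() {
        int n = refs.decrementAndGet();
        if (n == 0)
            cleanedUp = true; // stand-in for closing files, freeing memory, and so on
        else if (n < 0)
            throw new IllegalStateException("released more often than acquired");
    }

    boolean isCleanedUp() {
        return cleanedUp;
    }

    public static void main(String[] args) {
        SharedResourceSketch r = new SharedResourceSketch();
        r.tryAcquire();                        // a reader still uses the old instance
        r.release();                           // the owner is replaced and lets go
        System.out.println(r.isCleanedUp());   // still referenced by the reader
        r.release();                           // the reader finishes
        System.out.println(r.isCleanedUp());
    }
}
```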
[jira] [Commented] (CASSANDRA-6912) SSTableReader.isReplaced does not allow for safe resource cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944873#comment-13944873 ] Benedict commented on CASSANDRA-6912: - Patch available [here|https://github.com/belliottsmith/cassandra/tree/6912-fix.sstablereader.cloneWith] SSTableReader.isReplaced does not allow for safe resource cleanup - Key: CASSANDRA-6912 URL: https://issues.apache.org/jira/browse/CASSANDRA-6912 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Fix For: 2.1 beta2 There are a number of possible race conditions on resource cleanup from the use of cloneWithNewSummarySamplingLevel, because the replacement sstable can be itself replaced/obsoleted while the prior sstable is still referenced (this is actually quite easy with compaction, but can happen in other circumstances less commonly). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6912) SSTableReader.isReplaced does not allow for safe resource cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944873#comment-13944873 ] Benedict edited comment on CASSANDRA-6912 at 3/24/14 9:54 AM: -- Patch available [here|https://github.com/belliottsmith/cassandra/tree/6912-fix.sstablereader.cloneWith] Also fixes size maintenance in DataTracker, which was almost certainly not actually accounting for the reduction in disk utilisation, as the calculation looks at the files on disk which have been replaced by then. was (Author: benedict): Patch available [here|https://github.com/belliottsmith/cassandra/tree/6912-fix.sstablereader.cloneWith] SSTableReader.isReplaced does not allow for safe resource cleanup - Key: CASSANDRA-6912 URL: https://issues.apache.org/jira/browse/CASSANDRA-6912 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Fix For: 2.1 beta2 There are a number of possible race conditions on resource cleanup from the use of cloneWithNewSummarySamplingLevel, because the replacement sstable can be itself replaced/obsoleted while the prior sstable is still referenced (this is actually quite easy with compaction, but can happen in other circumstances less commonly). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944876#comment-13944876 ]

Sylvain Lebresne commented on CASSANDRA-6911:
---------------------------------------------

Oh, that's because the java driver is still on Netty 3. I do plan on migrating it to Netty 4 as well but haven't gotten to it. That being said, since Netty completely changed its package name between 3 and 4, I suspect just dropping the Netty 3 jar back into the stress lib dir should be good enough. Can you give that a shot [~enigmacurry]?

Netty dependency update broke stress
------------------------------------

Key: CASSANDRA-6911
URL: https://issues.apache.org/jira/browse/CASSANDRA-6911
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Ryan McGuire
Assignee: Benedict

I compiled stress fresh from cassandra-2.1, and when running this command:
{code}
cassandra-stress write n=1900 -rate threads=50 -node bdplab
{code}
I get the following traceback:
{code}
Exception in thread Thread-49 java.lang.NoClassDefFoundError: org/jboss/netty/channel/ChannelFactory
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:941)
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:889)
    at com.datastax.driver.core.Cluster.init(Cluster.java:88)
    at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:144)
    at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:854)
    at org.apache.cassandra.stress.util.JavaDriverClient.connect(JavaDriverClient.java:74)
    at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:155)
    at org.apache.cassandra.stress.settings.StressSettings.getSmartThriftClient(StressSettings.java:70)
    at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:275)
Caused by: java.lang.ClassNotFoundException: org.jboss.netty.channel.ChannelFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more
{code}
It seems this was introduced with an updated netty jar in cbf304ebd0436a321753e81231545b705aa8dd23

--
This message was sent by Atlassian JIRA (v6.2#6252)
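Because Netty 3 lives under `org.jboss.netty` and Netty 4 under `io.netty`, both jars can sit on the same classpath, and a quick probe can tell which of them a given classpath actually provides. The sketch below is a small diagnostic, not part of stress or the driver. `NettyClasspathProbe` is a hypothetical name; `org.jboss.netty.channel.ChannelFactory` is the class from the trace above, and `io.netty.channel.Channel` is used as a representative Netty 4 class.

```java
// Hypothetical diagnostic: reports whether a class can be found on the classpath.
final class NettyClasspathProbe {
    // Looks the class up without initializing it, so no static initializers run.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, NettyClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Netty 3 (org.jboss.netty): " + isPresent("org.jboss.netty.channel.ChannelFactory"));
        System.out.println("Netty 4 (io.netty):        " + isPresent("io.netty.channel.Channel"));
    }
}
```

Running it with the same classpath that cassandra-stress uses should show `false` for the Netty 3 line if the situation matches the trace above.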
[jira] [Created] (CASSANDRA-6913) Compaction of system keyspaces during startup can cause early loading of non-system keyspaces
Benedict created CASSANDRA-6913: --- Summary: Compaction of system keyspaces during startup can cause early loading of non-system keyspaces Key: CASSANDRA-6913 URL: https://issues.apache.org/jira/browse/CASSANDRA-6913 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.1 beta2 This then can result in an inconsistent CFS state, as cleanup of e.g. compaction leftovers does not get reflected in DataTracker. It happens because StorageService.getLoad() iterates over and opens all CFS, and this is called by Compaction. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944885#comment-13944885 ]

Sylvain Lebresne commented on CASSANDRA-6694:
---------------------------------------------

bq. I see a 25% throughput improvement using offheap_objects as the allocator type vs either on/off heap buffers.

That's definitely good to know, but does that suggest that without this, there isn't much notable performance difference between on and off heap buffers? Because if that's the case, I'm still of the opinion that it could be worth moving this to 3.0, on the argument that we've moved enough stuff into 2.1 at the last minute as it is.

Slightly More Off-Heap Memtables
--------------------------------

Key: CASSANDRA-6694
URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Labels: performance
Fix For: 2.1 beta2

The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead).

The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting.

The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory.

--
This message was sent by Atlassian JIRA (v6.2#6252)
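To make the arithmetic in the description above concrete: 8 (object header) + 4 (address) + 4 (allocation-list reference) = 16 bytes, and adding the 4-6 bytes of btree overhead gives the quoted 20-22 bytes. The "alignment tricks" remark can be sketched as storing an aligned address in 32 bits by dropping its always-zero low bits. The 8-byte alignment used below is an assumption borrowed from how JVM compressed references commonly work, not something the ticket specifies, and all names are hypothetical.

```java
// Hypothetical illustration of the per-cell overhead budget and a compressed address.
final class CellOverheadSketch {
    static final int OBJECT_HEADER_BYTES = 8;
    static final int ADDRESS_BYTES = 4;
    static final int ALLOCATION_REF_BYTES = 4;
    static final int ALIGNMENT_SHIFT = 3; // assumed 8-byte alignment

    static int perCellOverhead() {
        return OBJECT_HEADER_BYTES + ADDRESS_BYTES + ALLOCATION_REF_BYTES;
    }

    // Store an aligned address in 32 bits by dropping the low bits that are always zero.
    static int compress(long address) {
        if ((address & ((1L << ALIGNMENT_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("address is not aligned");
        if (address < 0 || (address >>> ALIGNMENT_SHIFT) > 0xFFFFFFFFL)
            throw new IllegalArgumentException("address out of range");
        return (int) (address >>> ALIGNMENT_SHIFT);
    }

    // Recover the full address, treating the stored int as unsigned.
    static long decompress(int compressed) {
        return Integer.toUnsignedLong(compressed) << ALIGNMENT_SHIFT;
    }

    // Bytes addressable with a 32-bit compressed address at this alignment.
    static long addressableBytes() {
        return 1L << (32 + ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        System.out.println("per-cell overhead: " + perCellOverhead() + " bytes");
        System.out.println("with btree: " + (perCellOverhead() + 4) + "-" + (perCellOverhead() + 6) + " bytes");
        System.out.println("addressable: " + addressableBytes() + " bytes");
    }
}
```

Under that assumption a 4-byte address covers 2^35 bytes (32 GiB), which is presumably why the description expects the trick to stop being enough eventually.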
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944889#comment-13944889 ] Benedict commented on CASSANDRA-6694: - This specific patch only permits larger amounts of data to be retained in memtables; the only speed-wise performance implications of this are the ones stated here, i.e. improved write throughput through reduced write amplification and the writing of larger files. For this workload there's basically no difference between on and off-heap (CASSANDRA-6689) ByteBuffer backed storage, if that's what you're asking, because the on-heap overhead still heavily outweighs the off-heap utilisation. This would not be true for workloads with large per-column payloads. Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). 
This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6781) ByteBuffer write() methods for serializing sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-6781:
--------------------------------
Attachment: 6781.removeavro.txt

Attaching simple patch that removes avro dependency for DataOutputTest

ByteBuffer write() methods for serializing sstables
---------------------------------------------------

Key: CASSANDRA-6781
URL: https://issues.apache.org/jira/browse/CASSANDRA-6781
Project: Cassandra
Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
Priority: Minor
Fix For: 2.1 beta2
Attachments: 6781.removeavro.txt

As mentioned in CASSANDRA-6689, there may be some performance issues with writing sstables from offheap memtables. This is most plausibly caused by the single-byte-at-a-time write path for ByteBuffers, as we use DataOutput which only accepts byte[]. I propose extending DataOutput to include ByteBuffer methods, and to use this extended interface for serializing sstables instead.

--
This message was sent by Atlassian JIRA (v6.2#6252)
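For readers unfamiliar with the problem described above, the difference between the two write paths can be sketched with plain `java.io.DataOutput`. This is an illustration only: `ByteBufferWriteSketch` and its methods are hypothetical and are not the interface proposed in the ticket. The bulk variant uses the backing array when there is one and otherwise copies through a reusable chunk, whose 8192-byte size is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical helper contrasting per-byte and bulk writes of a ByteBuffer.
final class ByteBufferWriteSketch {
    // Slow path: one DataOutput call per byte.
    static void writeByteAtATime(DataOutput out, ByteBuffer buf) throws IOException {
        for (int i = buf.position(); i < buf.limit(); i++)
            out.writeByte(buf.get(i));
    }

    // Bulk path: hand whole byte[] ranges to DataOutput; buf's position is left untouched.
    static void writeBulk(DataOutput out, ByteBuffer buf) throws IOException {
        if (buf.hasArray()) {
            out.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
            return;
        }
        ByteBuffer src = buf.duplicate(); // independent position, shared content
        byte[] chunk = new byte[Math.min(src.remaining(), 8192)];
        while (src.hasRemaining()) {
            int n = Math.min(src.remaining(), chunk.length);
            src.get(chunk, 0, n);
            out.write(chunk, 0, n);
        }
    }

    // Convenience wrapper for demos: serialize to a byte[] with either path.
    static byte[] toBytes(ByteBuffer buf, boolean bulk) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(bytes);
            if (bulk)
                writeBulk(out, buf);
            else
                writeByteAtATime(out, buf);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(100);
        for (int i = 0; i < 100; i++)
            direct.put((byte) i);
        direct.flip();
        System.out.println(java.util.Arrays.equals(toBytes(direct, true), toBytes(direct, false)));
    }
}
```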
[jira] [Reopened] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination
[ https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius reopened CASSANDRA-6311:
------------------------------------

Add CqlRecordReader to take advantage of native CQL pagination
--------------------------------------------------------------

Key: CASSANDRA-6311
URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
Fix For: 2.0.7
Attachments: 6311-v10.txt, 6311-v3-2.0-branch.txt, 6311-v4.txt, 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6311-v7.txt, 6311-v8.txt, 6311-v9.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt

Since the latest CQL pagination is done and should be more efficient, we need to update CqlPagingRecordReader to use it instead of the custom thrift paging.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination
[ https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944922#comment-13944922 ]

Dave Brosius commented on CASSANDRA-6311:
-----------------------------------------

CqlRecordReader.next() doesn't appear to be correct. It assigns values to parameters as if that does something.

Add CqlRecordReader to take advantage of native CQL pagination
--------------------------------------------------------------

Key: CASSANDRA-6311
URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
Fix For: 2.0.7
Attachments: 6311-v10.txt, 6311-v3-2.0-branch.txt, 6311-v4.txt, 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6311-v7.txt, 6311-v8.txt, 6311-v9.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt

Since the latest CQL pagination is done and should be more efficient, we need to update CqlPagingRecordReader to use it instead of the custom thrift paging.

--
This message was sent by Atlassian JIRA (v6.2#6252)
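The remark in the comment above rests on a general Java rule: parameters are passed by value, so assigning a new object to a parameter changes only the method's local copy of the reference, while mutating the object the parameter points to is visible to the caller. The sketch below demonstrates that rule with hypothetical names; it is not the CqlRecordReader code.

```java
// Hypothetical demonstration of why assigning to a parameter has no effect on the caller.
final class ParameterAssignmentSketch {
    static final class Holder {
        long value;
    }

    // Rebinds the local parameter only; the caller's object is untouched.
    static void reassign(Holder h) {
        h = new Holder();
        h.value = 42;
    }

    // Mutates the object the caller passed in; the change is visible to the caller.
    static void mutate(Holder h) {
        h.value = 42;
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        reassign(h);
        System.out.println("after reassign: " + h.value);
        mutate(h);
        System.out.println("after mutate:   " + h.value);
    }
}
```

A `next(key, value)` style method that is meant to fill in caller-supplied objects therefore has to copy into them, as `mutate` does, rather than assign to its parameters.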
[jira] [Commented] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination
[ https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944925#comment-13944925 ]

Piotr Kołaczkowski commented on CASSANDRA-6311:
-----------------------------------------------

Indeed.

Add CqlRecordReader to take advantage of native CQL pagination
--------------------------------------------------------------

Key: CASSANDRA-6311
URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
Fix For: 2.0.7
Attachments: 6311-v10.txt, 6311-v3-2.0-branch.txt, 6311-v4.txt, 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6311-v7.txt, 6311-v8.txt, 6311-v9.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt

Since the latest CQL pagination is done and should be more efficient, we need to update CqlPagingRecordReader to use it instead of the custom thrift paging.

--
This message was sent by Atlassian JIRA (v6.2#6252)
git commit: Remove avro usage in DataOutputTest
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 fbc112d4b - 5baa72f7f

Remove avro usage in DataOutputTest

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5baa72f7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5baa72f7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5baa72f7

Branch: refs/heads/cassandra-2.1
Commit: 5baa72f7f299b4ec190ddb30b897d5519a6a2a75
Parents: fbc112d
Author: Marcus Eriksson marc...@apache.org
Authored: Mon Mar 24 12:36:05 2014 +0100
Committer: Marcus Eriksson marc...@apache.org
Committed: Mon Mar 24 12:36:05 2014 +0100

----------------------------------------------------------------------
 test/unit/org/apache/cassandra/io/util/DataOutputTest.java | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5baa72f7/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
----------------------------------------------------------------------
diff --git a/test/unit/org/apache/cassandra/io/util/DataOutputTest.java b/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
index 4eeec4d..2a8c7a9 100644
--- a/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
+++ b/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
@@ -31,14 +31,12 @@ import java.io.IOException;
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.channels.Channels;
-import java.util.Arrays;
 import java.util.Random;
 import java.util.concurrent.ThreadLocalRandom;
 
 import org.junit.Assert;
 import org.junit.Test;
 
-import org.apache.avro.util.ByteBufferInputStream;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 public class DataOutputTest
@@ -79,7 +77,7 @@ public class DataOutputTest
         ByteBuffer buf = wrap(new byte[345], true);
         DataOutputByteBuffer write = new DataOutputByteBuffer(buf.duplicate());
         DataInput canon = testWrite(write);
-        DataInput test = new DataInputStream(new ByteBufferInputStream(Arrays.asList(buf)));
+        DataInput test = new DataInputStream(new ByteArrayInputStream(ByteBufferUtil.getArray(buf)));
         testRead(test, canon);
     }
 
@@ -89,7 +87,7 @@ public class DataOutputTest
         ByteBuffer buf = wrap(new byte[345], false);
         DataOutputByteBuffer write = new DataOutputByteBuffer(buf.duplicate());
         DataInput canon = testWrite(write);
-        DataInput test = new DataInputStream(new ByteBufferInputStream(Arrays.asList(buf)));
+        DataInput test = new DataInputStream(new ByteArrayInputStream(ByteBufferUtil.getArray(buf)));
         testRead(test, canon);
     }
[1/2] git commit: Remove avro usage in DataOutputTest
Repository: cassandra
Updated Branches:
  refs/heads/trunk 6f24097f6 - 6838790f8

Remove avro usage in DataOutputTest

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5baa72f7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5baa72f7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5baa72f7

Branch: refs/heads/trunk
Commit: 5baa72f7f299b4ec190ddb30b897d5519a6a2a75
Parents: fbc112d
Author: Marcus Eriksson marc...@apache.org
Authored: Mon Mar 24 12:36:05 2014 +0100
Committer: Marcus Eriksson marc...@apache.org
Committed: Mon Mar 24 12:36:05 2014 +0100

----------------------------------------------------------------------
 test/unit/org/apache/cassandra/io/util/DataOutputTest.java | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5baa72f7/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
----------------------------------------------------------------------
diff --git a/test/unit/org/apache/cassandra/io/util/DataOutputTest.java b/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
index 4eeec4d..2a8c7a9 100644
--- a/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
+++ b/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
@@ -31,14 +31,12 @@ import java.io.IOException;
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.channels.Channels;
-import java.util.Arrays;
 import java.util.Random;
 import java.util.concurrent.ThreadLocalRandom;
 
 import org.junit.Assert;
 import org.junit.Test;
 
-import org.apache.avro.util.ByteBufferInputStream;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 public class DataOutputTest
@@ -79,7 +77,7 @@ public class DataOutputTest
         ByteBuffer buf = wrap(new byte[345], true);
         DataOutputByteBuffer write = new DataOutputByteBuffer(buf.duplicate());
         DataInput canon = testWrite(write);
-        DataInput test = new DataInputStream(new ByteBufferInputStream(Arrays.asList(buf)));
+        DataInput test = new DataInputStream(new ByteArrayInputStream(ByteBufferUtil.getArray(buf)));
         testRead(test, canon);
     }
 
@@ -89,7 +87,7 @@ public class DataOutputTest
         ByteBuffer buf = wrap(new byte[345], false);
         DataOutputByteBuffer write = new DataOutputByteBuffer(buf.duplicate());
         DataInput canon = testWrite(write);
-        DataInput test = new DataInputStream(new ByteBufferInputStream(Arrays.asList(buf)));
+        DataInput test = new DataInputStream(new ByteArrayInputStream(ByteBufferUtil.getArray(buf)));
         testRead(test, canon);
     }
[2/2] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6838790f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6838790f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6838790f Branch: refs/heads/trunk Commit: 6838790f869dfcd773c83fd0165c699e6984540d Parents: 6f24097 5baa72f Author: Marcus Eriksson marc...@apache.org Authored: Mon Mar 24 12:36:20 2014 +0100 Committer: Marcus Eriksson marc...@apache.org Committed: Mon Mar 24 12:36:20 2014 +0100 -- test/unit/org/apache/cassandra/io/util/DataOutputTest.java | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) --
[jira] [Commented] (CASSANDRA-6781) ByteBuffer write() methods for serializing sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944941#comment-13944941 ]

Marcus Eriksson commented on CASSANDRA-6781:
--------------------------------------------

committed, thanks

ByteBuffer write() methods for serializing sstables
---------------------------------------------------

Key: CASSANDRA-6781
URL: https://issues.apache.org/jira/browse/CASSANDRA-6781
Project: Cassandra
Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
Priority: Minor
Fix For: 2.1 beta2
Attachments: 6781.removeavro.txt

As mentioned in CASSANDRA-6689, there may be some performance issues with writing sstables from offheap memtables. This is most plausibly caused by the single-byte-at-a-time write path for ByteBuffers, as we use DataOutput which only accepts byte[]. I propose extending DataOutput to include ByteBuffer methods, and to use this extended interface for serializing sstables instead.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6913) Compaction of system keyspaces during startup can cause early loading of non-system keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6913: Attachment: 6913.txt Simple patch that uses getKeyspaceInstance() in getLoad() so that compaction can happen without opening any keyspaces, and introduces assertions that prevent non-system keyspace loading during startup. An alternative to the getLoad() change would be to disable compaction of system keyspaces during startup, but that's probably not necessary, and this should be sufficient. The assertions are to try and catch such problems earlier in future. Compaction of system keyspaces during startup can cause early loading of non-system keyspaces - Key: CASSANDRA-6913 URL: https://issues.apache.org/jira/browse/CASSANDRA-6913 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.1 beta2 Attachments: 6913.txt This then can result in an inconsistent CFS state, as cleanup of e.g. compaction leftovers does not get reflected in DataTracker. It happens because StorageService.getLoad() iterates over and opens all CFS, and this is called by Compaction. -- This message was sent by Atlassian JIRA (v6.2#6252)
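The problem and the fix described above can be sketched in miniature: a "load" calculation that opens everything it iterates over has a side effect during startup, whereas one that only looks at already-opened instances does not, and an assertion can catch the former early. Everything below is a hypothetical stand-in (`KeyspaceLoadSketch`, its methods, and the single `system` keyspace check) and is not the StorageService or Keyspace API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry of keyspaces and their on-disk sizes.
final class KeyspaceLoadSketch {
    private final Map<String, Long> known = new LinkedHashMap<>(); // name -> bytes on disk
    private final Map<String, Long> opened = new HashMap<>();      // instances opened so far
    private boolean startupComplete = false;

    void define(String name, long bytes) { known.put(name, bytes); }
    void finishStartup() { startupComplete = true; }
    int openedCount() { return opened.size(); }

    static boolean isSystem(String name) { return name.equals("system"); }

    // Opening is a side effect; refuse it for non-system keyspaces during startup.
    long open(String name) {
        if (!startupComplete && !isSystem(name))
            throw new IllegalStateException("non-system keyspace opened during startup: " + name);
        Long bytes = known.get(name);
        if (bytes == null)
            throw new IllegalArgumentException("unknown keyspace: " + name);
        opened.put(name, bytes);
        return bytes;
    }

    // Problematic variant: computing the load opens every known keyspace.
    long loadOpeningAll() {
        long sum = 0;
        for (String name : known.keySet())
            sum += open(name);
        return sum;
    }

    // Side-effect-free variant: only already-opened instances contribute.
    long loadOfOpened() {
        long sum = 0;
        for (long bytes : opened.values())
            sum += bytes;
        return sum;
    }

    public static void main(String[] args) {
        KeyspaceLoadSketch k = new KeyspaceLoadSketch();
        k.define("system", 10);
        k.define("users", 100);
        k.open("system");
        System.out.println(k.loadOfOpened()); // safe during startup
        k.finishStartup();
        System.out.println(k.loadOpeningAll());
    }
}
```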
[jira] [Updated] (CASSANDRA-6858) Strange Exception on cassandra node restart
[ https://issues.apache.org/jira/browse/CASSANDRA-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clément Lardeur updated CASSANDRA-6858:
---------------------------------------
Description:
Strange exception on cassandra restart. It seems that it is not bad, but any exception is not good.
{code}
WARN [MutationStage:169] 2014-03-14 21:14:52,723 JmxReporter.java (line 397) Error processing org.apache.cassandra.metrics:type=Connection,scope=95.163.80.90,name=CommandPendingTasks
javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=95.163.80.90,name=CommandPendingTasks
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:438)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:16)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
    at com.yammer.metrics.Metrics.newGauge(Metrics.java:70)
    at org.apache.cassandra.metrics.ConnectionMetrics.init(ConnectionMetrics.java:71)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.init(OutboundTcpConnectionPool.java:55)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:639)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:613)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:59)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

was:
Strange exception on cassandra restart. It seems that it is not bad, but any exception is not good.
WARN [MutationStage:169] 2014-03-14 21:14:52,723 JmxReporter.java (line 397) Error processing org.apache.cassandra.metrics:type=Connection,scope=95.163.80.90,name=CommandPendingTasks
javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=95.163.80.90,name=CommandPendingTasks
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:438)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:16)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
    at com.yammer.metrics.Metrics.newGauge(Metrics.java:70)
    at org.apache.cassandra.metrics.ConnectionMetrics.init(ConnectionMetrics.java:71)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.init(OutboundTcpConnectionPool.java:55)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:639)
git commit: Fix SSTable not released if stream fails before it starts
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-1.2 b7bb2fb20 -> 35d4b5de8

Fix SSTable not released if stream fails before it starts

patch by yukim; reviewed by Richard Low for CASSANDRA-6818

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/35d4b5de
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/35d4b5de
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/35d4b5de

Branch: refs/heads/cassandra-1.2
Commit: 35d4b5de8f3ee18ec98b01f3aa0951df0e11e8d2
Parents: b7bb2fb
Author: Yuki Morishita yu...@apache.org
Authored: Mon Mar 24 07:44:19 2014 -0500
Committer: Yuki Morishita yu...@apache.org
Committed: Mon Mar 24 07:44:19 2014 -0500

--
 CHANGES.txt | 1 +
 .../org/apache/cassandra/streaming/AbstractStreamSession.java | 2 --
 src/java/org/apache/cassandra/streaming/StreamInSession.java | 5 +
 src/java/org/apache/cassandra/streaming/StreamOutSession.java | 5 +
 4 files changed, 11 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 960b0e9..fa46c2e 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -23,6 +23,7 @@
  * Avoid NPEs when receiving table changes for an unknown keyspace (CASSANDRA-5631)
  * Fix bootstrapping when there is no schema (CASSANDRA-6685)
  * Fix truncating compression metadata (CASSANDRA-6791)
+ * Fix SSTable not released if stream session fails before starts (CASSANDRA-6818)


 1.2.15

http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java
--
diff --git a/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java b/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java
index 89fbf5f..f8de827 100644
--- a/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java
+++ b/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java
@@ -44,8 +44,6 @@ public abstract class AbstractStreamSession implements IEndpointStateChangeSubsc
         this.sessionId = sessionId;
         this.table = table;
         this.callback = callback;
-        Gossiper.instance.register(this);
-        FailureDetector.instance.registerFailureDetectionEventListener(this);
     }

     public UUID getSessionId()

http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/src/java/org/apache/cassandra/streaming/StreamInSession.java
--
diff --git a/src/java/org/apache/cassandra/streaming/StreamInSession.java b/src/java/org/apache/cassandra/streaming/StreamInSession.java
index e83a5b6..f9cdc31 100644
--- a/src/java/org/apache/cassandra/streaming/StreamInSession.java
+++ b/src/java/org/apache/cassandra/streaming/StreamInSession.java
@@ -24,6 +24,7 @@ import java.net.Socket;
 import java.util.*;
 import java.util.concurrent.ConcurrentMap;

+import org.apache.cassandra.gms.Gossiper;
 import org.apache.cassandra.io.sstable.SSTableWriter;
 import org.cliffc.high_scale_lib.NonBlockingHashMap;
 import org.cliffc.high_scale_lib.NonBlockingHashSet;
@@ -61,6 +62,8 @@ public class StreamInSession extends AbstractStreamSession
     public static StreamInSession create(InetAddress host, IStreamCallback callback)
     {
         StreamInSession session = new StreamInSession(host, UUIDGen.getTimeUUID(), callback);
+        Gossiper.instance.register(session);
+        FailureDetector.instance.registerFailureDetectionEventListener(session);
         sessions.put(session.getSessionId(), session);
         return session;
     }
@@ -71,6 +74,8 @@ public class StreamInSession extends AbstractStreamSession
         if (session == null)
         {
             StreamInSession possibleNew = new StreamInSession(host, sessionId, null);
+            Gossiper.instance.register(possibleNew);
+            FailureDetector.instance.registerFailureDetectionEventListener(possibleNew);
             if ((session = sessions.putIfAbsent(sessionId, possibleNew)) == null)
                 session = possibleNew;
         }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/src/java/org/apache/cassandra/streaming/StreamOutSession.java
--
diff --git a/src/java/org/apache/cassandra/streaming/StreamOutSession.java b/src/java/org/apache/cassandra/streaming/StreamOutSession.java
index edc07ca..c4d7695 100644
--- a/src/java/org/apache/cassandra/streaming/StreamOutSession.java
+++ b/src/java/org/apache/cassandra/streaming/StreamOutSession.java
@@ -25,6
[jira] [Commented] (CASSANDRA-6818) SSTable references not released if stream session fails before it starts
[ https://issues.apache.org/jira/browse/CASSANDRA-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944974#comment-13944974 ]

Yuki Morishita commented on CASSANDRA-6818:
---
Committed the 1.2 version, to be released in 1.2.16.

SSTable references not released if stream session fails before it starts
Key: CASSANDRA-6818
URL: https://issues.apache.org/jira/browse/CASSANDRA-6818
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Richard Low
Assignee: Yuki Morishita
Fix For: 1.2.16, 2.0.7, 2.1 beta2
Attachments: 6818-1.2.txt, 6818-2.0-v2.txt, 6818-2.0.txt

I observed a large number of 'orphan' SSTables - SSTables that are in the data directory but not loaded by Cassandra - on a 1.1.12 node that had a large stream fail before it started. These orphan files are particularly dangerous because if the node is restarted and picks up these SSTables it could bring data back to life if tombstones have been GCed. To confirm the SSTables are orphans, I created a snapshot and it didn't contain these files. I can see in the logs that they have been compacted, so they should have been deleted.

The log entries for the stream are:
{{INFO [StreamStage:1] 2014-02-21 19:41:48,742 StreamOut.java (line 115) Beginning transfer to /10.0.0.1}}
{{INFO [StreamStage:1] 2014-02-21 19:41:48,743 StreamOut.java (line 96) Flushing memtables for [CFS(Keyspace='ks', ColumnFamily='cf1'), CFS(Keyspace='ks', ColumnFamily='cf2')]...}}
{{ERROR [GossipTasks:1] 2014-02-21 19:41:49,239 AbstractStreamSession.java (line 113) Stream failed because /10.0.0.1 died or was restarted/removed (streams may still be active in background, but further streams won't be started)}}
{{INFO [StreamStage:1] 2014-02-21 19:41:51,783 StreamOut.java (line 161) Stream context metadata [...] 2267 sstables.}}
{{INFO [StreamStage:1] 2014-02-21 19:41:51,789 StreamOutSession.java (line 182) Streaming to /10.0.0.1}}
{{INFO [Streaming to /10.0.0.1:1] 2014-02-21 19:42:02,218 FileStreamTask.java (line 99) Found no stream out session at end of file stream task - this is expected if the receiver went down}}

After digging in the code, here's what I think the issue is:
1. StreamOutSession.transferRanges() creates a streaming session, which is registered with the failure detector in AbstractStreamSession's constructor.
2. Memtables are flushed, potentially taking a long time.
3. The remote node fails, convict() is called and the StreamOutSession is closed. However, at this time StreamOutSession.files is empty because it's still waiting for the memtables to flush.
4. Memtables finish flushing, references are obtained to SSTables to be streamed and the PendingFiles are added to StreamOutSession.files.
5. The first stream fails but the StreamOutSession isn't found, so it is never closed and the references are never released.

This code is more or less the same on 1.2, so I would expect it to reproduce there. I looked at 2.0 and can't even see where SSTable references are released when the stream fails.

Some possible fixes for 1.1/1.2:
1. Don't register with the failure detector until after the PendingFiles are set up. I think this is the behaviour in 2.0 but I don't know if it was done like this to avoid this issue.
2. Detect the above case in (e.g.) StreamOutSession.begin() by noticing the session has been closed, with care to avoid double frees.
3. Add some synchronization so closeInternal() doesn't race with setting up the session.

-- This message was sent by Atlassian JIRA (v6.2#6252)
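The ordering problem in steps 1 to 5, and the first proposed fix (register with the failure detector only after the files are set up), can be sketched in isolation. This is a minimal illustration under stated assumptions, not Cassandra's code: `Ref`, `FailureBus`, `Session` and `StreamRaceSketch` are hypothetical stand-ins for an SSTable reference, the failure detector, a stream session and the caller. It also does not cover the case where the peer fails before registration happens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an SSTable reference count.
final class Ref {
    final AtomicInteger count = new AtomicInteger();
    void acquire() { count.incrementAndGet(); }
    void release() { count.decrementAndGet(); }
}

// Hypothetical stand-in for the failure detector: closes every registered session on conviction.
final class FailureBus {
    private final List<Session> listeners = new CopyOnWriteArrayList<>();
    void register(Session s) { listeners.add(s); }
    void convict() { for (Session s : listeners) s.close(); }
}

// Hypothetical stream session holding references to the files it will stream.
final class Session {
    private final List<Ref> files = new ArrayList<>();
    private boolean closed;

    synchronized void addFile(Ref ref) { ref.acquire(); files.add(ref); }

    synchronized void close() {
        if (closed) return;                 // guard against releasing twice
        closed = true;
        for (Ref r : files) r.release();
        files.clear();
    }
}

final class StreamRaceSketch {
    // Problematic ordering: register first, obtain references later.
    static int leakyOrder(FailureBus bus, Ref ref) {
        Session s = new Session();
        bus.register(s);
        bus.convict();     // peer dies during the "flush": close() finds no files
        s.addFile(ref);    // reference acquired after close(), never released
        return ref.count.get();
    }

    // Proposed ordering: obtain references, then register.
    static int safeOrder(FailureBus bus, Ref ref) {
        Session s = new Session();
        s.addFile(ref);
        bus.register(s);
        bus.convict();     // close() now sees the file and releases it
        return ref.count.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + leakyOrder(new FailureBus(), new Ref())); // prints "leaky: 1"
        System.out.println("safe:  " + safeOrder(new FailureBus(), new Ref()));  // prints "safe:  0"
    }
}
```

A non-zero count after the session is closed corresponds to the orphaned SSTable references described in the report.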
[jira] [Created] (CASSANDRA-6914) Map element is not allowed in CAS condition with DELETE/UPDATE query
Dmitriy Ukhlov created CASSANDRA-6914:
-
Summary: Map element is not allowed in CAS condition with DELETE/UPDATE query
Key: CASSANDRA-6914
URL: https://issues.apache.org/jira/browse/CASSANDRA-6914
Project: Cassandra
Issue Type: Bug
Reporter: Dmitriy Ukhlov

CREATE TABLE test (id int, data map<text,text>, PRIMARY KEY(id));
INSERT INTO test (id, data) VALUES (1,{'a':'1'});
DELETE FROM test WHERE id=1 IF data['a']=null;
Bad Request: line 1:40 missing EOF at '='
UPDATE test SET data['b']='2' WHERE id=1 IF data['a']='1';
Bad Request: line 1:53 missing EOF at '='

These queries were successfully executed with Cassandra 2.0.5, but don't work in the 2.0.6 release.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[9/9] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5f6e780d Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5f6e780d Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5f6e780d Branch: refs/heads/trunk Commit: 5f6e780d8bde7c88960e05e8f96192762137bc4c Parents: 6838790 874a341 Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:55:11 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:55:11 2014 -0500 -- --
[1/9] git commit: Fix SSTable not released if stream fails before it starts
Repository: cassandra Updated Branches: refs/heads/cassandra-2.0 b7231ff8a - e6c8034b1 refs/heads/cassandra-2.1 5baa72f7f - 874a34174 refs/heads/trunk 6838790f8 - 5f6e780d8 Fix SSTable not released if stream fails before it starts patch by yukim; reviewed by Richard Low for CASSANDRA-6818 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/35d4b5de Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/35d4b5de Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/35d4b5de Branch: refs/heads/cassandra-2.0 Commit: 35d4b5de8f3ee18ec98b01f3aa0951df0e11e8d2 Parents: b7bb2fb Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:44:19 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:44:19 2014 -0500 -- CHANGES.txt | 1 + .../org/apache/cassandra/streaming/AbstractStreamSession.java | 2 -- src/java/org/apache/cassandra/streaming/StreamInSession.java| 5 + src/java/org/apache/cassandra/streaming/StreamOutSession.java | 5 + 4 files changed, 11 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 960b0e9..fa46c2e 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -23,6 +23,7 @@ * Avoid NPEs when receiving table changes for an unknown keyspace (CASSANDRA-5631) * Fix bootstrapping when there is no schema (CASSANDRA-6685) * Fix truncating compression metadata (CASSANDRA-6791) + * Fix SSTable not released if stream session fails before starts (CASSANDRA-6818) 1.2.15 http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java -- diff --git a/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java b/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java index 89fbf5f..f8de827 100644 --- a/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java +++ 
b/src/java/org/apache/cassandra/streaming/AbstractStreamSession.java @@ -44,8 +44,6 @@ public abstract class AbstractStreamSession implements IEndpointStateChangeSubsc this.sessionId = sessionId; this.table = table; this.callback = callback; -Gossiper.instance.register(this); -FailureDetector.instance.registerFailureDetectionEventListener(this); } public UUID getSessionId() http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/src/java/org/apache/cassandra/streaming/StreamInSession.java -- diff --git a/src/java/org/apache/cassandra/streaming/StreamInSession.java b/src/java/org/apache/cassandra/streaming/StreamInSession.java index e83a5b6..f9cdc31 100644 --- a/src/java/org/apache/cassandra/streaming/StreamInSession.java +++ b/src/java/org/apache/cassandra/streaming/StreamInSession.java @@ -24,6 +24,7 @@ import java.net.Socket; import java.util.*; import java.util.concurrent.ConcurrentMap; +import org.apache.cassandra.gms.Gossiper; import org.apache.cassandra.io.sstable.SSTableWriter; import org.cliffc.high_scale_lib.NonBlockingHashMap; import org.cliffc.high_scale_lib.NonBlockingHashSet; @@ -61,6 +62,8 @@ public class StreamInSession extends AbstractStreamSession public static StreamInSession create(InetAddress host, IStreamCallback callback) { StreamInSession session = new StreamInSession(host, UUIDGen.getTimeUUID(), callback); +Gossiper.instance.register(session); + FailureDetector.instance.registerFailureDetectionEventListener(session); sessions.put(session.getSessionId(), session); return session; } @@ -71,6 +74,8 @@ public class StreamInSession extends AbstractStreamSession if (session == null) { StreamInSession possibleNew = new StreamInSession(host, sessionId, null); +Gossiper.instance.register(possibleNew); + FailureDetector.instance.registerFailureDetectionEventListener(possibleNew); if ((session = sessions.putIfAbsent(sessionId, possibleNew)) == null) session = possibleNew; } 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/35d4b5de/src/java/org/apache/cassandra/streaming/StreamOutSession.java -- diff --git a/src/java/org/apache/cassandra/streaming/StreamOutSession.java b/src/java/org/apache/cassandra/streaming/StreamOutSession.java index edc07ca..c4d7695 100644 ---
[5/9] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e6c8034b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e6c8034b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e6c8034b Branch: refs/heads/cassandra-2.1 Commit: e6c8034b186e4091927b7b234dae086cd47009be Parents: b7231ff 35d4b5d Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:54:27 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:54:27 2014 -0500 -- --
[7/9] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/874a3417 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/874a3417 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/874a3417 Branch: refs/heads/cassandra-2.1 Commit: 874a34174a521c8b02ebe89ee91511beb994bd3b Parents: 5baa72f e6c8034 Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:54:50 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:54:50 2014 -0500 -- --
[8/9] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/874a3417 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/874a3417 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/874a3417 Branch: refs/heads/trunk Commit: 874a34174a521c8b02ebe89ee91511beb994bd3b Parents: 5baa72f e6c8034 Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:54:50 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:54:50 2014 -0500 -- --
[4/9] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e6c8034b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e6c8034b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e6c8034b Branch: refs/heads/cassandra-2.0 Commit: e6c8034b186e4091927b7b234dae086cd47009be Parents: b7231ff 35d4b5d Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:54:27 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:54:27 2014 -0500 -- --
[6/9] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e6c8034b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e6c8034b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e6c8034b Branch: refs/heads/trunk Commit: e6c8034b186e4091927b7b234dae086cd47009be Parents: b7231ff 35d4b5d Author: Yuki Morishita yu...@apache.org Authored: Mon Mar 24 07:54:27 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Mar 24 07:54:27 2014 -0500 -- --
[jira] [Created] (CASSANDRA-6915) Show storage rows in cqlsh
Robbie Strickland created CASSANDRA-6915: Summary: Show storage rows in cqlsh Key: CASSANDRA-6915 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Robbie Strickland In Cassandra it's super important to understand how your CQL schema translates to the underlying storage rows. Right now the only way to see this is to create the schema in cqlsh, write some data, then query it using the CLI. Obviously we don't want to be encouraging people to use the CLI when it's supposed to be deprecated. So I'd like to see a function in cqlsh to do this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6506) counters++ split counter context shards into separate cells
[ https://issues.apache.org/jira/browse/CASSANDRA-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944997#comment-13944997 ] Aleksey Yeschenko commented on CASSANDRA-6506: -- Yeah, those. We don't need to pass it, because we don't really want to use non-HeapAllocator for merging counter contexts - these objects are extremely short-lived. The important bit is the localCopy() allocator, and there we do use the configured memtable allocator. counters++ split counter context shards into separate cells --- Key: CASSANDRA-6506 URL: https://issues.apache.org/jira/browse/CASSANDRA-6506 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 2.1 beta2 This change is related to, but somewhat orthogonal to CASSANDRA-6504. Currently all the shard tuples for a given counter cell are packed, in sorted order, in one binary blob. Thus reconciling N counter cells requires allocating a new byte buffer capable of holding the union of the two context's shards N-1 times. For writes, in post CASSANDRA-6504 world, it also means reading more data than we have to (the complete context, when all we need is the local node's global shard). Splitting the context into separate cells, one cell per shard, will help to improve this. We did a similar thing with super columns for CASSANDRA-3237. Incidentally, doing this split is now possible thanks to CASSANDRA-3237. Doing this would also simplify counter reconciliation logic. Getting rid of old contexts altogether can be done trivially with upgradesstables. In fact, we should be able to put the logical clock into the cell's timestamp, and use regular Cell-s and regular Cell reconcile() logic for the shards, especially once we get rid of the local/remote shards some time in the future (until then we still have to differentiate between global/remote/local shards and their priority rules). -- This message was sent by Atlassian JIRA (v6.2#6252)
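The allocation cost mentioned in the description (reconciling N packed contexts produces N-1 intermediate merged contexts) can be made concrete with a small sketch. It assumes a heavily simplified model: a shard is an (id, clock, count) tuple, contexts are arrays sorted by id, and for equal ids the shard with the higher clock wins. The global/remote/local distinction and its priority rules are ignored, and `Shard` and `ContextSketch` are hypothetical names, not the real CounterContext layout or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplified shard: (counter id, logical clock, count).
final class Shard {
    final int id;
    final long clock;
    final long count;
    Shard(int id, long clock, long count) { this.id = id; this.clock = clock; this.count = count; }
}

final class ContextSketch {
    // Number of merged contexts allocated so far; for illustration only.
    static int allocations = 0;

    // Merge two contexts sorted by id; on equal ids keep the shard with the higher clock.
    static Shard[] merge(Shard[] a, Shard[] b) {
        List<Shard> out = new ArrayList<>(a.length + b.length);
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].id < b[j].id) out.add(a[i++]);
            else if (a[i].id > b[j].id) out.add(b[j++]);
            else { out.add(a[i].clock >= b[j].clock ? a[i] : b[j]); i++; j++; }
        }
        while (i < a.length) out.add(a[i++]);
        while (j < b.length) out.add(b[j++]);
        allocations++;                       // one new merged context per pairwise merge
        return out.toArray(new Shard[0]);
    }

    // Reconciling N contexts pairwise performs N-1 merges, hence N-1 allocations.
    static Shard[] reconcileAll(List<Shard[]> contexts) {
        Shard[] acc = contexts.get(0);
        for (int k = 1; k < contexts.size(); k++) acc = merge(acc, contexts.get(k));
        return acc;
    }

    static long total(Shard[] ctx) {
        long t = 0;
        for (Shard s : ctx) t += s.count;
        return t;
    }

    public static void main(String[] args) {
        Shard[] c1 = { new Shard(1, 1, 5), new Shard(2, 1, 3) };
        Shard[] c2 = { new Shard(1, 2, 7) };
        Shard[] c3 = { new Shard(3, 1, 1) };
        Shard[] merged = reconcileAll(Arrays.asList(c1, c2, c3));
        // prints "shards=3 total=11 allocations=2"
        System.out.println("shards=" + merged.length + " total=" + total(merged) + " allocations=" + allocations);
    }
}
```

With one cell per shard, as the ticket proposes, each shard could in principle be reconciled on its own instead of rebuilding the whole packed context on every merge.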
[jira] [Commented] (CASSANDRA-6506) counters++ split counter context shards into separate cells
[ https://issues.apache.org/jira/browse/CASSANDRA-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945099#comment-13945099 ]

Aleksey Yeschenko commented on CASSANDRA-6506:
--
That's one of the differences between 2.0 and 2.1 - in 2.0 we localCopy() first, then reconcile(), and thus have to use the same allocator all the way, but in 2.1 we reconcile() first, and localCopy() the result.
[1/2] git commit: Clean up CFMetaData
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 874a34174 -> 8a2a0c3d4

Clean up CFMetaData

patch by Aleksey Yeschenko; reviewed by Sylvain Lebresne for CASSANDRA-6506

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/69bfca06
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/69bfca06
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/69bfca06

Branch: refs/heads/cassandra-2.1
Commit: 69bfca06f2b048c43b0dc4c3423227946b7f6523
Parents: 874a341
Author: Aleksey Yeschenko alek...@apache.org
Authored: Mon Mar 24 16:52:38 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Mon Mar 24 16:52:38 2014 +0300

--
 src/java/org/apache/cassandra/auth/Auth.java | 2 +-
 .../org/apache/cassandra/config/CFMetaData.java | 94 +++-
 .../cassandra/cql/AlterTableStatement.java | 2 +-
 .../cassandra/cql/DropIndexStatement.java | 2 +-
 .../apache/cassandra/cql/QueryProcessor.java | 2 +-
 .../cql3/statements/AlterTableStatement.java | 2 +-
 .../cql3/statements/AlterTypeStatement.java | 2 +-
 .../cql3/statements/CreateIndexStatement.java | 2 +-
 .../cql3/statements/CreateTriggerStatement.java | 2 +-
 .../cql3/statements/DropIndexStatement.java | 2 +-
 .../cql3/statements/DropTriggerStatement.java | 2 +-
 .../apache/cassandra/config/CFMetaDataTest.java | 2 +-
 .../org/apache/cassandra/config/DefsTest.java | 8 +-
 .../cassandra/thrift/ThriftValidationTest.java | 2 +-
 .../cassandra/triggers/TriggersSchemaTest.java | 6 +-
 15 files changed, 49 insertions(+), 83 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69bfca06/src/java/org/apache/cassandra/auth/Auth.java
--
diff --git a/src/java/org/apache/cassandra/auth/Auth.java b/src/java/org/apache/cassandra/auth/Auth.java
index 90b1215..237fc99 100644
--- a/src/java/org/apache/cassandra/auth/Auth.java
+++ b/src/java/org/apache/cassandra/auth/Auth.java
@@ -205,7 +205,7 @@ public class Auth
         CFStatement parsed = (CFStatement)QueryProcessor.parseStatement(cql);
         parsed.prepareKeyspace(AUTH_KS);
         CreateTableStatement statement = (CreateTableStatement) parsed.prepare().statement;
-        CFMetaData cfm = statement.getCFMetaData().clone(CFMetaData.generateLegacyCfId(AUTH_KS, name));
+        CFMetaData cfm = statement.getCFMetaData().copy(CFMetaData.generateLegacyCfId(AUTH_KS, name));
         assert cfm.cfName.equals(name);
         MigrationManager.announceNewColumnFamily(cfm);
     }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69bfca06/src/java/org/apache/cassandra/config/CFMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java b/src/java/org/apache/cassandra/config/CFMetaData.java
index f38dd5e..9c8ceaf 100644
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -20,26 +20,23 @@ package org.apache.cassandra.config;
 import java.io.DataInput;
 import java.lang.reflect.Constructor;
 import java.lang.reflect.InvocationTargetException;
-import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 import java.util.*;

 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Objects;
+import com.google.common.base.Strings;
 import com.google.common.collect.AbstractIterator;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.MapDifference;
 import com.google.common.collect.Maps;
-
-import org.apache.cassandra.cache.CachingOptions;
-import org.apache.cassandra.db.composites.*;
-
 import org.apache.commons.lang3.ArrayUtils;
 import org.apache.commons.lang3.builder.HashCodeBuilder;
 import org.apache.commons.lang3.builder.ToStringBuilder;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

+import org.apache.cassandra.cache.CachingOptions;
 import org.apache.cassandra.cql3.*;
 import org.apache.cassandra.cql3.statements.CFStatement;
 import org.apache.cassandra.cql3.statements.CreateTableStatement;
@@ -47,6 +44,7 @@ import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.compaction.AbstractCompactionStrategy;
 import org.apache.cassandra.db.compaction.LeveledCompactionStrategy;
 import org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy;
+import org.apache.cassandra.db.composites.*;
 import org.apache.cassandra.db.index.SecondaryIndex;
 import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.exceptions.ConfigurationException;
@@ -66,14 +64,11 @@ import org.apache.cassandra.utils.UUIDGen;
 import static
[1/3] git commit: Clean up CFMetaData
Repository: cassandra Updated Branches: refs/heads/trunk 5f6e780d8 - e5314641a Clean up CFMetaData patch by Aleksey Yeschenko; reviewed by Sylvain Lebresne for CASSANDRA-6506 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/69bfca06 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/69bfca06 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/69bfca06 Branch: refs/heads/trunk Commit: 69bfca06f2b048c43b0dc4c3423227946b7f6523 Parents: 874a341 Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Mar 24 16:52:38 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Mar 24 16:52:38 2014 +0300 -- src/java/org/apache/cassandra/auth/Auth.java| 2 +- .../org/apache/cassandra/config/CFMetaData.java | 94 +++- .../cassandra/cql/AlterTableStatement.java | 2 +- .../cassandra/cql/DropIndexStatement.java | 2 +- .../apache/cassandra/cql/QueryProcessor.java| 2 +- .../cql3/statements/AlterTableStatement.java| 2 +- .../cql3/statements/AlterTypeStatement.java | 2 +- .../cql3/statements/CreateIndexStatement.java | 2 +- .../cql3/statements/CreateTriggerStatement.java | 2 +- .../cql3/statements/DropIndexStatement.java | 2 +- .../cql3/statements/DropTriggerStatement.java | 2 +- .../apache/cassandra/config/CFMetaDataTest.java | 2 +- .../org/apache/cassandra/config/DefsTest.java | 8 +- .../cassandra/thrift/ThriftValidationTest.java | 2 +- .../cassandra/triggers/TriggersSchemaTest.java | 6 +- 15 files changed, 49 insertions(+), 83 deletions(-)
[3/3] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Conflicts: src/java/org/apache/cassandra/cql/AlterTableStatement.java src/java/org/apache/cassandra/cql/DropIndexStatement.java src/java/org/apache/cassandra/cql/QueryProcessor.java Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e5314641 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e5314641 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e5314641 Branch: refs/heads/trunk Commit: e5314641a41b9464d04ed156e46465b3c213abe8 Parents: 5f6e780 8a2a0c3 Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Mar 24 17:00:00 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Mar 24 17:00:00 2014 +0300 -- src/java/org/apache/cassandra/auth/Auth.java| 2 +- .../org/apache/cassandra/config/CFMetaData.java | 94 - .../cql3/statements/AlterTableStatement.java| 2 +- .../cql3/statements/AlterTypeStatement.java | 2 +- .../cql3/statements/CreateIndexStatement.java | 2 +- .../cql3/statements/CreateTriggerStatement.java | 2 +- .../cql3/statements/DropIndexStatement.java | 2 +- .../cql3/statements/DropTriggerStatement.java | 2 +- .../apache/cassandra/db/AtomicBTreeColumns.java | 14 +- src/java/org/apache/cassandra/db/Cell.java | 23 +--- .../org/apache/cassandra/db/ColumnFamily.java | 6 +- .../org/apache/cassandra/db/CounterCell.java| 9 +- .../apache/cassandra/db/CounterMutation.java| 4 +- .../apache/cassandra/db/CounterUpdateCell.java | 4 +- .../org/apache/cassandra/db/DeletedCell.java| 8 +- .../org/apache/cassandra/db/DeletionTime.java | 2 +- .../org/apache/cassandra/db/ExpiringCell.java | 2 +- .../cassandra/db/HintedHandOffManager.java | 8 +- src/java/org/apache/cassandra/db/Memtable.java | 4 +- .../org/apache/cassandra/db/OnDiskAtom.java | 3 +- .../org/apache/cassandra/db/RangeTombstone.java | 9 +- .../db/compaction/LazilyCompactedRow.java | 6 +- .../cassandra/db/context/CounterContext.java| 22 ++- 
.../apache/cassandra/db/filter/QueryFilter.java | 3 +- .../io/sstable/AbstractSSTableSimpleWriter.java | 3 +- .../cassandra/io/sstable/SSTableWriter.java | 4 +- .../utils/memory/ContextAllocator.java | 11 +- .../cassandra/utils/memory/PoolAllocator.java | 11 +- .../apache/cassandra/config/CFMetaDataTest.java | 2 +- .../org/apache/cassandra/config/DefsTest.java | 8 +- .../apache/cassandra/db/CounterCellTest.java| 39 +++--- .../db/context/CounterContextTest.java | 138 +++ .../streaming/StreamingTransferTest.java| 3 +- .../cassandra/thrift/ThriftValidationTest.java | 2 +- .../cassandra/triggers/TriggersSchemaTest.java | 6 +- 35 files changed, 170 insertions(+), 292 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5314641/src/java/org/apache/cassandra/config/CFMetaData.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5314641/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5314641/test/unit/org/apache/cassandra/config/DefsTest.java -- diff --cc test/unit/org/apache/cassandra/config/DefsTest.java index 6c06648,2e1876f..fd24822 --- a/test/unit/org/apache/cassandra/config/DefsTest.java +++ b/test/unit/org/apache/cassandra/config/DefsTest.java @@@ -69,9 -68,9 +69,9 @@@ public class DefsTest extends SchemaLoa .maxCompactionThreshold(500); // we'll be adding this one later. make sure it's not already there. -assert cfm.getColumnDefinition(ByteBuffer.wrap(new byte[] { 5 })) == null; +Assert.assertNull(cfm.getColumnDefinition(ByteBuffer.wrap(new byte[] { 5 }))); - CFMetaData cfNew = cfm.clone(); + CFMetaData cfNew = cfm.copy(); // add one. 
ColumnDefinition addIndexDef = ColumnDefinition.regularDef(cfm, ByteBuffer.wrap(new byte[] { 5 }), BytesType.instance, null) @@@ -407,12 -406,12 +407,12 @@@ KSMetaData ksm = KSMetaData.testMetadata(cf.ksName, SimpleStrategy.class, KSMetaData.optsWithRF(1), cf); MigrationManager.announceNewKeyspace(ksm); -assert Schema.instance.getKSMetaData(cf.ksName) != null; -assert Schema.instance.getKSMetaData(cf.ksName).equals(ksm); -assert Schema.instance.getCFMetaData(cf.ksName, cf.cfName) != null; +Assert.assertNotNull(Schema.instance.getKSMetaData(cf.ksName)); +
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945116#comment-13945116 ] Ryan McGuire commented on CASSANDRA-6911: - It won't start up with the old jar: {code} ERROR [main] 2014-03-24 09:58:21,156 CassandraDaemon.java:471 - Exception encountered during startup java.lang.NoClassDefFoundError: io/netty/util/internal/logging/InternalLoggerFactory at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:374) [main/:na] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:454) [main/:na] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:543) [main/:na] Caused by: java.lang.ClassNotFoundException: io.netty.util.internal.logging.InternalLoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_51] at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_51] at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_51] at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_51] at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_51] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_51] at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_51] ... 
3 common frames omitted INFO [StorageServiceShutdownHook] 2014-03-24 09:59:27,298 Gossiper.java:1269 - Announcing shutdown {code} Netty dependency update broke stress Key: CASSANDRA-6911 URL: https://issues.apache.org/jira/browse/CASSANDRA-6911 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Ryan McGuire Assignee: Benedict I compiled stress fresh from cassandra-2.1 and running this command: {code} cassandra-stress write n=1900 -rate threads=50 -node bdplab {code} I get the following traceback: {code} Exception in thread Thread-49 java.lang.NoClassDefFoundError: org/jboss/netty/channel/ChannelFactory at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:941) at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:889) at com.datastax.driver.core.Cluster.init(Cluster.java:88) at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:144) at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:854) at org.apache.cassandra.stress.util.JavaDriverClient.connect(JavaDriverClient.java:74) at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:155) at org.apache.cassandra.stress.settings.StressSettings.getSmartThriftClient(StressSettings.java:70) at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:275) Caused by: java.lang.ClassNotFoundException: org.jboss.netty.channel.ChannelFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 9 more {code} It seems this was introduced with an updated netty jar in cbf304ebd0436a321753e81231545b705aa8dd23 -- This message was sent by Atlassian JIRA (v6.2#6252)
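The two traces above name classes from different Netty lines: `org.jboss.netty.channel.ChannelFactory` lives in the Netty 3.x package namespace, while `io.netty.util.internal.logging.InternalLoggerFactory` lives in the Netty 4.x namespace. A minimal sketch of a classpath probe that reports which of the two is visible might look like the following. The class name `NettyClasspathProbe` and its helper are hypothetical and are not part of Cassandra or cassandra-stress.

```java
// Hypothetical diagnostic, not part of Cassandra: reports whether the
// Netty 3.x and Netty 4.x classes named in the stack traces can be loaded.
final class NettyClasspathProbe {

    // Returns true if the named class is loadable from this class's loader.
    // The class is looked up without being initialized.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, NettyClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class required by the Java driver in the stress trace (Netty 3.x).
        System.out.println("Netty 3.x present: "
                + isPresent("org.jboss.netty.channel.ChannelFactory"));
        // Class required by CassandraDaemon in the startup trace (Netty 4.x).
        System.out.println("Netty 4.x present: "
                + isPresent("io.netty.util.internal.logging.InternalLoggerFactory"));
    }
}
```

Run with the same classpath as the failing process, this would show whether the server lib dir, the stress lib dir, or both are missing the jar they need.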
[jira] [Updated] (CASSANDRA-6506) counters++ split counter context shards into separate cells
[ https://issues.apache.org/jira/browse/CASSANDRA-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-6506: - Fix Version/s: (was: 2.1 beta2) 3.0 Anyway, committed the first two, changed fixver to 3.0. counters++ split counter context shards into separate cells --- Key: CASSANDRA-6506 URL: https://issues.apache.org/jira/browse/CASSANDRA-6506 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 3.0 This change is related to, but somewhat orthogonal to CASSANDRA-6504. Currently all the shard tuples for a given counter cell are packed, in sorted order, in one binary blob. Thus reconciling N counter cells requires allocating a new byte buffer capable of holding the union of the two context's shards N-1 times. For writes, in post CASSANDRA-6504 world, it also means reading more data than we have to (the complete context, when all we need is the local node's global shard). Splitting the context into separate cells, one cell per shard, will help to improve this. We did a similar thing with super columns for CASSANDRA-3237. Incidentally, doing this split is now possible thanks to CASSANDRA-3237. Doing this would also simplify counter reconciliation logic. Getting rid of old contexts altogether can be done trivially with upgradesstables. In fact, we should be able to put the logical clock into the cell's timestamp, and use regular Cell-s and regular Cell reconcile() logic for the shards, especially once we get rid of the local/remote shards some time in the future (until then we still have to differentiate between global/remote/local shards and their priority rules). -- This message was sent by Atlassian JIRA (v6.2#6252)
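To illustrate why reconciling packed contexts allocates a new buffer per merge, here is a rough sketch of merging two shard lists sorted by shard id. It is a simplification under stated assumptions: `Shard`, `merge`, and `total` are hypothetical names, a shard is modelled as an (id, clock, count) tuple, and "the higher clock wins" stands in for the real priority rules between global, remote, and local shards that the description mentions. It is not Cassandra's `CounterContext` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a counter context as a list of shards sorted by id.
final class ShardMergeSketch {

    static final class Shard {
        final int id;      // stand-in for the counter id of the owning node
        final long clock;  // logical clock of the shard
        final long count;  // value contributed by this shard

        Shard(int id, long clock, long count) {
            this.id = id;
            this.clock = clock;
            this.count = count;
        }
    }

    // Merges two id-sorted shard lists into a newly allocated list.
    // Assumption: for equal ids the shard with the higher clock wins.
    static List<Shard> merge(List<Shard> left, List<Shard> right) {
        List<Shard> out = new ArrayList<>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            Shard a = left.get(i), b = right.get(j);
            if (a.id < b.id) { out.add(a); i++; }
            else if (a.id > b.id) { out.add(b); j++; }
            else { out.add(a.clock >= b.clock ? a : b); i++; j++; }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // The counter value is the sum of the counts of all shards.
    static long total(List<Shard> shards) {
        long sum = 0;
        for (Shard s : shards) sum += s.count;
        return sum;
    }

    public static void main(String[] args) {
        List<Shard> a = Arrays.asList(new Shard(1, 1, 5), new Shard(2, 3, 10));
        List<Shard> b = Arrays.asList(new Shard(2, 4, 12), new Shard(3, 1, 7));
        System.out.println(total(merge(a, b)));
    }
}
```

Each call to `merge` builds a fresh list sized for the union, which mirrors the N-1 allocations described above. With one cell per shard, the per-id comparison could instead be left to ordinary cell reconciliation.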
[jira] [Commented] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945124#comment-13945124 ] Jonathan Ellis commented on CASSANDRA-6915: --- Maybe reasonable: group partitions with whitespace in between, for compound primary keys. OTOH I don't see that this adds a whole ton of value, particularly since most queries are single-partition. Not reasonable: a function in cqlsh to emulate the cli In the end I think it's a pipe dream to save people from reading the docs. I also reject the contention that it's super important for most users to understand all the storage-level details of (for instance) WITH COMPACT STORAGE. So unless you have a better idea for information to show in cqlsh that is both relevant and unintrusive I think we should Notaproblem this. Show storage rows in cqlsh -- Key: CASSANDRA-6915 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Robbie Strickland Labels: cqlsh In Cassandra it's super important to understand how your CQL schema translates to the underlying storage rows. Right now the only way to see this is to create the schema in cqlsh, write some data, then query it using the CLI. Obviously we don't want to be encouraging people to use the CLI when it's supposed to be deprecated. So I'd like to see a function in cqlsh to do this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6914) Map element is not allowed in CAS condition with DELETE/UPDATE query
[ https://issues.apache.org/jira/browse/CASSANDRA-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6914: -- Fix Version/s: 2.0.7 Assignee: Sylvain Lebresne Map element is not allowed in CAS condition with DELETE/UPDATE query Key: CASSANDRA-6914 URL: https://issues.apache.org/jira/browse/CASSANDRA-6914 Project: Cassandra Issue Type: Bug Reporter: Dmitriy Ukhlov Assignee: Sylvain Lebresne Fix For: 2.0.7
CREATE TABLE test (id int, data map<text,text>, PRIMARY KEY(id));
INSERT INTO test (id, data) VALUES (1,{'a':'1'});
DELETE FROM test WHERE id=1 IF data['a']=null;
Bad Request: line 1:40 missing EOF at '='
UPDATE test SET data['b']='2' WHERE id=1 IF data['a']='1';
Bad Request: line 1:53 missing EOF at '='
These queries executed successfully with Cassandra 2.0.5, but don't work in the 2.0.6 release. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945128#comment-13945128 ] Benedict commented on CASSANDRA-6911: - Did you remove the new jar from the regular lib dir? That looks like C* is not starting because netty-4 is missing, which is nothing to do with the stress lib dir.
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945130#comment-13945130 ] Ryan McGuire commented on CASSANDRA-6911: - nevermind, I missed what you meant sylvain, it does work if I put the old netty jar in the *stress* lib dir.
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945132#comment-13945132 ] Ryan McGuire commented on CASSANDRA-6911: - No, I don't have to remove the new one from the main lib dir.
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945133#comment-13945133 ] Benedict commented on CASSANDRA-6911: - bq. No, I don't have to remove the new one from the main lib dir. I know, I meant maybe you had because modifying the stress lib dir wouldn't cause this. But nevermind looks like it's fixed now :-)
[jira] [Commented] (CASSANDRA-6911) Netty dependency update broke stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945136#comment-13945136 ] Ryan McGuire commented on CASSANDRA-6911: - yep, fixed (on my machine, anyway) :)
[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945156#comment-13945156 ] Brandon Williams commented on CASSANDRA-6908: - CASSANDRA-6465 is what I was thinking of.
Dynamic endpoint snitch destabilizes cluster under heavy load - Key: CASSANDRA-6908 URL: https://issues.apache.org/jira/browse/CASSANDRA-6908 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Bartłomiej Romański
We observe that with dynamic snitch disabled our cluster is much more stable than with dynamic snitch enabled. We've got a 15-node cluster with pretty strong machines (2xE5-2620, 64 GB RAM, 2x480 GB SSD). We mostly do reads (about 300k/s). We use Astyanax on the client side with the TOKEN_AWARE option enabled. It automatically directs read queries to one of the nodes responsible for the given token. In that case, with dynamic snitch disabled, Cassandra always handles reads locally. With dynamic snitch enabled, Cassandra very often decides to proxy the read to some other node. This causes much higher CPU usage and produces much more garbage, which results in more frequent GC pauses (the young generation fills up quicker). By "much higher" and "much more" I mean 1.5-2x.
I'm aware that a higher dynamic_snitch_badness_threshold value should solve that issue. The default value is 0.1. I've looked at the scores exposed in JMX and the problem is that our values seem to be completely random. They are usually between 0.5 and 2.0, but change randomly every time I hit refresh. Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something like that, but the result will be similar to simply disabling the dynamic snitch altogether (that's what we've done).
I've tried to understand the logic behind these scores and I'm not sure if I get the idea... It's a sum (without any multipliers) of two components:
- the ratio of the given node's recent latency to the recent average node latency
- something called 'severity', which, if I analyzed the code correctly, is the result of BackgroundActivityMonitor.getIOWait(): the ratio of iowait CPU time to the whole CPU time as reported in /proc/stat (the ratio is multiplied by 100)
In our case the second value is somewhere around 0-2% but varies quite heavily every second. What's the idea behind simply adding these two values without any multipliers (e.g. the second one is a percentage while the first one is not)? Are we sure this is the best possible way of calculating the final score?
Is there a way to force Cassandra to use (much) longer samples? In our case we probably need that to get stable values. The 'severity' is calculated for each second. The mean latency is calculated based on some magic, hardcoded values (ALPHA = 0.75, WINDOW_SIZE = 100). Am I right that there's no way to tune that without hacking the code? I'm aware that there's a dynamic_snitch_update_interval_in_ms property in the config file, but that only determines how often the scores are recalculated, not how long the samples are. Is that correct?
To sum up, it would be really nice to have more control over dynamic snitch behavior, or at least to have the official option to disable it described in the default config file (it took me some time to discover that we can just disable it instead of hacking with dynamic_snitch_badness_threshold=1000). Currently, for some scenarios (like ours: optimized cluster, token-aware client, heavy load) it causes more harm than good. -- This message was sent by Atlassian JIRA (v6.2#6252)
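Editorial sketch of the score described in the report above, to make the scale mismatch concrete. It follows the reporter's reading only: an unweighted sum of a latency ratio and an iowait percentage. The function names, the sample numbers, and the way the badness threshold is compared (local score against the best score times 1 + threshold) are assumptions for illustration, not Cassandra's actual code.

```python
def snitch_score(node_latency_ms, mean_latency_ms, iowait_percent):
    """Score as described in the report: latency ratio plus severity, unweighted.

    iowait_percent is the iowait share of CPU time already multiplied by 100,
    so it sits on a different scale from the dimensionless latency ratio.
    """
    latency_ratio = node_latency_ms / mean_latency_ms
    return latency_ratio + iowait_percent


def exceeds_badness(local_score, best_score, threshold):
    """Assumed comparison: prefer another node once the local score is worse
    than the best score by more than the badness threshold (as a fraction)."""
    return local_score > best_score * (1.0 + threshold)


# Two nodes with identical latency; only the one-second iowait sample differs.
local = snitch_score(2.0, 2.0, iowait_percent=1.5)
remote = snitch_score(2.0, 2.0, iowait_percent=0.2)

print(local, remote)
print(exceeds_badness(local, remote, threshold=0.1))  # default threshold
print(exceeds_badness(local, remote, threshold=5.0))  # close to disabling it
```

Under these assumptions a swing of about one percentage point of iowait moves the score more than any realistic change in the latency ratio, which would match the jitter the reporter sees in JMX.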
[jira] [Commented] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945169#comment-13945169 ] Sylvain Lebresne commented on CASSANDRA-6915: - bq. group partitions with whitespace in between, for compound primary keys I think that giving more clues about partitioning and clustering is not a bad idea in itself, but that's kind of covered by CASSANDRA-6910 imo (I like the idea there of using some color code in the header a bit better than adding empty lines between partition, though really one doesn't exclude the other). Show storage rows in cqlsh -- Key: CASSANDRA-6915 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Robbie Strickland Labels: cqlsh In Cassandra it's super important to understand how your CQL schema translates to the underlying storage rows. Right now the only way to see this is to create the schema in cqlsh, write some data, then query it using the CLI. Obviously we don't want to be encouraging people to use the CLI when it's supposed to be deprecated. So I'd like to see a function in cqlsh to do this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945172#comment-13945172 ] Robbie Strickland commented on CASSANDRA-6915: -- It's the compound key (whether composite partition key or composite column) case that makes this useful--and I would still argue really important. Yes you can read the documentation to understand the mapping, but I think this remains one of the most misunderstood concepts in CQL. I would argue that it's important to understand the storage layer difference between PRIMARY KEY ((id, timestamp), event) and PRIMARY KEY (id, timestamp, event), and that the best way to see the difference is to visualize it. People still don't seem to get the difference between partition keys and composite column names, and this obviously has huge implications for what sorts of queries you can run and how wide your rows will get. Perhaps something along the lines of: CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY (id, timestamp, event) ); EXPLAIN MyTable; Partition Key: id (uuid) Columns: timestamp:event:details (int:string:string) timestamp:event:userId (int:string:string) CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY ((id, timestamp), event) ); EXPLAIN MyTable; Partition Key: id:timestamp (uuid:int) Columns: event:details (string) event:userId (string) Show storage rows in cqlsh -- Key: CASSANDRA-6915 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Robbie Strickland Labels: cqlsh In Cassandra it's super important to understand how your CQL schema translates to the underlying storage rows. Right now the only way to see this is to create the schema in cqlsh, write some data, then query it using the CLI. 
Obviously we don't want to be encouraging people to use the CLI when it's supposed to be deprecated. So I'd like to see a function in cqlsh to do this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945172#comment-13945172 ] Robbie Strickland edited comment on CASSANDRA-6915 at 3/24/14 2:42 PM: --- It's the compound key (whether composite partition key or composite column) case that makes this useful--and I would still argue really important. Yes you can read the documentation to understand the mapping, but I think this remains one of the most misunderstood concepts in CQL. I would argue that it's important to understand the storage layer difference between PRIMARY KEY ((id, timestamp), event) and PRIMARY KEY (id, timestamp, event), and that the best way to see the difference is to visualize it. People still don't seem to get the difference between partition keys and composite column names, and this obviously has huge implications for what sorts of queries you can run and how wide your rows will get. Perhaps something along the lines of: {code} CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY (id, timestamp, event) ); EXPLAIN MyTable; Partition Key: id (uuid) Columns: timestamp:event:details (int:string:string) timestamp:event:userId (int:string:string) CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY ((id, timestamp), event) ); EXPLAIN MyTable; Partition Key: id:timestamp (uuid:int) Columns: event:details (string) event:userId (string) {code} was (Author: rstrickland): It's the compound key (whether composite partition key or composite column) case that makes this useful--and I would still argue really important. Yes you can read the documentation to understand the mapping, but I think this remains one of the most misunderstood concepts in CQL. 
I would argue that it's important to understand the storage layer difference between PRIMARY KEY ((id, timestamp), event) and PRIMARY KEY (id, timestamp, event), and that the best way to see the difference is to visualize it. People still don't seem to get the difference between partition keys and composite column names, and this obviously has huge implications for what sorts of queries you can run and how wide your rows will get. Perhaps something along the lines of: CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY (id, timestamp, event) ); EXPLAIN MyTable; Partition Key: id (uuid) Columns: timestamp:event:details (int:string:string) timestamp:event:userId (int:string:string) CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY ((id, timestamp), event) ); EXPLAIN MyTable; Partition Key: id:timestamp (uuid:int) Columns: event:details (string) event:userId (string) Show storage rows in cqlsh -- Key: CASSANDRA-6915 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Robbie Strickland Labels: cqlsh In Cassandra it's super important to understand how your CQL schema translates to the underlying storage rows. Right now the only way to see this is to create the schema in cqlsh, write some data, then query it using the CLI. Obviously we don't want to be encouraging people to use the CLI when it's supposed to be deprecated. So I'd like to see a function in cqlsh to do this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945173#comment-13945173 ] Marcus Eriksson commented on CASSANDRA-6696:
Been poking this, wip-patch pushed here: https://github.com/krummas/cassandra/commits/marcuse/6696
It does the following:
* Extract an interface out of SSTableWriter (imaginatively called SSTableWriterInterface), start using this interface everywhere
* Create DiskAwareSSTableWriter which knows about disk layout and starts using it instead of standard SSTW
* Ranges of tokens are assigned to the disks; this way we only need to check whether the key we are appending is larger than the boundary token for the current disk. If so, create a new SSTableWriter for that disk
* Breaks unit tests
todo:
* fix unit tests, general cleanups
* I kind of want to name the interface SSTableWriter and call the old SSTW class something else, but I guess SSTW is the class that most external people depend on, so maybe not
* Take disk size into consideration when splitting the ranges over disks; this needs to be deterministic though, so we have to use total disk size instead of free disk space.
* Make other partitioners than M3P work
* Fix keycache
Rebalancing of data is simply a matter of running upgradesstables or scrub; if we lose a disk, we will take writes to the other disks.
Comments on this approach?
Drive replacement in JBOD can cause data to reappear. -- Key: CASSANDRA-6696 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Assignee: Marcus Eriksson Fix For: 3.0
In JBOD, when someone gets a bad drive, the bad drive is replaced with a new empty one and repair is run. This can cause deleted data to come back in some cases. This is also true for corrupt sstables, in which case we delete the corrupt sstable and run repair. Here is an example: Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. row=sankalp col=sankalp is written 20 days back and successfully went to all three nodes. Then a delete/tombstone was written successfully for the same row column 15 days back. Since this tombstone is older than gc grace, it got compacted away on nodes A and B, since it got compacted together with the actual data. So there is no trace of this row column on nodes A and B. Now on node C, say the original data is on drive1 and the tombstone is on drive2. Compaction has not yet reclaimed the data and tombstone. Drive2 becomes corrupt and is replaced with a new empty drive. Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp has come back to life. Now after replacing the drive we run repair. This data will be propagated to all nodes. Note: This is still a problem even if we run repair every gc grace. -- This message was sent by Atlassian JIRA (v6.2#6252)
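Editorial sketch of the scenario in the issue description above. It is a toy model, not Cassandra code: each drive is a dict of cells, a cell is a (timestamp in days, is_tombstone) pair, and reconciliation is "newest timestamp wins". The helper names and the naive repair are assumptions made for illustration.

```python
def merged_view(drives):
    """Reconcile cells across a list of drives: the newest timestamp wins."""
    view = {}
    for drive in drives:
        for key, cell in drive.items():
            if key not in view or cell[0] > view[key][0]:
                view[key] = cell
    return view


def is_live(view, key):
    """A key is visible only if its winning cell exists and is not a tombstone."""
    cell = view.get(key)
    return cell is not None and not cell[1]


def repair(nodes):
    """Naive repair: every node receives the reconciled union of all nodes."""
    union = merged_view([merged_view(node) for node in nodes])
    for node in nodes:
        node[0].update(union)


KEY = ("sankalp", "sankalp")
DATA = (-20, False)       # written 20 days back
TOMBSTONE = (-15, True)   # deleted 15 days back

node_a = [{}, {}]                          # data and tombstone already compacted away
node_b = [{}, {}]
node_c = [{KEY: DATA}, {KEY: TOMBSTONE}]   # data on drive1, tombstone on drive2

live_before = is_live(merged_view(node_c), KEY)   # tombstone still shadows the data
node_c[1] = {}                                    # drive2 replaced with an empty drive
live_after = is_live(merged_view(node_c), KEY)    # data is visible again on node C

repair([node_a, node_b, node_c])
resurrected_on_a = is_live(merged_view(node_a), KEY)

print(live_before, live_after, resurrected_on_a)
```

In this model the only thing keeping the data dead was a tombstone that lived on a different drive from the data it shadowed, which is the situation that assigning token ranges to disks is meant to rule out.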
[jira] [Comment Edited] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945172#comment-13945172 ] Robbie Strickland edited comment on CASSANDRA-6915 at 3/24/14 2:44 PM: --- It's the compound key (whether composite partition key or composite column) case that makes this useful--and I would still argue really important. Yes you can read the documentation to understand the mapping, but I think this remains one of the most misunderstood concepts in CQL. I would argue that it's important to understand the storage layer difference between PRIMARY KEY ((id, timestamp), event) and PRIMARY KEY (id, timestamp, event), and that the best way to see the difference is to visualize it. People still don't seem to get the difference between partition keys and composite column names, and this obviously has huge implications for what sorts of queries you can run and how wide your rows will get. Perhaps something along the lines of: {code} CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY (id, timestamp, event) ); EXPLAIN MyTable; Partition Key: id (uuid) Columns: timestamp:event:details (int:string:string) timestamp:event:userId (int:string:string) CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY ((id, timestamp), event) ); EXPLAIN MyTable; Partition Key: id:timestamp (uuid:int) Columns: event:details (string:string) event:userId (string:string) {code} was (Author: rstrickland): It's the compound key (whether composite partition key or composite column) case that makes this useful--and I would still argue really important. Yes you can read the documentation to understand the mapping, but I think this remains one of the most misunderstood concepts in CQL. 
I would argue that it's important to understand the storage layer difference between PRIMARY KEY ((id, timestamp), event) and PRIMARY KEY (id, timestamp, event), and that the best way to see the difference is to visualize it. People still don't seem to get the difference between partition keys and composite column names, and this obviously has huge implications for what sorts of queries you can run and how wide your rows will get. Perhaps something along the lines of: {code} CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY (id, timestamp, event) ); EXPLAIN MyTable; Partition Key: id (uuid) Columns: timestamp:event:details (int:string:string) timestamp:event:userId (int:string:string) CREATE TABLE MyTable ( id uuid, timestamp int, event string, details string, userId string, PRIMARY KEY ((id, timestamp), event) ); EXPLAIN MyTable; Partition Key: id:timestamp (uuid:int) Columns: event:details (string) event:userId (string) {code} Show storage rows in cqlsh -- Key: CASSANDRA-6915 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Robbie Strickland Labels: cqlsh In Cassandra it's super important to understand how your CQL schema translates to the underlying storage rows. Right now the only way to see this is to create the schema in cqlsh, write some data, then query it using the CLI. Obviously we don't want to be encouraging people to use the CLI when it's supposed to be deprecated. So I'd like to see a function in cqlsh to do this. -- This message was sent by Atlassian JIRA (v6.2#6252)
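Editorial sketch of the mapping that the proposed EXPLAIN output in the comment above illustrates. It only reproduces the naming shown in the comment (partition key components joined with ":", and one cell name per regular column prefixed by the clustering columns); explain_layout is a hypothetical helper, not a cqlsh feature.

```python
def explain_layout(partition_key, clustering, regular):
    """Return the partition key and the cell-name layout for a table definition.

    partition_key, clustering and regular are lists of column names.
    """
    return {
        "partition_key": ":".join(partition_key),
        "cells": [":".join(clustering + [column]) for column in regular],
    }


# PRIMARY KEY (id, timestamp, event)
wide = explain_layout(["id"], ["timestamp", "event"], ["details", "userId"])

# PRIMARY KEY ((id, timestamp), event)
composite = explain_layout(["id", "timestamp"], ["event"], ["details", "userId"])

print(wide)
print(composite)
```

Moving timestamp from the clustering columns into the partition key shortens every cell name and multiplies the number of partitions, which is the difference the comment argues should be made visible.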
[jira] [Commented] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945179#comment-13945179 ] Sylvain Lebresne commented on CASSANDRA-6915: - bq. I would argue that it's important to understand the storage layer difference between PRIMARY KEY ((id, timestamp), event) and PRIMARY KEY (id, timestamp, event) It's important to understand what your partition key is, and that the partition key decides how our table rows will be distributed on the cluster. But you don't need to delve into storage layer representation to explain that. Again, I think the approach of CASSANDRA-6910 of using separate colors in the resultset header (colors that we could reuse in DESC) is simple and efficient.
[jira] [Commented] (CASSANDRA-6915) Show storage rows in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945182#comment-13945182 ] Robbie Strickland commented on CASSANDRA-6915: -- I agree that CASSANDRA-6910 is a good step, but I'd still like to see something along the lines of the EXPLAIN I demonstrate above. Maybe I'm just hanging onto the past, but I think a lot of people would appreciate the overt explanation.
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945194#comment-13945194 ] Jonathan Ellis commented on CASSANDRA-6696: --- Can we drop BOP/OPP in 3.0?
[jira] [Updated] (CASSANDRA-6357) Flush memtables to separate directory
[ https://issues.apache.org/jira/browse/CASSANDRA-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dan jatnieks updated CASSANDRA-6357: Attachment: c6357-2.1-stress-write-adj-ops-sec.png c6357-2.1-stress-write-latency-median.png c6357-2.1-stress-write-latency-99th.png Flush memtables to separate directory - Key: CASSANDRA-6357 URL: https://issues.apache.org/jira/browse/CASSANDRA-6357 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Patrick McFadin Assignee: Jonathan Ellis Priority: Minor Labels: performance Fix For: 2.1 beta1 Attachments: 6357-v2.txt, 6357.txt, c6357-2.1-stress-write-adj-ops-sec.png, c6357-2.1-stress-write-latency-99th.png, c6357-2.1-stress-write-latency-median.png, c6357-stress-write-latency-99th-1.png Flush writers are a critical element for keeping a node healthy. When several compactions run on systems with low performing data directories, IO becomes a premium. Once the disk subsystem is saturated, write IO is blocked which will cause flush writer threads to backup. Since memtables are large blocks of memory in the JVM, too much blocking can cause excessive GC over time degrading performance. In the worst case causing an OOM. Since compaction is running on the data directories. My proposal is to create a separate directory for flushing memtables. Potentially we can use the same methodology of keeping the commit log separate and minimize disk contention against the critical function of the flushwriter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6357) Flush memtables to separate directory
[ https://issues.apache.org/jira/browse/CASSANDRA-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945219#comment-13945219 ] dan jatnieks commented on CASSANDRA-6357: -
Retested this using trunk (as of Mar 13). I switched to different hardware with 7200rpm disks because the slow 5400rpm disks on the other system just couldn't keep up without throttling stress op/s. The machine this time was a Dell with two quad-core hyper-threaded Intel Xeon E5620 CPUs, 32GB memory, and 8 disks (500GB, 7200 rpm). The two scenarios were the same as last time and used the same stress parameters. The results were much less dramatic than in the 2.0 test.
The base test results (data and flush on the same device):
{noformat}
real op rate              : 9433
adjusted op rate          : 9435
adjusted op rate stderr   : 0
key rate                  : 9433
latency mean              : 5.3
latency median            : 1.4
latency 95th percentile   : 5.4
latency 99th percentile   : 12.9
latency 99.9th percentile : 259.4
latency max               : 20873.9
Total operation time      : 00:17:40
{noformat}
The flush test results (data and flush on separate devices):
{noformat}
real op rate              : 10391
adjusted op rate          : 10391
adjusted op rate stderr   : 0
key rate                  : 10391
latency mean              : 4.8
latency median            : 1.4
latency 95th percentile   : 5.4
latency 99th percentile   : 14.2
latency 99.9th percentile : 245.0
latency max               : 17035.2
Total operation time      : 00:16:02
{noformat}
See attached graphs:
[2.1 Stress Write Latency 99.9th Percentile|^c6357-2.1-stress-write-latency-99th.png]
[2.1 Stress Write Median Latency|^c6357-2.1-stress-write-latency-median.png]
[2.1 Stress Write Adjusted Ops/sec|^c6357-2.1-stress-write-adj-ops-sec.png]
Flush memtables to separate directory - Key: CASSANDRA-6357 URL: https://issues.apache.org/jira/browse/CASSANDRA-6357 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Patrick McFadin Assignee: Jonathan Ellis Priority: Minor Labels: performance Fix For: 2.1 beta1 Attachments: 6357-v2.txt, 6357.txt,
c6357-2.1-stress-write-adj-ops-sec.png, c6357-2.1-stress-write-latency-99th.png, c6357-2.1-stress-write-latency-median.png, c6357-stress-write-latency-99th-1.png Flush writers are a critical element for keeping a node healthy. When several compactions run on systems with low performing data directories, IO becomes a premium. Once the disk subsystem is saturated, write IO is blocked which will cause flush writer threads to backup. Since memtables are large blocks of memory in the JVM, too much blocking can cause excessive GC over time degrading performance. In the worst case causing an OOM. Since compaction is running on the data directories. My proposal is to create a separate directory for flushing memtables. Potentially we can use the same methodology of keeping the commit log separate and minimize disk contention against the critical function of the flushwriter. -- This message was sent by Atlassian JIRA (v6.2#6252)
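Editorial worked arithmetic on the two stress runs reported in the comment above, to put "much less dramatic" into numbers. The figures are copied from the two {noformat} blocks; the dictionary layout and helper names are illustrative only.

```python
def parse_hms(text):
    """Convert an HH:MM:SS duration into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Values taken from the comment: same device vs. separate flush device.
base = {"op_rate": 9433, "p99_ms": 12.9, "p999_ms": 259.4, "total": "00:17:40"}
flush = {"op_rate": 10391, "p99_ms": 14.2, "p999_ms": 245.0, "total": "00:16:02"}

op_rate_change = pct_change(base["op_rate"], flush["op_rate"])
p99_change = pct_change(base["p99_ms"], flush["p99_ms"])
p999_change = pct_change(base["p999_ms"], flush["p999_ms"])
time_change = pct_change(parse_hms(base["total"]), parse_hms(flush["total"]))

for name, value in [("op rate", op_rate_change), ("p99", p99_change),
                    ("p99.9", p999_change), ("total time", time_change)]:
    print(f"{name:>10}: {value:+.1f}%")
```

On these numbers the separate flush directory gives roughly 10% more operations per second and a run about 9% shorter, while the 99th percentile latency is about 10% worse and the 99.9th about 6% better; the median is unchanged.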
[jira] [Commented] (CASSANDRA-6357) Flush memtables to separate directory
[ https://issues.apache.org/jira/browse/CASSANDRA-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945234#comment-13945234 ] Jonathan Ellis commented on CASSANDRA-6357: --- Hmm. If the answer is, as long as you're not on 5400rpm disks this doesn't do anything for you then I'd be inclined to back it out. Do we need to test a different scenario [~pmcfadin]?
[jira] [Created] (CASSANDRA-6916) Preemptive re-open of compaction result
Benedict created CASSANDRA-6916: --- Summary: Preemptive re-open of compaction result Key: CASSANDRA-6916 URL: https://issues.apache.org/jira/browse/CASSANDRA-6916 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.1
Related to CASSANDRA-6812, but a little simpler: when compacting, we mess quite badly with the page cache. One thing we can do to mitigate this problem is to use the sstable we're writing before we've finished writing it, and to drop the regions from the old sstables from the page cache as soon as the new sstables have them (even if they're only written to the page cache). This should minimise any page cache churn, as the old sstables must be larger than the new sstable, and since both will be in memory, dropping the old sstables is at least as good as dropping the new.
The approach is quite straight-forward. Every X MB written:
# grab flushed length of index file;
# grab second to last index summary record, after excluding those that point to positions after the flushed length;
# open index file, and check that our last record doesn't occur outside of the flushed length of the data file (pretty unlikely)
# Open the sstable with the calculated upper bound
Some complications:
# must keep running copy of compression metadata for reopening with
# we need to be able to replace an sstable with itself but a different lower bound
# we need to drop the old page cache only when readers have finished
-- This message was sent by Atlassian JIRA (v6.2#6252)
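Editorial sketch of how the upper bound in the steps above could be chosen. Everything here is a toy model: the index summary is a list of (key, index_position) pairs in write order, data_offsets maps an index position to the data file position it points at, and backing off to an earlier record when the data check fails is an assumption, since the description only says the check is made.

```python
from bisect import bisect_left


def choose_upper_bound(summary, index_flushed, data_offsets, data_flushed):
    """Return the last key that is safe to expose to readers, or None.

    summary       -- list of (key, index_position), ascending by position
    index_flushed -- flushed length of the index file (step 1)
    data_offsets  -- dict: index_position -> position in the data file
    data_flushed  -- flushed length of the data file
    """
    positions = [position for _, position in summary]
    # Step 2: drop summary records that start at or beyond the flushed index
    # length, then take the second-to-last of what remains.
    usable = bisect_left(positions, index_flushed)
    candidate = usable - 2
    # Step 3: make sure the record's data lies within the flushed data length;
    # stepping back to an earlier record is this sketch's own choice.
    while candidate >= 0 and data_offsets[positions[candidate]] > data_flushed:
        candidate -= 1
    if candidate < 0:
        return None  # nothing can be opened early yet
    # Step 4: the caller would open the sstable with this key as upper bound.
    return summary[candidate][0]


summary = [("a", 0), ("f", 100), ("k", 200), ("p", 300), ("u", 400)]
data_offsets = {0: 0, 100: 1000, 200: 2000, 300: 3000, 400: 4000}

print(choose_upper_bound(summary, 350, data_offsets, 2500))
print(choose_upper_bound(summary, 350, data_offsets, 1500))
print(choose_upper_bound(summary, 50, data_offsets, 2500))
```

Taking the second-to-last usable record rather than the last leaves a margin for a final index entry that may be only partially flushed.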
git commit: Fix typo in DeletionInfo
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-1.2 35d4b5de8 -> 91130373f

Fix typo in DeletionInfo

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/91130373
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/91130373
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/91130373

Branch: refs/heads/cassandra-1.2
Commit: 91130373f474c8a8d8f5100044507553d2a9b872
Parents: 35d4b5d
Author: Sylvain Lebresne <sylv...@datastax.com>
Authored: Mon Mar 24 17:06:01 2014 +0100
Committer: Sylvain Lebresne <sylv...@datastax.com>
Committed: Mon Mar 24 17:06:01 2014 +0100

--
 src/java/org/apache/cassandra/db/DeletionInfo.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/91130373/src/java/org/apache/cassandra/db/DeletionInfo.java
--
diff --git a/src/java/org/apache/cassandra/db/DeletionInfo.java b/src/java/org/apache/cassandra/db/DeletionInfo.java
index 91af9fd..ce683d1 100644
--- a/src/java/org/apache/cassandra/db/DeletionInfo.java
+++ b/src/java/org/apache/cassandra/db/DeletionInfo.java
@@ -227,7 +227,7 @@ public class DeletionInfo
     public boolean mayModify(DeletionInfo delInfo)
     {
         return topLevel.markedForDeleteAt > delInfo.topLevel.markedForDeleteAt
-            || ranges == null;
+            || ranges != null;
     }
 
     @Override
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945283#comment-13945283 ] Benedict commented on CASSANDRA-6746: - CASSANDRA-6916 is my proposed solution to this problem, which should provide pretty much optimal behaviour on this front, regardless of OS, and with fewer parameters to tweak. Reads have a slow ramp up in speed -- Key: CASSANDRA-6746 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746 Project: Cassandra Issue Type: Bug Components: Core Reporter: Ryan McGuire Assignee: Benedict Labels: performance Fix For: 2.1 beta2 Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 6746.buffered_io_tweaks.write-flush-compact-mixed.png, 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2 On a physical four node cluister I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds. !2.1_vs_2.0_read.png! [See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.jsonmetric=interval_op_rateoperation=stress-readsmoothing=1] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6916) Preemptive opening of compaction result
[ https://issues.apache.org/jira/browse/CASSANDRA-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6916: Summary: Preemptive opening of compaction result (was: Preemptive re-open of compaction result)
[jira] [Commented] (CASSANDRA-6916) Preemptive re-open of compaction result
[ https://issues.apache.org/jira/browse/CASSANDRA-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945281#comment-13945281 ] Benedict commented on CASSANDRA-6916: - [Here|https://github.com/belliottsmith/cassandra/tree/6916-preempive-open-compact] is a patch that adds this functionality, and also drops support for preheating page cache or dropping the page cache on writes, since they are no longer likely to provide any benefit above the standard behaviour with this patch.
git commit: Update versions and NEWS for 1.2.16 release
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-1.2 91130373f -> 05fcfa2be
Update versions and NEWS for 1.2.16 release
Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/05fcfa2b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/05fcfa2b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/05fcfa2b
Branch: refs/heads/cassandra-1.2
Commit: 05fcfa2be4eba2cd6daeee62d943f48c45f42668
Parents: 9113037
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Mon Mar 24 17:16:15 2014 +0100
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Mon Mar 24 17:16:15 2014 +0100
--
 NEWS.txt | 9 +
 build.xml| 2 +-
 debian/changelog | 6 ++
 3 files changed, 16 insertions(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/05fcfa2b/NEWS.txt
--
diff --git a/NEWS.txt b/NEWS.txt
index 771536d..f297634 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -14,6 +14,15 @@
 restore snapshots created with the previous major version using the
 using the provided 'sstableupgrade' tool.
+1.2.16
+==
+
+Upgrading
+-
+- Nothing specific to this release, but please see 1.2.15 if you are upgrading
+  from a previous version.
+
+
 1.2.15
 ==
http://git-wip-us.apache.org/repos/asf/cassandra/blob/05fcfa2b/build.xml
--
diff --git a/build.xml b/build.xml
index eaf35b5..5db0a6a 100644
--- a/build.xml
+++ b/build.xml
@@ -25,7 +25,7 @@
 <property name="debuglevel" value="source,lines,vars"/>
 <!-- default version and SCM information -->
-<property name="base.version" value="1.2.15"/>
+<property name="base.version" value="1.2.16"/>
 <property name="scm.connection" value="scm:git://git.apache.org/cassandra.git"/>
 <property name="scm.developerConnection" value="scm:git://git.apache.org/cassandra.git"/>
 <property name="scm.url" value="http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/>
http://git-wip-us.apache.org/repos/asf/cassandra/blob/05fcfa2b/debian/changelog
--
diff --git a/debian/changelog b/debian/changelog
index bb8ecf2..50318c8 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+cassandra (1.2.16) unstable; urgency=low
+
+  * New release
+
+ -- Sylvain Lebresne slebre...@apache.org Mon, 24 Mar 2014 17:15:34 +0100
+
 cassandra (1.2.15) unstable; urgency=low
   * New release
[jira] [Commented] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945305#comment-13945305 ] Joshua McKenzie commented on CASSANDRA-6907: Went with protection in the startup .bat file to force addition of -par on repair if it's not there and log to stdout. Lighter touch and isolates changes to windows environment only. ignore snapshot repair flag on Windows -- Key: CASSANDRA-6907 URL: https://issues.apache.org/jira/browse/CASSANDRA-6907 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jonathan Ellis Assignee: Joshua McKenzie Fix For: 2.0.7 Attachments: CASSANDRA-6907_v1.patch Per discussion in CASSANDRA-4050, we should ignore the snapshot repair flag on windows, and log a warning while proceeding to do non-snapshot repair. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-6907: --- Attachment: CASSANDRA-6907_v1.patch
[jira] [Updated] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-6907: --- Attachment: CASSANDRA-6907_v2.patch fixed ticket # listed in comment
[jira] [Commented] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945315#comment-13945315 ] Jonathan Ellis commented on CASSANDRA-6907: --- Wouldn't addressing in code be more robust for invocations via jmx as well as .bat?
Git Push Summary
Repository: cassandra Updated Tags: refs/tags/1.2.16-tentative [created] 05fcfa2be
[jira] [Commented] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945329#comment-13945329 ] Joshua McKenzie commented on CASSANDRA-6907: Would be more robust, yes, but also represent more invasive changes for temporary OS-specific workarounds. If we want to err on the side of robustness that should be a trivial change.
[jira] [Commented] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination
[ https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945345#comment-13945345 ] Alex Liu commented on CASSANDRA-6311: - The key is a Long, which is the row count number. The value is a Row, which is backed by ArrayBackedRow, a protected class. We need to make it a public class. Add CqlRecordReader to take advantage of native CQL pagination -- Key: CASSANDRA-6311 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Fix For: 2.0.7 Attachments: 6311-v10.txt, 6311-v3-2.0-branch.txt, 6311-v4.txt, 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6311-v7.txt, 6311-v8.txt, 6311-v9.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt Now that the latest CQL pagination is done and should be more efficient, we need to update CqlPagingRecordReader to use it instead of the custom Thrift paging. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945357#comment-13945357 ] Jonathan Ellis commented on CASSANDRA-6907: --- Let's go ahead and do that, we can just leave it out of 3.0 when we merge so there's no need to worry about it outliving its usefulness.
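The in-code approach discussed in this thread (ignore the snapshot repair flag on Windows, log a warning, and proceed with non-snapshot repair) might look roughly like the sketch below. It is written in Python for brevity rather than in Cassandra's Java, and `effective_snapshot_repair` and its parameters are hypothetical names, not part of any attached patch.

```python
import logging

logger = logging.getLogger("repair")


def effective_snapshot_repair(requested_snapshot: bool, os_name: str) -> bool:
    """Decide whether a repair should actually use snapshots.

    On Windows the snapshot flag is ignored and a warning is logged, so the
    repair proceeds in non-snapshot mode no matter how it was invoked
    (command line or JMX). `os_name` is whatever the runtime reports,
    e.g. "Windows 7" or "Linux".
    """
    if requested_snapshot and os_name.lower().startswith("windows"):
        logger.warning("Snapshot repair is not supported on Windows; "
                       "proceeding with non-snapshot repair")
        return False
    return requested_snapshot
```

Putting the check at the point where the repair is started, instead of in the startup .bat file, means every entry point passes through the same guard.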
[jira] [Updated] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-6907: --- Attachment: CASSANDRA-6907_v3.patch
[jira] [Updated] (CASSANDRA-6914) Map element is not allowed in CAS condition with DELETE/UPDATE query
[ https://issues.apache.org/jira/browse/CASSANDRA-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Stepura updated CASSANDRA-6914: --- Description:
{code}
CREATE TABLE test (id int, data map<text,text>, PRIMARY KEY(id));
INSERT INTO test (id, data) VALUES (1,{'a':'1'});
DELETE FROM test WHERE id=1 IF data['a']=null;
Bad Request: line 1:40 missing EOF at '='
UPDATE test SET data['b']='2' WHERE id=1 IF data['a']='1';
Bad Request: line 1:53 missing EOF at '='
{code}
These queries were successfully executed with Cassandra 2.0.5, but don't work in the 2.0.6 release.
was: CREATE TABLE test (id int, data map<text,text>, PRIMARY KEY(id)); INSERT INTO test (id, data) VALUES (1,{'a':'1'}); DELETE FROM test WHERE id=1 IF data['a']=null; Bad Request: line 1:40 missing EOF at '=' UPDATE test SET data['b']='2' WHERE id=1 IF data['a']='1'; Bad Request: line 1:53 missing EOF at '=' These queries were successfully executed with Cassandra 2.0.5, but don't work in the 2.0.6 release.
Map element is not allowed in CAS condition with DELETE/UPDATE query Key: CASSANDRA-6914 URL: https://issues.apache.org/jira/browse/CASSANDRA-6914 Project: Cassandra Issue Type: Bug Reporter: Dmitriy Ukhlov Assignee: Sylvain Lebresne Fix For: 2.0.7
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6907) ignore snapshot repair flag on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945419#comment-13945419 ] Joshua McKenzie commented on CASSANDRA-6907: new patch attached.
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945471#comment-13945471 ] Benedict commented on CASSANDRA-6696: - It seems to me it _might_ also be simpler, once this change is made, to just split the range of the memtable and call subMap(lb, ub) and spawn a separate flush writer for each range, which might avoid the need for an SSTableWriterInterface... Might also be a good time to introduce a separate flush executor for each disk. Drive replacement in JBOD can cause data to reappear. -- Key: CASSANDRA-6696 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Assignee: Marcus Eriksson Fix For: 3.0 In JBOD, when someone gets a bad drive, the bad drive is replaced with a new empty one and repair is run. This can cause deleted data to come back in some cases. This is also true for corrupt sstables, where we delete the corrupt sstable and run repair. Here is an example: Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. row=sankalp col=sankalp was written 20 days back and successfully went to all three nodes. Then a delete/tombstone was written successfully for the same row column 15 days back. Since this tombstone is older than gc grace, it was purged on nodes A and B when it was compacted together with the actual data. So there is no trace of this row column on nodes A and B. Now on node C, say the original data is on drive1 and the tombstone is on drive2. Compaction has not yet reclaimed the data and tombstone. Drive2 becomes corrupt and is replaced with a new empty drive. Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp has come back to life. Now after replacing the drive we run repair. This data will be propagated to all nodes. Note: This is still a problem even if we run repair every gc grace. -- This message was sent by Atlassian JIRA (v6.2#6252)
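The idea of splitting the memtable's range and flushing each sub-range separately can be illustrated with a small sketch. It is written in Python, so the Java subMap(lb, ub) call is emulated with bisect over sorted keys (lower bound inclusive, upper bound exclusive); `split_by_ranges` and the integer keys standing in for tokens are assumptions made only for this example.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def split_by_ranges(memtable: Dict[int, str],
                    boundaries: Sequence[int]) -> List[List[Tuple[int, str]]]:
    """Partition memtable entries into len(boundaries) + 1 contiguous ranges.

    `boundaries` must be sorted ascending. Range i covers keys in
    [boundaries[i-1], boundaries[i]), with the first and last ranges
    open-ended. Each returned list would be handed to its own flush writer,
    for example one per disk.
    """
    items = sorted(memtable.items())
    keys = [key for key, _ in items]
    cuts = [0] + [bisect_left(keys, b) for b in boundaries] + [len(items)]
    return [items[lo:hi] for lo, hi in zip(cuts, cuts[1:])]
```

Because the ranges are disjoint and ordered, each writer sees keys in sorted order and no key can end up in two sstables.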
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945452#comment-13945452 ] Benedict commented on CASSANDRA-6696: - Had a quick glance, and have one initial thought: Might be worth forcing compaction to always work on one disk (i.e. always selects files from one disk for compaction). Would simplify it slightly, and it seems likely to be the most optimal use of IO, but also as it stands you could have a scenario where one file is selected each from a different disk, which would result in a perpetual compaction loop.
[jira] [Commented] (CASSANDRA-6541) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set.
[ https://issues.apache.org/jira/browse/CASSANDRA-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945515#comment-13945515 ] Robert Coli commented on CASSANDRA-6541: {quote}It's a HotSpot regression that we're working around, not a Cassandra bug.{quote} Yes, so the statement "this HotSpot regression is not-worked-around in all extant versions of Cassandra since the dawn of time" is correct. Thanks for the clarification! New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set. - Key: CASSANDRA-6541 URL: https://issues.apache.org/jira/browse/CASSANDRA-6541 Project: Cassandra Issue Type: Bug Components: Config Reporter: jonathan lacefield Assignee: Brandon Williams Priority: Minor Fix For: 1.2.16, 2.0.6, 2.1 beta2 Newer versions of Oracle's Hotspot JVM, post 6u43 (maybe earlier) and 7u25 (maybe earlier), are experiencing issues with GC and JMX where the heap slowly fills up over time until OOM or a full GC event occurs, specifically when CMS is leveraged. Adding: {noformat} JVM_OPTS="$JVM_OPTS -XX:+CMSClassUnloadingEnabled" {noformat} to the options in cassandra-env.sh alleviates the problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6541) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set.
[ https://issues.apache.org/jira/browse/CASSANDRA-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Coli updated CASSANDRA-6541: --- Since Version: 0.3
[jira] [Commented] (CASSANDRA-6541) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set.
[ https://issues.apache.org/jira/browse/CASSANDRA-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945516#comment-13945516 ] Brandon Williams commented on CASSANDRA-6541: - Well, not really, since the problematic JVM versions didn't exist back then.
[jira] [Commented] (CASSANDRA-6541) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set.
[ https://issues.apache.org/jira/browse/CASSANDRA-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945527#comment-13945527 ] Benedict commented on CASSANDRA-6541: - bq. Yes, so the statement this HotSpot regression is not-worked-around in all extant versions of Cassandra since the dawn of time is correct. Thanks for the clarification! No, it isn't, since the hot spot bug did not exist since the dawn of time. As such the from-version is ill-defined, but probably the most sensible definition requires determining the from-version-for-hotspot, which we have yet to manage, and aligning that with C* releases.
[jira] [Updated] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN
[ https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-6875: Fix Version/s: (was: 2.1) 2.1 beta2
CQL3: select multiple CQL rows in a single partition using IN - Key: CASSANDRA-6875 URL: https://issues.apache.org/jira/browse/CASSANDRA-6875 Project: Cassandra Issue Type: Bug Components: API Reporter: Nicolas Favre-Felix Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 beta2
In the spirit of CASSANDRA-4851 and to bring CQL to parity with Thrift, it is important to support reading several distinct CQL rows from a given partition using a distinct set of coordinates for these rows within the partition. CASSANDRA-4851 introduced a range scan over the multi-dimensional space of clustering keys. We also need to support a multi-get of CQL rows, potentially using the IN keyword to define a set of clustering keys to fetch at once. (reusing the same example:) Consider the following table:
{code}
CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) );
{code}
with the following data:
{code}
 k | c1 | c2
---+----+----
 0 |  0 |  0
 0 |  0 |  1
 0 |  1 |  0
 0 |  1 |  1
{code}
We can fetch a single row or a range of rows, but not a set of them:
{code}
SELECT * FROM test WHERE k = 0 AND (c1, c2) IN ((0, 0), (1,1)) ;
Bad Request: line 1:54 missing EOF at ','
{code}
Supporting this syntax would return:
{code}
 k | c1 | c2
---+----+----
 0 |  0 |  0
 0 |  1 |  1
{code}
Being able to fetch these two CQL rows in a single read is important to maintain partition-level isolation.
-- This message was sent by Atlassian JIRA (v6.2#6252)
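The intended semantics of the proposed (c1, c2) IN (...) multi-get can be shown with a few lines of Python over the example data from the ticket. This only models which rows are selected; `multi_get` is a hypothetical helper and says nothing about how Cassandra would execute the read.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[int, int, int]  # (k, c1, c2), matching the example table


def multi_get(rows: Iterable[Row], k: int,
              clustering_in: Set[Tuple[int, int]]) -> List[Row]:
    """Return the rows of partition `k` whose (c1, c2) pair is in `clustering_in`.

    Row order is preserved, so if `rows` is in clustering order the result is too.
    """
    return [row for row in rows
            if row[0] == k and (row[1], row[2]) in clustering_in]


# The four rows of the example table, in clustering order.
example_rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
selected = multi_get(example_rows, 0, {(0, 0), (1, 1)})
```

With the example data, `selected` holds the two rows the ticket expects, (0, 0, 0) and (0, 1, 1), picked in one pass over a single partition.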
[jira] [Created] (CASSANDRA-6918) Compaction Assert: Incorrect Row Data Size
Alexander Goodrich created CASSANDRA-6918: - Summary: Compaction Assert: Incorrect Row Data Size Key: CASSANDRA-6918 URL: https://issues.apache.org/jira/browse/CASSANDRA-6918 Project: Cassandra Issue Type: Bug Components: Core
Environment: 11 node Linux Cassandra 1.2.15 cluster, each node configured as follows:
2P IntelXeon CPU X5660 @ 2.8 GHz (12 cores, 24 threads total)
148 GB RAM
CentOS release 6.4 (Final)
2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
Node configuration: Default cassandra.yaml settings for the most part with the following exceptions: rpc_server_type: hsha
Reporter: Alexander Goodrich Fix For: 1.2.16
I have four tables in a schema with Replication Factor: 6 (previously we set this to 3, but when we added more nodes we figured adding more replication to improve read time would help, this might have aggravated the issue).
create table table_value_one ( id timeuuid PRIMARY KEY, value_1 counter );
create table table_value_two ( id timeuuid PRIMARY KEY, value_2 counter );
create table table_position_lookup ( value_1 bigint, value_2 bigint, id timeuuid, PRIMARY KEY (id) ) WITH compaction={'class': 'LeveledCompactionStrategy'};
create table sorted_table ( row_key_index text, range bigint, sorted_value bigint, id timeuuid, extra_data list<bigint>, PRIMARY KEY ((row_key_index, range), sorted_value, id) ) WITH CLUSTERING ORDER BY (sorted_value DESC) AND compaction={'class': 'LeveledCompactionStrategy'};
The application creates an object, and stores it in sorted_table based on a value position - for example, an object has a value_1 of 5500, and a value_2 of 4300. There are rows which represent indices by which I can sort items based on these values in descending order. If I wish to see items with the highest # of value_1, I can create an index that stores them like so: row_key_index = 'highest_value_1s' Additionally, we shard each row by bucket ranges - which is simply the value_1 or value_2 / 1000. For example, our object above would be found in row_key_index = 'highest_value_1s' and range 5000, and also in row_key_index = 'highest_value_2s' with range 4300. The true values of this object are stored in two counter tables, table_value_one and table_value_two. The current indexed position is stored in table_position_lookup. We allow the application to modify value_one and value_two in the counter table indiscriminately. If we know the current values for these are dirty, we wait a tuned amount of time before we update the position in the sorted_table index. This creates 2 delete operations, and 2 write operations on the same table. The issue is when we expand the number of write/delete operations on sorted_table, we see the following assert in the system log:
ERROR [CompactionExecutor:169] 2014-03-24 08:07:12,871 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:169,1,main]
java.lang.AssertionError: incorrect row data size 77705872 written to /var/lib/cassandra/data/loadtest_1/sorted_table/loadtest_1-sorted_table-tmp-ic-165-Data.db; correct is 77800512
 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
 at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
 at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
 at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
 at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
Each object creates approximately 500 unique row keys in sorted_table, and it possesses an extra_data field containing approximately 15 different bigint values. Previously, our application was running Cassandra 1.2.10 and we did not see the assert when our sorted_table did not have the extra_data list<bigint>. Also, we were writing around 200 unique row keys, only containing the ID column. We tried both leveled compaction
[jira] [Commented] (CASSANDRA-6541) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set.
[ https://issues.apache.org/jira/browse/CASSANDRA-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945567#comment-13945567 ] Jeremiah Jordan commented on CASSANDRA-6541: C* from version really has no meaning here. But yes, if you run C* 0.3 (if that did JMX monitoring) with 1.6_45 you will hit this issue. Unless of course you add the given setting to your cassandra-env.sh. Which can be done without upgrading your C*.
[jira] [Comment Edited] (CASSANDRA-6917) enum data type
[ https://issues.apache.org/jira/browse/CASSANDRA-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945592#comment-13945592 ] Jonathan Ellis edited comment on CASSANDRA-6917 at 3/24/14 7:35 PM: see also CASSANDRA-4175 was (Author: jbellis): see also CASSNADRA-4175 enum data type -- Key: CASSANDRA-6917 URL: https://issues.apache.org/jira/browse/CASSANDRA-6917 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor It seems like it would be useful to support an enum data type, that automatically converts string data from the user into a fixed-width data type with guaranteed uniqueness across the cluster. This data would be replicated to all nodes for lookup, but ideally would use only the keyspace RF to determine nodes for coordinating quorum writes/consistency. This would not only permit improved local disk and inter-node network IO for symbology information (e.g. stock tickers, ISINs, etc), but also potentially for column identifiers also, which are currently stored as their full string representation. It should be possible then with later updates to propagate the enum map (lazily) to clients through the native protocol, reducing network IO further. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6917) enum data type
[ https://issues.apache.org/jira/browse/CASSANDRA-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945592#comment-13945592 ] Jonathan Ellis commented on CASSANDRA-6917: --- see also CASSNADRA-4175 enum data type -- Key: CASSANDRA-6917 URL: https://issues.apache.org/jira/browse/CASSANDRA-6917 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor It seems like it would be useful to support an enum data type, that automatically converts string data from the user into a fixed-width data type with guaranteed uniqueness across the cluster. This data would be replicated to all nodes for lookup, but ideally would use only the keyspace RF to determine nodes for coordinating quorum writes/consistency. This would not only permit improved local disk and inter-node network IO for symbology information (e.g. stock tickers, ISINs, etc), but also potentially for column identifiers also, which are currently stored as their full string representation. It should be possible then with later updates to propagate the enum map (lazily) to clients through the native protocol, reducing network IO further. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6917) enum data type
Benedict created CASSANDRA-6917: --- Summary: enum data type Key: CASSANDRA-6917 URL: https://issues.apache.org/jira/browse/CASSANDRA-6917 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor It seems like it would be useful to support an enum data type, that automatically converts string data from the user into a fixed-width data type with guaranteed uniqueness across the cluster. This data would be replicated to all nodes for lookup, but ideally would use only the keyspace RF to determine nodes for coordinating quorum writes/consistency. This would not only permit improved local disk and inter-node network IO for symbology information (e.g. stock tickers, ISINs, etc), but also potentially for column identifiers also, which are currently stored as their full string representation. It should be possible then with later updates to propagate the enum map (lazily) to clients through the native protocol, reducing network IO further. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6918) Compaction Assert: Incorrect Row Data Size
[ https://issues.apache.org/jira/browse/CASSANDRA-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945597#comment-13945597 ] Jonathan Ellis commented on CASSANDRA-6918: --- [~iamaleksey] is this something that counters++ will fix or do you think it is more general than counters? Compaction Assert: Incorrect Row Data Size -- Key: CASSANDRA-6918 URL: https://issues.apache.org/jira/browse/CASSANDRA-6918 Project: Cassandra Issue Type: Bug Components: Core Environment: 11 node Linux Cassandra 1.2.15 cluster, each node configured as follows: 2P IntelXeon CPU X5660 @ 2.8 GHz (12 cores, 24 threads total) 148 GB RAM CentOS release 6.4 (Final) 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Node configuration: Default cassandra.yaml settings for the most part with the following exceptions: rpc_server_type: hsha Reporter: Alexander Goodrich Fix For: 1.2.16 I have four tables in a schema with Replication Factor: 6 (previously we set this to 3, but when we added more nodes we figured adding more replication to improve read time would help, this might have aggravated the issue). 
create table table_value_one ( id timeuuid PRIMARY KEY, value_1 counter ); create table table_value_two ( id timeuuid PRIMARY KEY, value_2 counter ); create table table_position_lookup ( value_1 bigint, value_2 bigint, id timeuuid, PRIMARY KEY (id) ) WITH compaction={'class': 'LeveledCompactionStrategy'}; create table sorted_table ( row_key_index text, range bigint, sorted_value bigint, id timeuuid, extra_data list<bigint>, PRIMARY KEY ((row_key_index, range), sorted_value, id) ) WITH CLUSTERING ORDER BY (sorted_value DESC) AND compaction={'class': 'LeveledCompactionStrategy'}; The application creates an object, and stores it in sorted_table based on a value position - for example, an object has a value_1 of 5500, and a value_2 of 4300. There are rows which represent indices by which I can sort items based on these values in descending order. If I wish to see items with the highest # of value_1, I can create an index that stores them like so: row_key_index = 'highest_value_1s' Additionally, we shard each row by bucket ranges - which is simply the value_1 or value_2 / 1000. For example, our object above would be found in row_key_index = 'highest_value_1s' and range 5000, and also in row_key_index = 'highest_value_2s' with range 4300. The true values of this object are stored in two counter tables, table_value_one and table_value_two. The current indexed position is stored in table_position_lookup. We allow the application to modify value_one and value_two in the counter table indiscriminately. If we know the current values for these are dirty, we wait a tuned amount of time before we update the position in the sorted_table index. This creates 2 delete operations, and 2 write operations on the same table. 
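The bucket sharding described above can be sketched as follows. This is only one reading of "value / 1000", assuming the range is the value rounded down to a multiple of 1000, which matches the 5500 -> 5000 example. Under that reading a value_2 of 4300 would land in range 4000 rather than the 4300 quoted in the report; the report does not say which is intended. The class and method names (RangeBucket, bucketFor) are hypothetical and not taken from the reporter's application.

```java
// Hypothetical helper illustrating the bucket-range sharding from the report.
// Assumption: a bucket is the value rounded down to a multiple of 1000.
final class RangeBucket {
    static final long BUCKET_WIDTH = 1000L;

    // Round the value down to the start of the bucket that contains it.
    public static long bucketFor(long value) {
        return Math.floorDiv(value, BUCKET_WIDTH) * BUCKET_WIDTH;
    }

    public static void main(String[] args) {
        // value_1 = 5500 from the report's example
        System.out.println("value_1 5500 -> range " + bucketFor(5500));
        // value_2 = 4300 from the report's example
        System.out.println("value_2 4300 -> range " + bucketFor(4300));
    }
}
```

With this scheme every object whose value falls in the same 1000-wide window shares a partition (row_key_index, range), so a hot window concentrates writes and deletes on a single wide row.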
The issue is when we expand the number of write/delete operations on sorted_table, we see the following assert in the system log: ERROR [CompactionExecutor:169] 2014-03-24 08:07:12,871 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:169,1,main] java.lang.AssertionError: incorrect row data size 77705872 written to /var/lib/cassandra/data/loadtest_1/sorted_table/loadtest_1-sorted_table-tmp-ic-165-Data.db; correct is 77800512 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Each object creates approximately ~500 unique
[jira] [Created] (CASSANDRA-6919) Use OpOrder to guard sstable references for reads, instead of acquiring/releasing references
Benedict created CASSANDRA-6919: --- Summary: Use OpOrder to guard sstable references for reads, instead of acquiring/releasing references Key: CASSANDRA-6919 URL: https://issues.apache.org/jira/browse/CASSANDRA-6919 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.1 beta2 To slightly improve CASSANDRA-6916, and because it's a bit of a simplification anyway, we should move to ensuring sstable resources remain around during reads by guarding them with an OpOrder (which is also being introduced for CASSANDRA-6694) instead of using markReferenced()/release. Note this does not eliminate markReferenced, as for long running processes such as compaction it makes sense to have an independent mechanism, because these long running processes would prevent all resource cleanup for their duration rather than just the resources they're using. All this does is cleanup and slightly optimise the read path, whilst giving better guarantees about resource cleanup (e.g. page cache dropping of old sstables which may have been replaced multiple times since the reader was created, so we are dropping pages we don't realise are still in use - in real terms it should be very rare for such a reader to outlive multiple replacements and this is only a performance issue, not a matter of correctness, but it's nice to absolutely be certain anyway). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6541) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set.
[ https://issues.apache.org/jira/browse/CASSANDRA-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6541: -- Since Version: (was: 0.3) New versions of Hotspot create new Class objects on every JMX connection causing the heap to fill up with them if CMSClassUnloadingEnabled isn't set. - Key: CASSANDRA-6541 URL: https://issues.apache.org/jira/browse/CASSANDRA-6541 Project: Cassandra Issue Type: Bug Components: Config Reporter: jonathan lacefield Assignee: Brandon Williams Priority: Minor Fix For: 1.2.16, 2.0.6, 2.1 beta2 Newer versions of Oracle's Hotspot JVM, post 6u43 (maybe earlier) and 7u25 (maybe earlier), are experiencing issues with GC and JMX where heap slowly fills up over time until OOM or a full GC event occurs, specifically when CMS is leveraged. Adding: {noformat} JVM_OPTS="$JVM_OPTS -XX:+CMSClassUnloadingEnabled" {noformat} to the options in cassandra-env.sh alleviates the problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6918) Compaction Assert: Incorrect Row Data Size
[ https://issues.apache.org/jira/browse/CASSANDRA-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945606#comment-13945606 ] Aleksey Yeschenko commented on CASSANDRA-6918: -- [~jbellis] I could be reading it wrong, but it seems like their issue is with the `sorted_table` table, and that one is counter-less. Compaction Assert: Incorrect Row Data Size -- Key: CASSANDRA-6918 URL: https://issues.apache.org/jira/browse/CASSANDRA-6918 Project: Cassandra Issue Type: Bug Components: Core Environment: 11 node Linux Cassandra 1.2.15 cluster, each node configured as follows: 2P IntelXeon CPU X5660 @ 2.8 GHz (12 cores, 24 threads total) 148 GB RAM CentOS release 6.4 (Final) 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Node configuration: Default cassandra.yaml settings for the most part with the following exceptions: rpc_server_type: hsha Reporter: Alexander Goodrich Fix For: 1.2.16 I have four tables in a schema with Replication Factor: 6 (previously we set this to 3, but when we added more nodes we figured adding more replication to improve read time would help, this might have aggravated the issue). 
create table table_value_one ( id timeuuid PRIMARY KEY, value_1 counter ); create table table_value_two ( id timeuuid PRIMARY KEY, value_2 counter ); create table table_position_lookup ( value_1 bigint, value_2 bigint, id timeuuid, PRIMARY KEY (id) ) WITH compaction={'class': 'LeveledCompactionStrategy'}; create table sorted_table ( row_key_index text, range bigint, sorted_value bigint, id timeuuid, extra_data list<bigint>, PRIMARY KEY ((row_key_index, range), sorted_value, id) ) WITH CLUSTERING ORDER BY (sorted_value DESC) AND compaction={'class': 'LeveledCompactionStrategy'}; The application creates an object, and stores it in sorted_table based on a value position - for example, an object has a value_1 of 5500, and a value_2 of 4300. There are rows which represent indices by which I can sort items based on these values in descending order. If I wish to see items with the highest # of value_1, I can create an index that stores them like so: row_key_index = 'highest_value_1s' Additionally, we shard each row by bucket ranges - which is simply the value_1 or value_2 / 1000. For example, our object above would be found in row_key_index = 'highest_value_1s' and range 5000, and also in row_key_index = 'highest_value_2s' with range 4300. The true values of this object are stored in two counter tables, table_value_one and table_value_two. The current indexed position is stored in table_position_lookup. We allow the application to modify value_one and value_two in the counter table indiscriminately. If we know the current values for these are dirty, we wait a tuned amount of time before we update the position in the sorted_table index. This creates 2 delete operations, and 2 write operations on the same table. 
The issue is when we expand the number of write/delete operations on sorted_table, we see the following assert in the system log: ERROR [CompactionExecutor:169] 2014-03-24 08:07:12,871 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:169,1,main] java.lang.AssertionError: incorrect row data size 77705872 written to /var/lib/cassandra/data/loadtest_1/sorted_table/loadtest_1-sorted_table-tmp-ic-165-Data.db; correct is 77800512 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Each object
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945607#comment-13945607 ] Pavel Yaskevich commented on CASSANDRA-6746: bq. In practice, moving the WILLNEED into the getSegment() call is dangerous as the segment is used past the initial 64Kb, and if we rely on ourselves only for read-ahead this could result in very substandard performance for larger rows. We also probably want to only WILLNEED the actual size of the buffer we expect to read for compressed files. Yes, this is only a PoC to see if the scheme works for platters. Just a couple of things: for optimal performance we need information from the index about the size of the row, so we can mark as SEQUENTIAL a) the whole row if the row is smaller than the indexing threshold, b) portions of the row on the index boundaries. The original 1-page WILLNEED (very conservative) is used to make sure that the read can quickly grab the first portion of the buffer while extended read-ahead prefetches everything else. This still works for big rows because we are forced to read the header of the row first (the key at least), and then we seek() to the position indicated by the column index and want to hint that we are going to read a portion of the row, so large rows suffer more from the fact that we have to over-buffer than from WILLNEED. I wish we could have a useful mmap'ed buffer implementation, so madvise, such as we do with fadvise, would no longer be required... There is a way to solve the cold cache problem for the parts of the data from the original SSTables that have been read before; I did some work with mincore() previously and can revisit it if needed. The problem we are trying to solve by dropping the cache for memtable and compacted SSTables (in memory-restricted and/or slow-I/O systems) is that keeping the page cache for the old files creates more jitter and slows down warmup of the newly created SSTable. 
Reads have a slow ramp up in speed -- Key: CASSANDRA-6746 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746 Project: Cassandra Issue Type: Bug Components: Core Reporter: Ryan McGuire Assignee: Benedict Labels: performance Fix For: 2.1 beta2 Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 6746.buffered_io_tweaks.write-flush-compact-mixed.png, 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2 On a physical four node cluster I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds. !2.1_vs_2.0_read.png! [See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6918) Compaction Assert: Incorrect Row Data Size
[ https://issues.apache.org/jira/browse/CASSANDRA-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945623#comment-13945623 ] Jonathan Ellis commented on CASSANDRA-6918: --- [~agoodrich] [~redpriest] does it log Compacting large row before the exception? Compaction Assert: Incorrect Row Data Size -- Key: CASSANDRA-6918 URL: https://issues.apache.org/jira/browse/CASSANDRA-6918 Project: Cassandra Issue Type: Bug Components: Core Environment: 11 node Linux Cassandra 1.2.15 cluster, each node configured as follows: 2P IntelXeon CPU X5660 @ 2.8 GHz (12 cores, 24 threads total) 148 GB RAM CentOS release 6.4 (Final) 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Node configuration: Default cassandra.yaml settings for the most part with the following exceptions: rpc_server_type: hsha Reporter: Alexander Goodrich Fix For: 1.2.16 I have four tables in a schema with Replication Factor: 6 (previously we set this to 3, but when we added more nodes we figured adding more replication to improve read time would help, this might have aggravated the issue). 
create table table_value_one ( id timeuuid PRIMARY KEY, value_1 counter ); create table table_value_two ( id timeuuid PRIMARY KEY, value_2 counter ); create table table_position_lookup ( value_1 bigint, value_2 bigint, id timeuuid, PRIMARY KEY (id) ) WITH compaction={'class': 'LeveledCompactionStrategy'}; create table sorted_table ( row_key_index text, range bigint, sorted_value bigint, id timeuuid, extra_data list<bigint>, PRIMARY KEY ((row_key_index, range), sorted_value, id) ) WITH CLUSTERING ORDER BY (sorted_value DESC) AND compaction={'class': 'LeveledCompactionStrategy'}; The application creates an object, and stores it in sorted_table based on a value position - for example, an object has a value_1 of 5500, and a value_2 of 4300. There are rows which represent indices by which I can sort items based on these values in descending order. If I wish to see items with the highest # of value_1, I can create an index that stores them like so: row_key_index = 'highest_value_1s' Additionally, we shard each row by bucket ranges - which is simply the value_1 or value_2 / 1000. For example, our object above would be found in row_key_index = 'highest_value_1s' and range 5000, and also in row_key_index = 'highest_value_2s' with range 4300. The true values of this object are stored in two counter tables, table_value_one and table_value_two. The current indexed position is stored in table_position_lookup. We allow the application to modify value_one and value_two in the counter table indiscriminately. If we know the current values for these are dirty, we wait a tuned amount of time before we update the position in the sorted_table index. This creates 2 delete operations, and 2 write operations on the same table. 
The issue is when we expand the number of write/delete operations on sorted_table, we see the following assert in the system log: ERROR [CompactionExecutor:169] 2014-03-24 08:07:12,871 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:169,1,main] java.lang.AssertionError: incorrect row data size 77705872 written to /var/lib/cassandra/data/loadtest_1/sorted_table/loadtest_1-sorted_table-tmp-ic-165-Data.db; correct is 77800512 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Each object creates approximately ~500 unique row keys in sorted_table,
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945631#comment-13945631 ] Benedict commented on CASSANDRA-6696: - Last thoughts for the day: the only major downside to this approach is that we are now guaranteeing no better than single disk performance for all operations on a given partition. So if there are particularly large and fragmented partitions, they could see read performance decline notably. One possible solution to this would be to split by clustering part (if any), instead of partition key, but determine the clustering part range split as a function of the partition hash, so that the distribution of data as a whole is still random (i.e. each partition has a different clustering distribution across the disks). This would make the initial flush more complex, and might require more merging on reads, but compaction could still be easily constrained to one disk. This is just a poorly formed thought I'm throwing out there for consideration, and possibly outside of scope for this ticket. Either way, I'm not certain that splitting ranges based on disk size is such a great idea. As a follow-on ticket it might be sensible to permit two categories of disks: archive disks for slow and cold data, and live disks for faster data. Splitting by capacity seems likely to create undesirable performance characteristics, as two similarly performant disks with different capacities would lead to worse performance for the data residing on the larger disks. On the whole I'm +1 on this change anyway, the more I think about it. I had been vaguely considering something along these lines to optimise flush performance, but it seems we can get this for free along with improving correctness, which is great. Drive replacement in JBOD can cause data to reappear. 
-- Key: CASSANDRA-6696 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Assignee: Marcus Eriksson Fix For: 3.0 In JBOD, when someone gets a bad drive, the bad drive is replaced with a new empty one and repair is run. This can cause deleted data to come back in some cases. This is also true for corrupt sstables, where we delete the corrupt sstable and run repair. Here is an example: Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. row=sankalp col=sankalp was written 20 days back and successfully went to all three nodes. Then a delete/tombstone was written successfully for the same row column 15 days back. Since this tombstone is older than gc grace, it was purged on nodes A and B when it got compacted together with the actual data. So there is no trace of this row column on nodes A and B. Now on node C, say the original data is on drive1 and the tombstone is on drive2. Compaction has not yet reclaimed the data and tombstone. Drive2 becomes corrupt and is replaced with a new empty drive. Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp has come back to life. Now after replacing the drive we run repair. This data will be propagated to all nodes. Note: This is still a problem even if we run repair every gc grace. -- This message was sent by Atlassian JIRA (v6.2#6252)
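The three-node example in the description can be walked through with a toy model. The sketch below is not Cassandra code: it assumes a single cell per node, last-write-wins reconciliation by timestamp, and a repair that simply gives every replica the reconciled result. All names (ZombieDataSketch, Cell, reconcile, visible, run) are hypothetical, and timestamps are counted in days with day 0 being "20 days back".

```java
// Toy model of the data-resurrection scenario from the ticket description.
// Assumptions: one cell per node, last-write-wins by timestamp, simplistic repair.
final class ZombieDataSketch {
    static final class Cell {
        final long timestamp;     // day the cell was written
        final boolean tombstone;  // true if this cell marks a delete

        Cell(long timestamp, boolean tombstone) {
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    // Last write wins; a missing cell (null) loses to anything.
    public static Cell reconcile(Cell a, Cell b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.timestamp >= b.timestamp ? a : b;
    }

    // A reader sees the value only if a live (non-tombstone) cell survives.
    public static boolean visible(Cell c) {
        return c != null && !c.tombstone;
    }

    // Returns {visibleBeforeFailure, visibleAfterDriveReplace, visibleOnAAndBAfterRepair}.
    public static boolean[] run() {
        Cell data = new Cell(0, false);      // written 20 days back
        Cell tombstone = new Cell(5, true);  // deleted 15 days back

        // Nodes A and B: tombstone is past gc grace and was compacted away with the data.
        Cell nodeA = null;
        Cell nodeB = null;

        // Node C: data on drive1, tombstone on drive2, not yet compacted together.
        Cell drive1 = data;
        Cell drive2 = tombstone;
        boolean before = visible(reconcile(drive1, drive2));

        // Drive2 fails and is replaced with an empty drive: the tombstone is lost.
        drive2 = null;
        Cell nodeC = reconcile(drive1, drive2);
        boolean afterReplace = visible(nodeC);

        // Repair: A and B receive whatever the replicas reconcile to.
        Cell repaired = reconcile(reconcile(nodeA, nodeB), nodeC);
        nodeA = repaired;
        nodeB = repaired;
        boolean afterRepair = visible(nodeA) && visible(nodeB);

        return new boolean[] { before, afterReplace, afterRepair };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("visible on C before drive failure: " + r[0]);
        System.out.println("visible on C after drive replace:  " + r[1]);
        System.out.println("visible on A and B after repair:   " + r[2]);
    }
}
```

The model shows why repair alone cannot help: once the only copy of the tombstone is gone, the old data is the newest cell anywhere in the cluster, so reconciliation treats it as the correct value.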
[jira] [Commented] (CASSANDRA-6918) Compaction Assert: Incorrect Row Data Size
[ https://issues.apache.org/jira/browse/CASSANDRA-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945640#comment-13945640 ] Alexander Goodrich commented on CASSANDRA-6918: --- Yes, this is a counter-less table that the exceptions occur on - [~jbellis] It depends on the node - here's an exception on node #2 in my cluster - I've seen it happen without (seemingly) a corresponding compaction large row. Here's an example where there is one directly above it: INFO [CompactionExecutor:144] 2014-03-24 07:50:33,240 CompactionController.java (line 156) Compacting large row loadtest_1/sorted_table:category1_globallist_item_4:0 (67157460 bytes) incrementally ERROR [CompactionExecutor:144] 2014-03-24 07:50:42,471 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:144,1,main] java.lang.AssertionError: incorrect row data size 67156948 written to /var/lib/cassandra/data/loadtest_1/sorted_table/loadtest_1-sorted_table-tmp-ic-77-Data.db; correct is 67239030 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:724) Compaction Assert: Incorrect Row Data Size -- Key: CASSANDRA-6918 URL: https://issues.apache.org/jira/browse/CASSANDRA-6918 Project: Cassandra Issue Type: Bug Components: Core Environment: 11 node Linux Cassandra 1.2.15 cluster, each node configured as follows: 2P IntelXeon CPU X5660 @ 2.8 GHz (12 cores, 24 threads total) 148 GB RAM CentOS release 6.4 (Final) 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Node configuration: Default cassandra.yaml settings for the most part with the following exceptions: rpc_server_type: hsha Reporter: Alexander Goodrich Fix For: 1.2.16 I have four tables in a schema with Replication Factor: 6 (previously we set this to 3, but when we added more nodes we figured adding more replication to improve read time would help, this might have aggravated the issue). create table table_value_one ( id timeuuid PRIMARY KEY, value_1 counter ); create table table_value_two ( id timeuuid PRIMARY KEY, value_2 counter ); create table table_position_lookup ( value_1 bigint, value_2 bigint, id timeuuid, PRIMARY KEY (id) ) WITH compaction={'class': 'LeveledCompactionStrategy'}; create table sorted_table ( row_key_index text, range bigint, sorted_value bigint, id timeuuid, extra_data list<bigint>, PRIMARY KEY ((row_key_index, range), sorted_value, id) ) WITH CLUSTERING ORDER BY (sorted_value DESC) AND compaction={'class': 'LeveledCompactionStrategy'}; The application creates an object, and stores it in sorted_table based on a value position - for example, an object has a value_1 of 5500, and a value_2 of 4300. There are rows which represent indices by which I can sort items based on these values in descending order. 
If I wish to see items with the highest # of value_1, I can create an index that stores them like so: row_key_index = 'highest_value_1s' Additionally, we shard each row by bucket ranges - which is simply the value_1 or value_2 / 1000. For example, our object above would be found in row_key_index = 'highest_value_1s' and range 5000, and also in row_key_index = 'highest_value_2s' with range 4300. The true values of this object are stored in two counter tables, table_value_one and table_value_two. The current indexed position is stored in table_position_lookup. We allow the application to modify value_one and value_two in the counter table indiscriminately. If we know the current values for these
[jira] [Created] (CASSANDRA-6920) LatencyMetrics can return infinity
Nick Bailey created CASSANDRA-6920: -- Summary: LatencyMetrics can return infinity Key: CASSANDRA-6920 URL: https://issues.apache.org/jira/browse/CASSANDRA-6920 Project: Cassandra Issue Type: Bug Reporter: Nick Bailey There is a race condition when updating the recentLatency metrics exposed from LatencyMetrics. Attaching a patch with a test that exposes the issue and a potential fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
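The message above does not describe the race or the fix in the attached patch. As a purely hypothetical illustration of how a "recent latency" ratio can come out infinite, assume the metric is computed as (change in total latency) / (change in operation count) from two counters that are updated separately. If a reader samples them after the latency total has moved but before the operation count has, the divisor is zero. The class and method names below (RecentLatencySketch, unguarded, guarded) are invented for this sketch and are not the LatencyMetrics API.

```java
// Hypothetical sketch: a ratio of two separately updated counters.
// Not the actual LatencyMetrics code and not the fix from the attached patch.
final class RecentLatencySketch {
    // Floating-point division by zero does not throw in Java:
    // a positive numerator yields Infinity, a zero numerator yields NaN.
    public static double unguarded(long totalLatencyDelta, long opCountDelta) {
        return (double) totalLatencyDelta / opCountDelta;
    }

    // One possible guard: report 0 when no operations were observed in the window.
    public static double guarded(long totalLatencyDelta, long opCountDelta) {
        return opCountDelta == 0 ? 0.0 : (double) totalLatencyDelta / opCountDelta;
    }

    public static void main(String[] args) {
        // Reader saw the latency total advance by 1500 but the op count not yet advance.
        System.out.println("unguarded: " + unguarded(1500, 0));
        System.out.println("guarded:   " + guarded(1500, 0));
        // Normal case: 1500 units of latency over 3 operations.
        System.out.println("normal:    " + guarded(1500, 3));
    }
}
```

A guard like this only hides the symptom; whether the real fix reorders the counter updates or samples them atomically is something only the attached patch can answer.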
[jira] [Commented] (CASSANDRA-6357) Flush memtables to separate directory
[ https://issues.apache.org/jira/browse/CASSANDRA-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945705#comment-13945705 ] dan jatnieks commented on CASSANDRA-6357: - I can also go back and re-run 2.0 w/patch on the same machine used for 2.1 ... but that patch isn't quite the same as the 2.1 code either... Flush memtables to separate directory - Key: CASSANDRA-6357 URL: https://issues.apache.org/jira/browse/CASSANDRA-6357 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Patrick McFadin Assignee: Jonathan Ellis Priority: Minor Labels: performance Fix For: 2.1 beta1 Attachments: 6357-v2.txt, 6357.txt, c6357-2.1-stress-write-adj-ops-sec.png, c6357-2.1-stress-write-latency-99th.png, c6357-2.1-stress-write-latency-median.png, c6357-stress-write-latency-99th-1.png Flush writers are a critical element for keeping a node healthy. When several compactions run on systems with low performing data directories, IO is at a premium. Once the disk subsystem is saturated, write IO is blocked, which will cause flush writer threads to back up. Since memtables are large blocks of memory in the JVM, too much blocking can cause excessive GC over time, degrading performance and in the worst case causing an OOM. Since compaction is running on the data directories, my proposal is to create a separate directory for flushing memtables. Potentially we can use the same methodology of keeping the commit log separate and minimize disk contention against the critical function of the flushwriter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6920) LatencyMetrics can return infinity
[ https://issues.apache.org/jira/browse/CASSANDRA-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-6920: --- Attachment: 6920-infinity-metrics.patch LatencyMetrics can return infinity --- Key: CASSANDRA-6920 URL: https://issues.apache.org/jira/browse/CASSANDRA-6920 Project: Cassandra Issue Type: Bug Reporter: Nick Bailey Attachments: 6920-infinity-metrics.patch There is a race condition when updating the recentLatency metrics exposed from LatencyMetrics. Attaching a patch with a test that exposes the issue and a potential fix.
[jira] [Comment Edited] (CASSANDRA-6357) Flush memtables to separate directory
[ https://issues.apache.org/jira/browse/CASSANDRA-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945705#comment-13945705 ] dan jatnieks edited comment on CASSANDRA-6357 at 3/24/14 9:19 PM: -- I can also go back and re-run 2.0 w/patch on the same machine used for 2.1 ... but that patch isn't quite the same as the 2.1 code either... I was wondering whether changes to flushing and/or compaction in 2.1 already lessen the contention that was present in 2.0. was (Author: djatnieks): I can also go back and re-run 2.0 w/patch on the same machine used for 2.1 ... but that patch isn't quite the same as the 2.1 code either... Flush memtables to separate directory - Key: CASSANDRA-6357 URL: https://issues.apache.org/jira/browse/CASSANDRA-6357 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Patrick McFadin Assignee: Jonathan Ellis Priority: Minor Labels: performance Fix For: 2.1 beta1 Attachments: 6357-v2.txt, 6357.txt, c6357-2.1-stress-write-adj-ops-sec.png, c6357-2.1-stress-write-latency-99th.png, c6357-2.1-stress-write-latency-median.png, c6357-stress-write-latency-99th-1.png Flush writers are a critical element for keeping a node healthy. When several compactions run on systems with low-performing data directories, IO is at a premium. Once the disk subsystem is saturated, write IO is blocked, which causes flush writer threads to back up. Since memtables are large blocks of memory in the JVM, too much blocking can cause excessive GC over time, degrading performance and, in the worst case, causing an OOM. Since compaction runs on the data directories, my proposal is to create a separate directory for flushing memtables. Potentially we can use the same methodology as keeping the commit log separate, minimizing disk contention against the critical function of the flush writer.