[jira] [Assigned] (CASSANDRA-2717) duplicate rows returned from SELECT where KEY term is duplicated
[ https://issues.apache.org/jira/browse/CASSANDRA-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Ancona reassigned CASSANDRA-2717:
-------------------------------------

    Assignee: Jim Ancona

> duplicate rows returned from SELECT where KEY term is duplicated
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2717
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0 beta 2
>            Reporter: Aaron Morton
>            Assignee: Jim Ancona
>            Priority: Minor
>              Labels: cql, lhf
>         Attachments: v1-0001-CASSANDRA-2717-Prevent-multiple-KEY-terms-properly-han.txt
>
> Noticed while working on CASSANDRA-2268, when random keys generated during a multi_get test contained duplicate keys. The thrift multiget_slice() returns only the unique rows because of the map generated for the result. CQL will return a row for each KEY term in the SELECT. I could make QueryProcessor.getSlice() only create commands for the unique keys if we wanted to. Not sure it's a bug, and it's definitely not something that should come up too often; reporting it because it's different to the thrift multi_get operation. Happy to close if it's by design.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
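The behavioral difference described in the report can be sketched with a toy example (hypothetical names, not the actual Thrift or CQL code paths): a map keyed by row key collapses duplicate keys, while issuing one read command per KEY term does not.

```java
import java.util.*;

public class DedupSketch {
    // Thrift-style: results keyed by row key, so duplicate keys collapse.
    public static Map<String, String> multigetStyle(List<String> keys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : keys)
            result.put(key, "row-for-" + key); // later duplicates overwrite, not append
        return result;
    }

    // CQL-style (pre-patch): one read command per KEY term, so duplicates repeat.
    public static List<String> perTermStyle(List<String> keys) {
        List<String> result = new ArrayList<>();
        for (String key : keys)
            result.add("row-for-" + key);
        return result;
    }
}
```

For the query with `KEY IN ('bar', 'bar')`, the map-based path yields one row while the per-term path yields two.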
[jira] [Updated] (CASSANDRA-2717) duplicate rows returned from SELECT where KEY term is duplicated
[ https://issues.apache.org/jira/browse/CASSANDRA-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Ancona updated CASSANDRA-2717:
----------------------------------

    Attachment: v1-0001-CASSANDRA-2717-Prevent-multiple-KEY-terms-properly-han.txt
[jira] [Created] (CASSANDRA-2875) Increase index_interval and reopen sstables on low heap situations
Increase index_interval and reopen sstables on low heap situations
------------------------------------------------------------------

                 Key: CASSANDRA-2875
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2875
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.1
            Reporter: Héctor Izquierdo

One of the things that can cause an OOM is the key indexes. Of course you can tune index_interval, but that only helps after your node has crashed. Events like repair can cause much bigger memory pressure than expected in normal operation. As part of the measures taken when the heap is almost full, it would be good if the key indexes could be shrunk. I don't know how the indexes are stored in memory, but I guess it would be possible to remove entries without rereading all sstables.
[jira] [Commented] (CASSANDRA-2717) duplicate rows returned from SELECT where KEY term is duplicated
[ https://issues.apache.org/jira/browse/CASSANDRA-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062376#comment-13062376 ]

Jim Ancona commented on CASSANDRA-2717:
---------------------------------------

I added some tests to test/system/test_cql.py. I found that not only did

    WHERE KEY = 'bar' and KEY = 'bar'

return two rows, so did

    WHERE KEY = 'bar' and KEY = 'baz'

and

    WHERE KEY IN ('bar', 'bar')

The attached patch makes having more than one KEY = clause an error, and changes the List of keys in WhereClause to a Set. Jonathan mentioned that OR support was added, but I didn't see that in cassandra-0.8. Am I looking at the wrong branch? If so, this patch will have to be reworked, along with the logic in WhereClause.
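A minimal sketch of the patch's stated approach (hypothetical helper names, not the actual WhereClause code): a second KEY = clause is rejected outright, and KEY IN terms are deduplicated by collecting into a Set.

```java
import java.util.*;

public class KeyTermSketch {
    // Reject a second KEY = clause outright.
    public static String singleKey(List<String> equalsTerms) {
        if (equalsTerms.size() > 1)
            throw new IllegalArgumentException("only one KEY = clause is allowed");
        return equalsTerms.get(0);
    }

    // KEY IN (...) keeps each key once, preserving first-seen order.
    public static Set<String> inKeys(List<String> inTerms) {
        return new LinkedHashSet<>(inTerms);
    }
}
```

With this, `KEY IN ('bar', 'bar')` reduces to a single key, and `KEY = 'bar' and KEY = 'baz'` fails at parse/validation time rather than returning two rows.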
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062379#comment-13062379 ]

Chris Burroughs commented on CASSANDRA-47:
------------------------------------------

bq. Using 64kb buffer 1.7GB file could be compressed into 110MB (data added using ./bin/stress -n 100 -S 1024 -V, where -V option generates average size values and different cardinality from 50 (default) to 250).

This seems like an unrealistically good compression ratio. If I gzip a real-world SSTable that has redundant data that should be ripe for compression, I only see 641M -> 217M. What's the gzip compression ratio with the SSTables that the stress.java workload generates? Stu, could you post your custom YCSB workload from CASSANDRA-674 for comparison?

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062381#comment-13062381 ]

Pavel Yaskevich commented on CASSANDRA-47:
------------------------------------------

bq. This seems like an unrealistically good compression ratio. If I gzip a real-world SSTable that has redundant data that should be ripe for compression, I only see 641M -> 217M. What's the gzip compression ratio with the SSTables that the stress.java workload generates?

You can easily test it yourself: for example, run ./bin/stress -S 1024 -n 100 -C 250 -V, wait for compactions to finish, and check the block size of the resulting files (using ls -lahs); I see 3.8GB compressed into 781MB in my tests. internal_op_rate with the current trunk code is around 450-500, but with the current patch it is about 2800-3000 on a Quad-Core AMD Opteron(tm) Processor 2374 HE 4229730MHz on each core, 2GB mem (rackspace instance). A cardinality of 250 is 5 times bigger than the default, plus average size values using the -V option.
[jira] [Resolved] (CASSANDRA-2875) Increase index_interval and reopen sstables on low heap situations
[ https://issues.apache.org/jira/browse/CASSANDRA-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2875.
---------------------------------------

    Resolution: Won't Fix

Sorry, it's too hard to tell when low heap is a transitory condition vs when the JVM is really in trouble. We do our best by cutting the row cache and flushing memtables (both relatively low-impact), but reopening sstables would cause more damage than good in a lot of situations.
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062609#comment-13062609 ]

Andrey Stepachev commented on CASSANDRA-47:
-------------------------------------------

Are there any chances of getting this to work in 0.8.x? (Simply applying the patch fails.)
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062611#comment-13062611 ]

Jonathan Ellis commented on CASSANDRA-47:
-----------------------------------------

My guess is that a backport would be difficult because of the changes already in 1.0 (CASSANDRA-2062, CASSANDRA-1610, etc.) that deal with the same parts of the code. Compression will only be officially supported in 1.0.
svn commit: r1144730 - /cassandra/trunk/CHANGES.txt
Author: jbellis
Date: Sat Jul 9 20:08:29 2011
New Revision: 1144730

URL: http://svn.apache.org/viewvc?rev=1144730&view=rev
Log: fix merge conflict

Modified:
    cassandra/trunk/CHANGES.txt

Modified: cassandra/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1144730&r1=1144729&r2=1144730&view=diff
==============================================================================
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Sat Jul 9 20:08:29 2011
@@ -31,7 +31,6 @@
  * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846)
  * improve cli treatment of multiline comments (CASSANDRA-2852)
  * handle row tombstones correctly in EchoedRow (CASSANDRA-2786)
- .working
  * add MessagingService.get[Recently]DroppedMessages and
    StorageService.getExceptionCount (CASSANDRA-2804)
  * fix possibility of spurious UnavailableException for LOCAL_QUORUM
@@ -39,10 +38,6 @@
  * add ant-optional as dependence for the debian package (CASSANDRA-2164)
  * add option to specify limit for get_slice in the CLI (CASSANDRA-2646)
  * decrease HH page size (CASSANDRA-2832)
-===
- * add MessagingService.get[Recently]DroppedMessages and
-   StorageService.getExceptionCount (CASSANDRA-2804)
- .merge-right.r1143437

 0.8.1
svn commit: r1144748 - /cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml
Author: jbellis
Date: Sat Jul 9 21:21:50 2011
New Revision: 1144748

URL: http://svn.apache.org/viewvc?rev=1144748&view=rev
Log: fix typo for CASSANDRA-2873

Modified:
    cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml

Modified: cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml?rev=1144748&r1=1144747&r2=1144748&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml (original)
+++ cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml Sat Jul 9 21:21:50 2011
@@ -683,7 +683,7 @@ commands:
     store the whole values of its rows, so it is extremely space-intensive.
     It's best to only use the row cache if you have hot rows or static rows.

-- keys_cache_save_period: Duration in seconds after which Cassandra should
+- key_cache_save_period: Duration in seconds after which Cassandra should
     safe the keys cache. Caches are saved to saved_caches_directory as
     specified in conf/Cassandra.yaml. Default is 14400 or 4 hours.
[jira] [Updated] (CASSANDRA-2873) Typo in src/java/org/apache/cassandra/cli/CliClient
[ https://issues.apache.org/jira/browse/CASSANDRA-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2873:
--------------------------------------

             Priority: Trivial  (was: Major)
    Affects Version/s:     (was: 0.8.1)
                       0.8.0
        Fix Version/s: 0.8.2
             Assignee: Jonathan Ellis

> Typo in src/java/org/apache/cassandra/cli/CliClient
> ---------------------------------------------------
>
>                 Key: CASSANDRA-2873
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2873
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>    Affects Versions: 0.8.0
>         Environment: ubuntu linux 10.4
>            Reporter: Michał Bartoszewski
>            Assignee: Jonathan Ellis
>            Priority: Trivial
>             Fix For: 0.8.2
>
> I have read your documentation about the syntax for creating a column family and the parameters that I can pass. According to the documentation I can use this parameter:
>
> - keys_cache_save_period: Duration in seconds after which Cassandra should safe the keys cache. Caches are saved to saved_caches_directory as specified in conf/Cassandra.yaml. Default is 14400 or 4 hours.
>
> But then I received this error:
>
> No enum const class org.apache.cassandra.cli.CliClient$ColumnFamilyArgument.KEYS_CACHE_SAVE_PERIOD
>
> In the class mentioned in the title we have:
>
> protected enum ColumnFamilyArgument
> {
>     COLUMN_TYPE,
>     COMPARATOR,
>     SUBCOMPARATOR,
>     COMMENT,
>     ROWS_CACHED,
>     ROW_CACHE_SAVE_PERIOD,
>     KEYS_CACHED,
>     KEY_CACHE_SAVE_PERIOD,   <- TYPO !
>     READ_REPAIR_CHANCE,
>     GC_GRACE,
>     COLUMN_METADATA,
>     MEMTABLE_OPERATIONS,
>     MEMTABLE_THROUGHPUT,
>     MEMTABLE_FLUSH_AFTER,
>     DEFAULT_VALIDATION_CLASS,
>     MIN_COMPACTION_THRESHOLD,
>     MAX_COMPACTION_THRESHOLD,
>     REPLICATE_ON_WRITE,
>     ROW_CACHE_PROVIDER,
>     KEY_VALIDATION_CLASS
> }
[jira] [Resolved] (CASSANDRA-2873) Typo in src/java/org/apache/cassandra/cli/CliClient
[ https://issues.apache.org/jira/browse/CASSANDRA-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2873.
---------------------------------------

    Resolution: Fixed

key_cache_save_period is correct; updated CliHelp.yaml to match.
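The "No enum const" error the reporter hit is what a plain Enum.valueOf lookup produces for an unknown name; a minimal reproduction (toy two-constant enum, not the real CliClient) shows why the documented `keys_cache_save_period` spelling fails while `key_cache_save_period` succeeds:

```java
public class EnumLookupSketch {
    // Toy stand-in for CliClient's ColumnFamilyArgument enum.
    enum ColumnFamilyArgument { KEY_CACHE_SAVE_PERIOD, ROW_CACHE_SAVE_PERIOD }

    public static boolean isRecognized(String userArg) {
        try {
            ColumnFamilyArgument.valueOf(userArg.toUpperCase());
            return true;
        } catch (IllegalArgumentException e) {
            // valueOf throws "No enum const ..." for names with no matching constant
            return false;
        }
    }
}
```

Here `isRecognized("key_cache_save_period")` is true and `isRecognized("keys_cache_save_period")` is false, matching the reported behavior.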
[jira] [Commented] (CASSANDRA-2498) Improve read performance in update-intensive workload
[ https://issues.apache.org/jira/browse/CASSANDRA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062636#comment-13062636 ]

Jonathan Ellis commented on CASSANDRA-2498:
-------------------------------------------

bq. CASSANDRA-2319 handles append-only wide row cases with an index lookup

But it still has to do the lookup-per-sstable, right?

bq. Usecases involving slicing really need range/slice metadata to apply this type of optimization

Yes. It would be easy to add an option like that for CfDef here, though.

> Improve read performance in update-intensive workload
> -----------------------------------------------------
>
>                 Key: CASSANDRA-2498
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2498
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: ponies
>             Fix For: 1.0
>
> Read performance in an update-heavy environment relies heavily on compaction to maintain good throughput. (This is not the case for workloads where rows are only inserted once, because the bloom filter keeps us from having to check sstables unnecessarily.)
>
> Very early versions of Cassandra attempted to mitigate this by checking sstables in descending generation order (mostly equivalent to descending mtime): once all the requested columns were found, it would not check any older sstables. This was incorrect, because data timestamp will not correspond to sstable timestamp, both because compaction has the side effect of refreshing data to a newer sstable, and because hinted handoff may send us data older than what we already have.
>
> Instead, we could create a per-sstable piece of metadata containing the most recent (client-specified) timestamp for any column in the sstable. We could then sort sstables by this timestamp instead, and perform a similar optimization (if the remaining sstable client-timestamps are older than the oldest column found in the desired result set so far, we don't need to look further).
>
> Since under almost every workload, client timestamps of data in a given sstable will tend to be similar, we expect this to cut the number of sstables down proportionally to how frequently each column in the row is updated. (If each column is updated with each write, we only have to check a single sstable.)
>
> This may also be useful information when deciding which SSTables to compact.
>
> (Note that this optimization is only appropriate for named-column queries, not slice queries, since we don't know what non-overlapping columns may exist in older sstables.)
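The proposed early-termination rule can be sketched as follows (hypothetical types and names; the real implementation would live in the read path, not a standalone class). Sstables are visited newest-first by their max client timestamp; the scan stops once every requested column has been found with a timestamp newer than anything the remaining sstables could hold.

```java
import java.util.*;

public class TimestampPruneSketch {
    public static class SSTable {
        public final long maxTimestamp;            // most recent client timestamp in this sstable
        public final Map<String, Long> columns;    // column name -> client timestamp
        public SSTable(long maxTimestamp, Map<String, Long> columns) {
            this.maxTimestamp = maxTimestamp;
            this.columns = columns;
        }
    }

    // Returns how many sstables had to be consulted for a named-column read.
    public static int sstablesRead(List<SSTable> sstables, Set<String> names) {
        // visit newest-first by per-sstable max client timestamp
        sstables.sort((a, b) -> Long.compare(b.maxTimestamp, a.maxTimestamp));
        Map<String, Long> found = new HashMap<>();
        int read = 0;
        for (SSTable s : sstables) {
            // stop early: every requested column is already found, and this (and every
            // remaining) sstable cannot contain anything newer than the oldest hit so far
            if (found.keySet().containsAll(names)
                    && s.maxTimestamp < Collections.min(found.values()))
                break;
            read++;
            for (String n : names) {
                Long ts = s.columns.get(n);
                if (ts != null)
                    found.merge(n, ts, Math::max);
            }
        }
        return read;
    }
}
```

If the newest sstable already contains fresh copies of all requested columns, only one sstable is read, which is the "each column updated with each write" best case the ticket describes.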
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062639#comment-13062639 ]

Andrey Stepachev commented on CASSANDRA-47:
-------------------------------------------

It is a pity that this is only coming in 1.0. It is a very desirable feature (especially from the point of view of a former HBase user).
[jira] [Updated] (CASSANDRA-2643) read repair/reconciliation breaks slice based iteration at QUORUM
[ https://issues.apache.org/jira/browse/CASSANDRA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2643:
--------------------------------------

    Fix Version/s: 1.0
         Assignee: Sylvain Lebresne

> read repair/reconciliation breaks slice based iteration at QUORUM
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-2643
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2643
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5
>            Reporter: Peter Schuller
>            Assignee: Sylvain Lebresne
>            Priority: Critical
>             Fix For: 1.0
>         Attachments: short_read.sh, slicetest.py
>
> In short, I believe iterating over columns is impossible to do reliably with QUORUM due to the way reconciliation works.
>
> The problem is that the SliceQueryFilter is executed locally when reading on a node, but no attempt seems to be made to consider limits when doing reconciliation and/or read repair (RowRepairResolver.resolveSuperset() and ColumnFamily.resolve()). If a node slices and comes up with 100 columns, and another node slices and comes up with 100 columns, some of which are unique to each side, reconciliation results in more than 100 columns in the result set. In this case the effect is limited to "client gets more than asked for", but the columns still accurately represent the range. This is easily triggered by my test case. In addition to the client receiving too many columns, I believe some of them will not satisfy the QUORUM consistency level, for the same reasons as with deletions (see discussion below).
>
> Now, there *should* be a problem for tombstones as well, but it's more subtle. Suppose A has:
>
>   1 2 3 4 5 6
>
> and B has:
>
>   1 del 2 del 3 del 4 5 6
>
> If you now slice 1-6 with count=3, the tombstones from B will reconcile with those from A - fine. So you end up getting 1,5,6 back. This made it a bit difficult to trigger in a test case until I realized what was going on.
>
> At first I was hoping to see a short iteration result, which would mean that the process of iterating until you get a short result would cause a spurious "end of columns" and thus make it impossible to iterate correctly. So, due to 5-6 existing (and if they didn't, you legitimately reached end-of-columns), we do indeed get a result of size 3 which contains 1, 5 and 6. However, only node B would have contributed columns 5 and 6; so there is actually no QUORUM consistency on the co-ordinating node with respect to these columns. If node A and C also had 5 and 6, they would not have been considered. Am I wrong?
>
> In any case, using the script I'm about to attach, you can trigger the over-delivery case very easily:
>
> (0) disable hinted hand-off to avoid that interacting with the test
> (1) start three nodes
> (2) create ks 'test' with rf=3 and cf 'slicetest'
> (3) ./slicetest.py hostname_of_node_C insert  (let it run for a few seconds, then ctrl-c)
> (4) stop node A
> (5) ./slicetest.py hostname_of_node_C insert  (let it run for a few seconds, then ctrl-c)
> (6) start node A, wait for B and C to consider it up
> (7) ./slicetest.py hostname_of_node_A slice  (make A co-ordinator, though it doesn't necessarily matter)
>
> You can also pass 'delete' (random deletion of 50% of contents) or 'deleterange' (delete all in [0.2,0.8]) to slicetest, but you don't trigger a short read by doing that (see discussion above).
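The over-delivery case described above can be reduced to a toy union (hypothetical simplification; the real code path is RowRepairResolver.resolveSuperset() and ColumnFamily.resolve()): each replica applies the slice count locally, and the coordinator merges the results without re-applying it.

```java
import java.util.*;

public class SliceReconcileSketch {
    // Each replica independently returns its first `count` live columns;
    // the coordinator unions them without re-applying the count - the
    // over-delivery behavior reported in this ticket.
    public static SortedSet<Integer> reconcile(SortedSet<Integer> fromA, SortedSet<Integer> fromB) {
        SortedSet<Integer> merged = new TreeSet<>(fromA);
        merged.addAll(fromB);
        return merged;
    }
}
```

With count=3, if A returns {1,2,3} and B returns {1,4,5}, the client receives 5 columns instead of 3, and columns unique to one replica (4, 5) were never confirmed by a quorum.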
[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation
[ https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062638#comment-13062638 ]

Jonathan Ellis commented on CASSANDRA-2864:
-------------------------------------------

bq. I was more thinking of replacing the old row cache

That does make more sense than having both, but it's not clear to me that a new container that has some properties of both memtable and sstable is better than building something out of those primitives. Taking that (2498) approach, you get all the benefits of the sstable infrastructure (persistence, stat tracking, even streaming to new nodes) for free, as well as playing nicely with the OS's page cache instead of being a separate memory area.

bq. implementing a variation of CASSANDRA-1956 will be pretty easy since we can work with the standard filters now

True, but you could do the same kind of IColumnIterator for the existing cache api just as easily, no?

bq. it seems that they dont help for slicing

Not without extra metadata, no. But I'm okay with adding that.

> Alternative Row Cache Implementation
> ------------------------------------
>
>                 Key: CASSANDRA-2864
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.1
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: rowcache.patch
>
> We have been working on an alternative implementation to the existing row cache(s). We have two main goals:
>
> - Decrease memory -> get more rows in the cache without suffering a huge performance penalty
> - Reduce gc pressure
>
> This sounds a lot like we should be using the new serializing cache in 0.8. Unfortunately our workload consists of loads of updates, which would invalidate the cache all the time. The second unfortunate thing is that the idea we came up with doesn't fit the new cache provider api.
>
> It looks like this: like the serializing cache, we basically only cache the serialized byte buffer. We don't serialize the bloom filter, and we try to do some other minor compression tricks (var ints etc, not done yet). The main difference is that we don't deserialize but use the normal sstable iterators and filters, as in the regular uncached case.
>
> So the read path looks like this:
>
>     return filter.collectCollatedColumns(memtable iter, cached row iter)
>
> The write path is not affected; it does not update the cache. During flush we merge all memtable updates with the cached rows.
>
> The attached patch is based on 0.8 branch r1143352. It does not replace the existing row cache but sits alongside it. There's an environment switch to choose the implementation, so it is easy to benchmark performance differences. -DuseSSTableCache=true enables the alternative cache. It shares its configuration with the standard row cache, so the cache capacity is shared.
>
> We have duplicated a fair amount of code. First we actually refactored the existing sstable filter / reader, but then decided to minimize dependencies. Also, this way it is easy to customize serialization for in-memory sstable rows. We have also experimented a little with compression, but since this task at this stage is mainly to kick off discussion, we wanted to keep things simple. But there is certainly room for optimizations.
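The collation step in the read path above (merging memtable columns over cached-row columns) can be sketched like this (hypothetical types; Java 16+ record for brevity; the real code collates iterators through the query filter, not maps):

```java
import java.util.*;

public class CollateSketch {
    public record Col(String name, long timestamp, String value) {}

    // Collate cached-row columns with memtable columns: per column name,
    // the newer client timestamp wins (memtable wins ties).
    public static Map<String, Col> collate(List<Col> cachedRow, List<Col> memtable) {
        Map<String, Col> result = new TreeMap<>();
        for (Col c : cachedRow)
            result.put(c.name(), c);
        for (Col c : memtable)
            result.merge(c.name(), c,
                    (oldCol, newCol) -> newCol.timestamp() >= oldCol.timestamp() ? newCol : oldCol);
        return result;
    }
}
```

This is the same newest-wins reconciliation the flush-time merge of memtable updates into cached rows would need.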
svn commit: r1144754 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cli/CliClient.java
Author: jbellis
Date: Sat Jul 9 21:42:26 2011
New Revision: 1144754

URL: http://svn.apache.org/viewvc?rev=1144754&view=rev
Log: reset cli keyspace after dropping the current one
patch by Joe Stein; reviewed by jbellis for CASSANDRA-2763

Modified:
    cassandra/branches/cassandra-0.8/CHANGES.txt
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1144754&r1=1144753&r2=1144754&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Sat Jul 9 21:42:26 2011
@@ -23,6 +23,7 @@
  * add ant-optional as dependence for the debian package (CASSANDRA-2164)
  * add option to specify limit for get_slice in the CLI (CASSANDRA-2646)
  * decrease HH page size (CASSANDRA-2832)
+ * reset cli keyspace after dropping the current one (CASSANDRA-2763)

 0.8.1

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java?rev=1144754&r1=1144753&r2=1144754&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java Sat Jul 9 21:42:26 2011
@@ -1249,6 +1249,9 @@ public class CliClient
         String version = thriftClient.system_drop_keyspace(keyspaceName);
         sessionState.out.println(version);
         validateSchemaIsSettled(version);
+
+        if (keyspaceName.equals(keySpace)) // we just deleted the keyspace we were authenticated to
+            keySpace = null;
     }

     /**
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062641#comment-13062641 ]

Jonathan Ellis commented on CASSANDRA-47:
-----------------------------------------

I agree, but 1.0 will be out in October. This ticket is over two years old; it can wait another couple months. :)
[jira] [Resolved] (CASSANDRA-2763) When dropping a keyspace you're currently authenticated to, might be nice to de-authenticate upon completion
[ https://issues.apache.org/jira/browse/CASSANDRA-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2763.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.2
         Reviewer: jbellis
         Assignee: Joe Stein

Looks good to me. Thanks, Joe! Committed.

> When dropping a keyspace you're currently authenticated to, might be nice to de-authenticate upon completion
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2763
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2763
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Joe Stein
>            Priority: Trivial
>              Labels: cli, lhf
>             Fix For: 0.8.2
>         Attachments: 2763.txt
>
> I found that when I'm authenticated to MyKeyspace, then do 'drop keyspace MyKeyspace;', I'm still authenticated to it. It's trivial, I know, but it seems reasonable to unauthenticate from it.
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062644#comment-13062644 ] Andrey Stepachev commented on CASSANDRA-47: --- I agree, but I came from HBase, where this feature was implemented a long time ago. So it was very surprising that Cassandra doesn't have such a feature. But with such a timeframe (like October) it is really not a very big concern, so I'll wait. In any case, thanks for your work :). SSTable compression --- Key: CASSANDRA-47 URL: https://issues.apache.org/jira/browse/CASSANDRA-47 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Labels: compression Fix For: 1.0 Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar We should be able to do SSTable compression, which would trade CPU for I/O (almost always a good trade). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-2876) JDBC 1.1 Roadmap of Enhancements
JDBC 1.1 Roadmap of Enhancements Key: CASSANDRA-2876 URL: https://issues.apache.org/jira/browse/CASSANDRA-2876 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.1 Reporter: Rick Shaw Priority: Minor Fix For: 1.0 Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates C* with App Servers and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planed datatype support for {{PreparedStatement}} and {{ResultSet}}. # Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2873) Typo in src/java/org/apache/cassandra/cli/CliClient
[ https://issues.apache.org/jira/browse/CASSANDRA-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062649#comment-13062649 ] Hudson commented on CASSANDRA-2873: --- Integrated in Cassandra-0.8 #212 (See [https://builds.apache.org/job/Cassandra-0.8/212/]) fix typo for CASSANDRA-2873 fix typo for CASSANDRA-2873 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1144749 Files : * /cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1144748 Files : * /cassandra/branches/cassandra-0.8/src/resources/org/apache/cassandra/cli/CliHelp.yaml Typo in src/java/org/apache/cassandra/cli/CliClient - Key: CASSANDRA-2873 URL: https://issues.apache.org/jira/browse/CASSANDRA-2873 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.8.0 Environment: ubuntu linux 10.4 Reporter: Michał Bartoszewski Assignee: Jonathan Ellis Priority: Trivial Fix For: 0.8.2 I have read your documentation about the syntax for creating a column family and the parameters that I can pass. According to the documentation I can use the parameter: - keys_cache_save_period: Duration in seconds after which Cassandra should save the keys cache. Caches are saved to saved_caches_directory as specified in conf/cassandra.yaml. Default is 14400 or 4 hours. But then I received the error: No enum const class org.apache.cassandra.cli.CliClient$ColumnFamilyArgument.KEYS_CACHE_SAVE_PERIOD In the class mentioned in the title we have:
{noformat}
protected enum ColumnFamilyArgument
115 {
116     COLUMN_TYPE,
117     COMPARATOR,
118     SUBCOMPARATOR,
119     COMMENT,
120     ROWS_CACHED,
121     ROW_CACHE_SAVE_PERIOD,
122     KEYS_CACHED,
123     KEY_CACHE_SAVE_PERIOD,   TYPO !
124     READ_REPAIR_CHANCE,
125     GC_GRACE,
126     COLUMN_METADATA,
127     MEMTABLE_OPERATIONS,
128     MEMTABLE_THROUGHPUT,
129     MEMTABLE_FLUSH_AFTER,
130     DEFAULT_VALIDATION_CLASS,
131     MIN_COMPACTION_THRESHOLD,
132     MAX_COMPACTION_THRESHOLD,
133     REPLICATE_ON_WRITE,
134     ROW_CACHE_PROVIDER,
135     KEY_VALIDATION_CLASS
136 }
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
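The failure mode described above (the documented option name not matching the enum constant) can be reproduced in isolation. Below is a minimal sketch; EnumLookupDemo and isValidArgument are hypothetical names, and the nested enum only mirrors a few of the constants from CliClient.ColumnFamilyArgument:

```java
// Minimal reproduction of the CLI lookup failure (stand-alone sketch;
// the real enum lives in org.apache.cassandra.cli.CliClient).
public class EnumLookupDemo {
    // Mirrors a few constants from CliClient.ColumnFamilyArgument.
    enum ColumnFamilyArgument { KEYS_CACHED, KEY_CACHE_SAVE_PERIOD, ROW_CACHE_SAVE_PERIOD }

    // The CLI upper-cases the user-supplied option and looks it up via Enum.valueOf().
    public static boolean isValidArgument(String option) {
        try {
            ColumnFamilyArgument.valueOf(option.toUpperCase());
            return true;
        } catch (IllegalArgumentException e) {
            return false; // surfaces to the user as "No enum const class ..."
        }
    }
}
```

Here isValidArgument("keys_cache_save_period") fails (the documented, plural name) while isValidArgument("key_cache_save_period") succeeds, matching the mismatch the reporter hit.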
[jira] [Commented] (CASSANDRA-2763) When dropping a keyspace you're currently authenticated to, might be nice to de-authenticate upon completion
[ https://issues.apache.org/jira/browse/CASSANDRA-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062648#comment-13062648 ] Hudson commented on CASSANDRA-2763: --- Integrated in Cassandra-0.8 #212 (See [https://builds.apache.org/job/Cassandra-0.8/212/]) reset cli keyspace after dropping the current one patch by Joe Stein; reviewed by jbellis for CASSANDRA-2763 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1144754 Files : * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java * /cassandra/branches/cassandra-0.8/CHANGES.txt When dropping a keyspace you're currently authenticated to, might be nice to de-authenticate upon completion Key: CASSANDRA-2763 URL: https://issues.apache.org/jira/browse/CASSANDRA-2763 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Jeremy Hanna Assignee: Joe Stein Priority: Trivial Labels: cli, lhf Fix For: 0.8.2 Attachments: 2763.txt I found that when I'm authenticated to MyKeyspace, then do 'drop keyspace MyKeyspace;', I'm still authenticated to it. It's trivial I know, but seems reasonable to unauthenticate from it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2829) always flush memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062653#comment-13062653 ] Jonathan Ellis commented on CASSANDRA-2829: --- Good detective work finding this! I'm not sure about the proposed fix, though -- I think this reasoning still applies:
{noformat}
// we can't just mark the segment where the flush happened clean,
// since there may have been writes to it between when the flush
// started and when it finished.
{noformat}
... the memtable may have been clean when the flush started, but we don't block writes until the flush finishes, so some may have finished in between (so the CL may have writes for this segment now). (I don't have a better fix yet, this is a tough one.) always flush memtables -- Key: CASSANDRA-2829 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.6 Reporter: Aaron Morton Assignee: Aaron Morton Priority: Minor Attachments: 0001-2829-unit-test.patch, 0002-2829.patch Only dirty memtables are flushed, and so only dirty memtables are used to discard obsolete commit log segments. This can result in log segments not being deleted even though the data has been flushed. Was using a 3 node 0.7.6-2 AWS cluster (DataStax AMIs) with pre-0.7 data loaded and a running application working against the cluster. Did a rolling restart and then kicked off a repair; one node filled up the commit log volume with 7GB+ of log data, about 20 hours of log files.
{noformat}
$ sudo ls -lah commitlog/
total 6.9G
drwx-- 2 cassandra cassandra 12K 2011-06-24 20:38 .
drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
-rw--- 1 cassandra cassandra 129M 2011-06-24 01:08 CommitLog-1308876643288.log
-rw--- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308876643288.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 CommitLog-1308877711517.log
-rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308877711517.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 CommitLog-1308879395824.log
-rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308879395824.log.header
...
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 CommitLog-1308946745380.log
-rw-r--r-- 1 cassandra cassandra 36 2011-06-24 20:47 CommitLog-1308946745380.log.header
-rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 CommitLog-1308947888397.log
-rw-r--r-- 1 cassandra cassandra 44 2011-06-24 20:47 CommitLog-1308947888397.log.header
{noformat}
The user KS has 2 CFs with 60 minute flush times. The system KS had the default settings, which is 24 hours. Will create another ticket to see if these can be reduced or if it's something users should do; in this case it would not have mattered. I grabbed the log headers and used the tool in CASSANDRA-2828, and most of the segments had the system CFs marked as dirty.
{noformat}
$ bin/logtool dirty /tmp/logs/commitlog/
Not connected to a server, Keyspace and Column Family names are not available.
/tmp/logs/commitlog/CommitLog-1308876643288.log.header
Keyspace Unknown: Cf id 0: 444
/tmp/logs/commitlog/CommitLog-1308877711517.log.header
Keyspace Unknown: Cf id 1: 68848763
...
/tmp/logs/commitlog/CommitLog-1308944451460.log.header
Keyspace Unknown: Cf id 1: 61074
/tmp/logs/commitlog/CommitLog-1308945597471.log.header
Keyspace Unknown: Cf id 1000: 43175492 Cf id 1: 108483
/tmp/logs/commitlog/CommitLog-1308946745380.log.header
Keyspace Unknown: Cf id 1000: 239223 Cf id 1: 172211
/tmp/logs/commitlog/CommitLog-1308947888397.log.header
Keyspace Unknown: Cf id 1001: 57595560 Cf id 1: 816960 Cf id 1000: 0
{noformat}
CF 0 is the Status / LocationInfo CF and 1 is the HintedHandoff CF. I don't have it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. I was able to repro a case where flushing the CFs did not mark the log segments as obsolete (attached unit-test patch). Steps are:
1. Write to cf1 and flush.
2. The current log segment is marked as dirty at the CL position when the flush started, CommitLog.discardCompletedSegmentsInternal().
3. Do not write to cf1 again.
4. Roll the log; my test does this manually.
5. Write to cf2 and flush.
6. Only cf2 is flushed because it is the only dirty CF. cfs.maybeSwitchMemtable() is not called for cf1, and so log segment 1 is still marked as dirty from cf1.
Step 5 is not essential, it just matched what I thought was happening. I thought SystemTable.updateToken() was called which does not flush, and this was the last
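The bookkeeping behind the steps above can be sketched as a toy model: a commit log segment remembers, per CF, where that CF's earliest unflushed write sits, and the segment may only be deleted once no CF is still dirty in it. A single stale dirty mark (a CF that is never flushed again) pins the whole segment, as in the report. SegmentModel and its methods are hypothetical names for illustration; the real logic in CommitLog.discardCompletedSegmentsInternal() is considerably more involved:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of commit-log segment dirty tracking (hypothetical API).
public class SegmentModel {
    // cfId -> position of that CF's earliest unflushed write in this segment
    private final Map<Integer, Integer> dirtyCfs = new HashMap<>();

    public void write(int cfId, int position) {
        dirtyCfs.putIfAbsent(cfId, position); // earliest dirty position wins
    }

    // Called when cfId flushes: everything before flushPosition is now on disk.
    public void markClean(int cfId, int flushPosition) {
        Integer dirtyAt = dirtyCfs.get(cfId);
        if (dirtyAt != null && dirtyAt < flushPosition)
            dirtyCfs.remove(cfId);
    }

    public boolean safeToDelete() {
        return dirtyCfs.isEmpty(); // any remaining dirty CF pins the segment
    }
}
```

In this model, if cf1 writes once and is then never flushed again (because "only dirty memtables are flushed"), markClean(1, ...) is never invoked and safeToDelete() stays false forever, mirroring the leaked segments above.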
[jira] [Updated] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-957: Attachment: 0003-Make-HintedHandoff-More-reliable.patch 0002-Rework-Hints-to-be-on-token.patch 0001-Support-Token-Replace.patch Adding support for replacing a token. This also supports replacement with the same IP. Reworked hints to be based on token instead of IPs. The 3rd part of the patch also makes the hints be delivered to the host (currently it seems like they are not delivered at all... let me know if you want to move this to a different ticket). convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Reporter: Jonathan Ellis Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0003-Make-HintedHandoff-More-reliable.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one), and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken, is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Cassandra Wiki] Update of HowToContribute by jeremyhanna
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The HowToContribute page has been changed by jeremyhanna: http://wiki.apache.org/cassandra/HowToContribute?action=diff&rev1=40&rev2=41 Comment: making the lhf link filter out resolved issues.
== Overview ==
- 1. Pick an issue to work on. If you don't have a specific [[http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s02.html|itch to scratch]], some possibilities are marked with [[https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf|the low-hanging fruit label]] in JIRA.
+ 1. Pick an issue to work on. If you don't have a specific [[http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s02.html|itch to scratch]], some possibilities are marked with [[https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf+AND+status+!%3D+resolved|the low-hanging fruit label]] in JIRA.
1. Read the relevant parts of ArchitectureInternals; watching http://www.channels.com/episodes/show/11765800/Getting-to-know-the-Cassandra-Codebase will probably also be useful
1. Check if someone else has already begun work on the change you have in mind in the [[https://issues.apache.org/jira/browse/CASSANDRA|issue tracker]]
1. If not, create a ticket describing the change you're proposing in the issue tracker
[jira] [Commented] (CASSANDRA-674) New SSTable Format
[ https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062662#comment-13062662 ] Stu Hood commented on CASSANDRA-674: I've posted the slightly-divergent branch of YCSB I used for this workload at https://github.com/stuhood/YCSB/tree/monotonic-timeseries New SSTable Format -- Key: CASSANDRA-674 URL: https://issues.apache.org/jira/browse/CASSANDRA-674 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stu Hood Fix For: 1.0 Attachments: 674-v1.diff, 674-v2.tgz, 674-v3.tgz, 674-ycsb.log, trunk-ycsb.log Various tickets exist due to limitations in the SSTable file format, including #16, #47 and #328. Attached is a proposed design/implementation of a new file format for SSTables that addresses a few of these limitations. This v2 implementation is not ready for serious use: see comments for remaining issues. It is roughly the format described here: http://wiki.apache.org/cassandra/FileFormatDesignDoc -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060004#comment-13060004 ] Yang Yang edited comment on CASSANDRA-2843 at 7/10/11 1:42 AM: --- Actually my tests did not use reconcile. All col names are unique. was (Author: yangyangyyy): Actually my tests did not use reconcile. All col names are unique. https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] ColumnFamily during is a good idea. It is clear that avoiding synchronization will be faster, and given the type of operations we do during reads (insertion in sorted order and iteration), an ArrayList-backed solution is sure to be faster too. It will also be much gentler on the GC than the linked list ConcurrentSkipListMap uses. I think that all those will help even with relatively small reads. So let's focus on that for this ticket and leave other potential improvements to other tickets, especially if it is unclear they bear any noticeable speedup. is quite frankly ugly and will be a maintenance nightmare (you'll have to check you did overwrite every function that touches the map (which is not the case in the patch), and every update to ColumnFamily has to be aware that it should update FastColumnFamily as well). functional ColumnFamily implementation (albeit not synchronized). That is, we can't assume that addition will always be in strict increasing order, otherwise again this will be too hard to use. Granted, I don't think it is used in the read path, but I think that the new ColumnFamily implementation could advantageously be used during compaction (by preCompactedRow typically, and possibly other places where concurrent access is not an issue) where this would matter. the remarks above. The patch is against trunk (not the 0.8 branch), because it builds on the recently committed refactor of ColumnFamily.
It refactors ColumnFamily (AbstractColumnContainer actually) to allow for a pluggable backing column map. The ConcurrentSkipListMap implementation is named ThreadSafeColumnMap and the new one is called ArrayBackedColumnMap (which I prefer to FastSomething, since that is not a very helpful name). getTopLevelColumns, I pass along a factory (that each backing implementation provides). The main goal was to avoid creating a ColumnFamily when it's useless (if row cache is enabled on the CF -- btw, this ticket only improves reads for column families with no cache). (addition of columns + iteration), the ArrayBacked implementation is faster than the ConcurrentSkipListMap-based one. Interestingly though, this is mainly true when some reconciliation of columns happens. That is, if you only add columns with different names, the ArrayBacked implementation is faster, but not dramatically so. If you start adding columns that have to be resolved, the ArrayBacked implementation becomes much faster, even with a reasonably small number of columns (inserting 100 columns with only 10 unique column names, the ArrayBacked is already 30% faster). And this is mostly due to the overhead of synchronization (of replace()): a TreeMap-based implementation is slightly slower than the ArrayBacked one, but not by a lot, and thus is much faster than the ConcurrentSkipListMap implementation. use a few unit tests for the new ArrayBacked implementation). considerably slow (my test of and 40 bytes in value, is about 16ms.
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
concurrentSkipListMap() that maps column names to values. it needs to maintain a more complex structure of map. output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and finally convert it to a list. on the synchronization side, since the returned CF is never going to be shared/modified by other threads, we know the access is always single-threaded, so no synchronization is needed. particularly write. so we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper. agree on the general direction. provided. the main work is to let the FastColumnFamily use an array for internal storage. at first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in
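The trade-off argued in the comments above (a thread-safe sorted map on the write path versus a plain array for the single-threaded, sorted-input read path) can be sketched as follows. BackingMapDemo is a hypothetical helper, not one of the patch's classes, and plain strings stand in for real columns:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch contrasting the two backing structures (hypothetical helper class).
public class BackingMapDemo {
    // Read-path style: single-threaded, names arrive in sorted order -> plain append.
    public static List<String> arrayBacked(List<String> sortedNames) {
        List<String> cols = new ArrayList<>(sortedNames.size());
        for (String name : sortedNames)
            cols.add(name); // O(1) amortized, no synchronization, no per-node allocation
        return cols;
    }

    // Write-path style: thread-safe sorted map, pays synchronization and node churn.
    public static List<String> skipListBacked(List<String> names) {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        for (String name : names)
            map.put(name, name); // O(log n) insert, safe for concurrent writers
        return new ArrayList<>(map.keySet()); // keySet iterates in sorted order
    }
}
```

For sorted input both produce the same ordered result; the array version just gets there without the concurrency machinery the read path never needs.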
[jira] [Issue Comment Edited] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061134#comment-13061134 ] Yang Yang edited comment on CASSANDRA-2843 at 7/10/11 1:42 AM: --- I just got the patch down and transferred it to a computer to read. Sylvain's approach of comparing the last element is quite clean; I see no problems. The only problem was due to me: in the binary search, high=mid-1 should be changed to high=mid. Also, with this error fixed, you don't need to special-case 1 and 2 at the end of the binary search. was (Author: yangyangyyy): I just got the patch down and transferred it to a computer to read. Sylvain's approach of comparing the last element is quite clean; I see no problems. The only problem was due to me: in the binary search, high=mid-1 should be changed to high=mid. Also, with this error fixed, you don't need to special-case 1 and 2 at the end of the binary search. https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] already sorted order it. Which doesn't mean it cannot optimize for it. If you look at the version I attached, at least as far as addColumn is concerned, it does the exact same thing as your version, with the only difference that I first check if adding at the tail is legit and fall back to a binary search if that is not the case. That is, as long as the input is in sorted order, it will be as fast as your implementation (there is one more ByteBuffer comparison, but I'm willing to bet that it has no noticeable impact on performance). But it won't create an unsorted CF if the input is not in sorted order. stick w/ TreeMap for simplicity? the TreeMap implementation so that people can look for themselves. The test simply creates a CF, adds columns to it (in sorted order) and does a simple iteration at the end.
I've also added a delete at the end because, at least in the case of super columns, we do call removeDeleted, so the goal was to see if this has a significant impact (the deletes are made at the beginning of the CF, which is the worst case for the ArrayBacked solution). The test also allows some column overlap (to exercise reconciliation). Note that when that happens, the input is not in strict sorted order anymore, but it's mostly at the disadvantage of the ArrayBacked implementation there too. Playing with the parameters (number of columns added, number that overlap, number of deletes), the results seem to always be the same. The ArrayBacked is consistently faster than the TreeMap one, which is itself consistently faster than the CSLM one. Now what I meant is that the difference between ArrayBacked and TreeMap is generally not as big as the one with CSLM, but it is still often very noticeable. insertion in sorted order: the insertion is then O(1) and with a small constant factor because we're using ArrayList. TreeMap can't beat that. Given this, and given that ColumnFamily is one of our core data structures, I think we should choose the more efficient implementation for each use case. And truth is, the ArrayBacked implementation is really not very complicated; that's basic stuff. CSLM, and that's what we do on reads. does show that we're much much faster even without reconciliation happening. https://issues.apache.org/jira/browse/CASSANDRA-2843 microBenchmark.patch considerably slow (my test of and 40 bytes in value, is about 16ms.
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
concurrentSkipListMap() that maps column names to values. it needs to maintain a more complex structure of map. output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and finally convert it to a list. on the synchronization side, since the returned CF is never going to be shared/modified by other threads, we know the access is always single-threaded, so no synchronization is needed. particularly write. so we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper. agree on the general direction. provided. the main work is to let the FastColumnFamily use an array for internal storage. at first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in sorted
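The insertion strategy debated above, appending when the new column sorts after the current tail and otherwise falling back to a binary search (using the high = mid form rather than high = mid - 1), can be sketched as follows. SortedColumns is a hypothetical stand-alone class; the patch's ArrayBackedColumnMap stores real columns and reconciles values, not plain strings:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of sorted-array column insertion (hypothetical stand-alone class).
public class SortedColumns {
    private final List<String> names = new ArrayList<>();

    public void addColumn(String name) {
        int n = names.size();
        // Fast path: sorted input -> one comparison, then O(1) append.
        if (n == 0 || names.get(n - 1).compareTo(name) < 0) {
            names.add(name);
            return;
        }
        // Slow path: binary search for the first element >= name.
        int low = 0, high = n; // note: high = mid below, not mid - 1,
        while (low < high) {   // so no special-casing of tiny sizes is needed
            int mid = (low + high) >>> 1;
            if (names.get(mid).compareTo(name) < 0)
                low = mid + 1;
            else
                high = mid;
        }
        if (names.get(low).equals(name))
            names.set(low, name); // existing name: reconcile/overwrite in place
        else
            names.add(low, name); // new name: shift tail and insert
    }

    public List<String> columnNames() { return names; }
}
```

Sorted input never leaves the fast path, so cost matches a plain append; unsorted or duplicate input still yields a sorted, reconciled array instead of a corrupt one.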
[jira] [Updated] (CASSANDRA-2876) JDBC 1.1 Roadmap of Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Shaw updated CASSANDRA-2876: - Description: Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates C* with App Servers and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planned datatype support for {{PreparedStatement}} and {{ResultSet}}. # Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Support {{RowId}} in {{ResultSet}} # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. # Deliver unit tests for each of the major components of the suite. was: Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates C* with App Servers and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planed datatype support for {{PreparedStatement}} and {{ResultSet}}. 
# Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. JDBC 1.1 Roadmap of Enhancements Key: CASSANDRA-2876 URL: https://issues.apache.org/jira/browse/CASSANDRA-2876 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.1 Reporter: Rick Shaw Priority: Minor Labels: cql, jdbc Fix For: 1.0 Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates C* with App Servers and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planned datatype support for {{PreparedStatement}} and {{ResultSet}}. # Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Support {{RowId}} in {{ResultSet}} # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. # Deliver unit tests for each of the major components of the suite. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2876) JDBC 1.1 Roadmap of Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Shaw updated CASSANDRA-2876: - Description: Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates the C* JDBC driver with App Servers, JPA implementations and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planned datatype support for {{PreparedStatement}} and {{ResultSet}}. # Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Support {{RowId}} in {{ResultSet}} # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. # Deliver unit tests for each of the major components of the suite. was: Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates C* with App Servers and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planned datatype support for {{PreparedStatement}} and {{ResultSet}}. 
# Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Support {{RowId}} in {{ResultSet}} # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. # Deliver unit tests for each of the major components of the suite. JDBC 1.1 Roadmap of Enhancements Key: CASSANDRA-2876 URL: https://issues.apache.org/jira/browse/CASSANDRA-2876 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.1 Reporter: Rick Shaw Priority: Minor Labels: cql, jdbc Fix For: 1.0 Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates the C* JDBC driver with App Servers, JPA implementations and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planned datatype support for {{PreparedStatement}} and {{ResultSet}}. # Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Support {{RowId}} in {{ResultSet}} # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. # Deliver unit tests for each of the major components of the suite. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062676#comment-13062676 ] Stu Hood commented on CASSANDRA-47: --- I haven't had any luck seeing actual compression with this patch... is there a manual step to enable it? On OSX, the patch slowed the server down to a crawl, but did not result in compression. Performance seems to be reasonable on Linux, but without any effect: running {{bin/stress -S 1024 -n 100 -C 250 -V}} resulted in 3.3 GB of data. SSTable compression --- Key: CASSANDRA-47 URL: https://issues.apache.org/jira/browse/CASSANDRA-47 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Labels: compression Fix For: 1.0 Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
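The CPU-for-I/O trade the ticket describes can be illustrated with a JDK-only codec. The patch itself uses snappy-java (the attached jar); Deflater below is merely a stand-in for demonstration, and BlockCompressDemo is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Illustration of trading CPU for I/O: compress a data block before it hits disk.
// (The actual patch uses snappy-java; Deflater is only a JDK stand-in here.)
public class BlockCompressDemo {
    public static byte[] compress(byte[] block) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos =
                 new DeflaterOutputStream(out, new Deflater(Deflater.BEST_SPEED))) {
            dos.write(block); // spend CPU cycles here...
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory streams should not fail
        }
        return out.toByteArray(); // ...to write fewer bytes to disk
    }
}
```

For repetitive SSTable-like data the output is far smaller than the input, which is exactly the effect the stress run above failed to observe.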