[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089129#comment-13089129 ]

Alan Liang commented on CASSANDRA-1608:
---

From a high level, it's looking good. In Manifest.java, "public void add(SSTableReader reader)" should either be synchronized or use a NonBlockingHashMap to hold the generations, because multiple threads could be calling it concurrently.

> Redesigned Compaction
> ---
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Chris Goffinet
> Assignee: Benjamin Coverston
> Attachments: 1608-22082011.txt, 1608-v2.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the
> moment, compaction is kicked off based on a write access pattern, not a read
> access pattern. In most cases, you want the opposite. You want to be able to
> track how well each SSTable is performing in the system. If we were to keep
> in-memory statistics for each SSTable, prioritize them by access frequency
> and bloom filter hit/miss ratios, we could intelligently group the sstables
> that are read most often and schedule them for compaction. We could also
> schedule lower-priority maintenance on SSTables that are not often accessed.
> I also propose we limit each SSTable to a fixed size, which gives us the
> ability to better utilize our bloom filters in a predictable manner. At the
> moment, beyond a certain size, the bloom filters become less reliable.
> This would also allow us to group the most-accessed data. Currently an
> SSTable can grow to a point where large portions of its data might not
> actually be accessed very often.

-- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
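The thread-safety concern in the review comment above can be illustrated with a minimal sketch of both options. It assumes a hypothetical, simplified manifest that tracks sstable names per level (`SynchronizedManifest`, `ConcurrentManifest` and `ManifestDemo` are illustrative names, not Cassandra classes), and it uses the JDK's `ConcurrentHashMap` as a stand-in for `NonBlockingHashMap`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for Manifest: sstables are plain names grouped by level.
// Option 1: every mutation and read goes through the manifest's monitor.
class SynchronizedManifest {
    private final List<List<String>> generations = new ArrayList<>();

    public synchronized void add(int level, String sstable) {
        while (generations.size() <= level)
            generations.add(new ArrayList<>());
        generations.get(level).add(sstable);
    }

    public synchronized int count(int level) {
        return level < generations.size() ? generations.get(level).size() : 0;
    }
}

// Option 2: a concurrent map. ConcurrentHashMap stands in for NonBlockingHashMap,
// which lives in the separate high-scale-lib library.
class ConcurrentManifest {
    private final Map<Integer, List<String>> generations = new ConcurrentHashMap<>();

    public void add(int level, String sstable) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so only one list is ever
        // installed per level; CopyOnWriteArrayList makes the append itself thread-safe.
        generations.computeIfAbsent(level, l -> new CopyOnWriteArrayList<>()).add(sstable);
    }

    public int count(int level) {
        List<String> sstables = generations.get(level);
        return sstables == null ? 0 : sstables.size();
    }
}

class ManifestDemo {
    // Adds perThread sstables to level 0 from each of `threads` threads; returns the final count.
    static int fill(boolean useSynchronized, int threads, int perThread) {
        SynchronizedManifest synced = new SynchronizedManifest();
        ConcurrentManifest concurrent = new ConcurrentManifest();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String name = "sstable-" + id + "-" + i;
                    if (useSynchronized) synced.add(0, name);
                    else concurrent.add(0, name);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return useSynchronized ? synced.count(0) : concurrent.count(0);
    }

    public static void main(String[] args) {
        System.out.println(fill(true, 4, 1000));  // 4000
        System.out.println(fill(false, 4, 1000)); // 4000
    }
}
```

The synchronized variant is the smaller change. The concurrent-map variant avoids a single lock but only makes each individual operation safe, so logic that must see several levels consistently would still need its own coordination.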
[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087991#comment-13087991 ]

Alan Liang commented on CASSANDRA-1608:
---

There's a problem with Interval#intersects:

{code}
public boolean intersects(Interval interval)
{
    return this.contains(interval.min) || this.contains(interval.min);
}
{code}

I think you wanted:

{code}
return this.contains(interval.min) || this.contains(interval.max);
{code}

However, a more efficient way to do this would be:

{code}
return this.min.compareTo(interval.max) <= 0 && this.max.compareTo(interval.min) >= 0;
{code}

> Redesigned Compaction
> ---
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Chris Goffinet
> Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v13.txt, 1608-v2.txt
[jira] [Issue Comment Edited] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087991#comment-13087991 ]

Alan Liang edited comment on CASSANDRA-1608 at 8/19/11 9:34 PM:

There's a problem with Interval#intersects:

{code}
public boolean intersects(Interval interval)
{
    return this.contains(interval.min) || this.contains(interval.min);
}
{code}

I think you wanted:

{code}
return this.contains(interval.min) || this.contains(interval.max);
{code}

However, a more efficient way to do this would be:

{code}
return this.min.compareTo(interval.max) <= 0 && this.max.compareTo(interval.min) >= 0;
{code}

> Redesigned Compaction
> ---
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Chris Goffinet
> Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v13.txt, 1608-v2.txt
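The difference between the `intersects` versions discussed in the CASSANDRA-1608 comments can be checked with a small, self-contained sketch. The generic `Interval` class below is a hypothetical stand-in for the one in the patch, assuming closed intervals `[min, max]`. Besides being cheaper, the comparison-based test also covers the case where the argument fully encloses `this`, which the endpoint-containment test misses:

```java
// Hypothetical closed interval [min, max] over any Comparable type.
class Interval<T extends Comparable<T>> {
    final T min;
    final T max;

    Interval(T min, T max) {
        this.min = min;
        this.max = max;
    }

    boolean contains(T point) {
        return min.compareTo(point) <= 0 && max.compareTo(point) >= 0;
    }

    // Endpoint test: false when `interval` strictly encloses `this`,
    // because neither of its endpoints lies inside `this`.
    boolean intersectsByEndpoints(Interval<T> interval) {
        return this.contains(interval.min) || this.contains(interval.max);
    }

    // Standard overlap test for closed intervals: each one starts no later than the other ends.
    boolean intersects(Interval<T> interval) {
        return this.min.compareTo(interval.max) <= 0 && this.max.compareTo(interval.min) >= 0;
    }
}

class IntervalDemo {
    public static void main(String[] args) {
        Interval<Integer> small = new Interval<>(3, 5);
        Interval<Integer> big = new Interval<>(1, 10);
        System.out.println(small.intersectsByEndpoints(big)); // false: neither 1 nor 10 is in [3, 5]
        System.out.println(small.intersects(big));            // true
    }
}
```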
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2735:
--
Attachment: 0001-timestamp-bucketed-compaction-strategy-V2.patch

Rebased onto trunk.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy-V2.patch,
> 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the
> sstables while satisfying max sstable size, min and max compaction
> thresholds. It also handles expiration of sstables based on a timestamp.
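The strategy described in CASSANDRA-2735 (order by max timestamp while respecting a maximum sstable size and the min/max compaction thresholds) can be sketched as a greedy bucketing pass. This is an illustrative assumption, not the algorithm in the attached patch; `TimestampedSSTable` and `TimestampBucketing` are hypothetical names, and expiration is left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of an sstable holding only what the bucketing needs.
class TimestampedSSTable {
    final String name;
    final long sizeBytes;
    final long maxTimestamp;

    TimestampedSSTable(String name, long sizeBytes, long maxTimestamp) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.maxTimestamp = maxTimestamp;
    }
}

class TimestampBucketing {
    // Greedy sketch: sort by max timestamp, then group neighbours while the combined size
    // stays within maxSSTableSize and the bucket holds at most maxThreshold sstables.
    // Buckets with fewer than minThreshold sstables are left alone.
    static List<List<TimestampedSSTable>> buckets(List<TimestampedSSTable> sstables, long maxSSTableSize,
                                                 int minThreshold, int maxThreshold) {
        List<TimestampedSSTable> sorted = new ArrayList<>(sstables);
        sorted.sort(Comparator.comparingLong((TimestampedSSTable s) -> s.maxTimestamp));

        List<List<TimestampedSSTable>> result = new ArrayList<>();
        List<TimestampedSSTable> current = new ArrayList<>();
        long currentSize = 0;
        for (TimestampedSSTable s : sorted) {
            boolean full = current.size() >= maxThreshold || currentSize + s.sizeBytes > maxSSTableSize;
            if (full && !current.isEmpty()) {
                if (current.size() >= minThreshold)
                    result.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += s.sizeBytes;
        }
        if (current.size() >= minThreshold)
            result.add(current);
        return result;
    }

    public static void main(String[] args) {
        List<TimestampedSSTable> input = Arrays.asList(
            new TimestampedSSTable("a", 100, 3), new TimestampedSSTable("b", 100, 1),
            new TimestampedSSTable("c", 100, 2), new TimestampedSSTable("d", 100, 4),
            new TimestampedSSTable("e", 100, 5));
        for (List<TimestampedSSTable> bucket : buckets(input, 250, 2, 4)) {
            StringBuilder names = new StringBuilder();
            for (TimestampedSSTable s : bucket)
                names.append(s.name);
            System.out.println(names); // prints "bc", then "ad"
        }
    }
}
```

In the example, "e" is not compacted because a bucket holding a single sstable is below `minThreshold`. Keeping each bucket contiguous in timestamp order means every output sstable covers a narrow time range, so its max timestamp stays useful for expiration.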
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064230#comment-13064230 ]

Alan Liang commented on CASSANDRA-2753:
---

Daniel, which test does this break? Can you elaborate?

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Fix For: 1.0
>
> Attachments:
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch,
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V3.patch,
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch,
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch,
> supercolumn.patch
[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2753:
--
Attachment: 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V3.patch

Added maxTimestamp() to IColumn.

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Attachments:
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch,
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V3.patch,
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch,
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
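Putting `maxTimestamp()` on the column interface lets regular columns and super columns answer the same question in different ways: a super column has to look at its subcolumns. A minimal sketch using a hypothetical, simplified column model (`TimestampedColumn`, `SimpleColumn`, `SimpleSuperColumn`), not Cassandra's actual `IColumn` hierarchy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified column model; not Cassandra's IColumn/Column/SuperColumn classes.
interface TimestampedColumn {
    long maxTimestamp();
}

class SimpleColumn implements TimestampedColumn {
    private final long timestamp;

    SimpleColumn(long timestamp) {
        this.timestamp = timestamp;
    }

    // A regular column's max timestamp is just its own timestamp.
    public long maxTimestamp() {
        return timestamp;
    }
}

class SimpleSuperColumn implements TimestampedColumn {
    private final List<TimestampedColumn> subColumns;

    SimpleSuperColumn(List<TimestampedColumn> subColumns) {
        this.subColumns = new ArrayList<>(subColumns);
    }

    // The max over all subcolumns; an empty super column reports Long.MIN_VALUE
    // ("no timestamp seen").
    public long maxTimestamp() {
        long max = Long.MIN_VALUE;
        for (TimestampedColumn c : subColumns)
            max = Math.max(max, c.maxTimestamp());
        return max;
    }
}

class ColumnDemo {
    public static void main(String[] args) {
        TimestampedColumn sc = new SimpleSuperColumn(Arrays.<TimestampedColumn>asList(
            new SimpleColumn(5), new SimpleColumn(9), new SimpleColumn(7)));
        System.out.println(sc.maxTimestamp()); // 9
    }
}
```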
[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2753:
--
Attachment: 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch

V2 patch based on jbellis' comments.

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Attachments:
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch,
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch,
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056362#comment-13056362 ]

Alan Liang commented on CASSANDRA-2753:
---

bq. No support for supercolumns?

Wow. Good catch. I've added tests for this as well.

bq. it would be more clear if observeColumnsInSSTable took a CFMetaData object instead of a CF, to get a serializer from.

I've added a helper method, CFMetaData.getColumnSerializer(), to do this.

bq. nit: SSTMC.setMaxTimestamp would be more accurately named updateMaxTimestamp

Makes sense.

bq. IMO SSTM deserialize versioning logic would be clearer if it were all in SSTMSerializer instead of split between that and openFromDescriptor.

Makes sense.

bq. Suggest adding a comment that SSTableWriter.append(AbstractCompactedRow row) deliberately avoids calling updateMaxTimestamp b/c otherwise we'd have to deserialize EchoedRow.

Sounds good.

bq. where is the max-timestamp-of-compacted-sstables logic? I didn't notice it.

I put this in ColumnFamilyStore.createCompactionWriter():

{code}
public SSTableWriter createCompactionWriter(long estimatedRows, String location, Collection<SSTableReader> sstables) throws IOException
{
    ReplayPosition rp = ReplayPosition.getReplayPosition(sstables);
    SSTableMetadata.Collector sstableMetadataCollector = SSTableMetadata.createCollector().replayPosition(rp);

    // get the max timestamp of the precompacted sstables
    for (SSTableReader sstable : sstables)
        sstableMetadataCollector.updateMaxTimestamp(sstable.getMaxTimestamp());

    return new SSTableWriter(getTempSSTablePath(location), estimatedRows, metadata, partitioner, sstableMetadataCollector);
}
{code}

bq. nit: renaming SSTableWriter.writeMetadata feels gratuitous

I renamed it back to writeMetadata.

bq. nit: prefer initializing fields that don't need constructor parameters, at declaration time (looking at RowIndexer.sstMC)

Makes sense.

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Attachments:
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch,
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
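The max-timestamp-of-compacted-sstables logic in `createCompactionWriter()` reduces to folding the inputs' recorded maxima into the collector. A minimal sketch with a hypothetical `MaxTimestampCollector` (not the real `SSTableMetadata.Collector`) that follows the `updateMaxTimestamp` naming suggested in the review:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified collector; not the actual SSTableMetadata.Collector.
class MaxTimestampCollector {
    private long maxTimestamp = Long.MIN_VALUE; // nothing observed yet

    // "update" rather than "set": the stored value only ever moves forward.
    MaxTimestampCollector updateMaxTimestamp(long timestamp) {
        maxTimestamp = Math.max(maxTimestamp, timestamp);
        return this;
    }

    long maxTimestamp() {
        return maxTimestamp;
    }

    // Compaction does not look at individual columns: it folds in the max
    // timestamp already recorded for each input sstable.
    static MaxTimestampCollector forCompaction(List<Long> inputMaxTimestamps) {
        MaxTimestampCollector collector = new MaxTimestampCollector();
        for (long ts : inputMaxTimestamps)
            collector.updateMaxTimestamp(ts);
        return collector;
    }

    public static void main(String[] args) {
        System.out.println(forCompaction(Arrays.asList(10L, 42L, 7L)).maxTimestamp()); // 42
    }
}
```

Because compaction can drop shadowed or deleted columns, the value recorded this way is an upper bound on the newest timestamp actually written, which errs on the side of keeping data rather than expiring it early.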
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2735:
--
Attachment: (was: 0004-timestamp-bucketed-compaction-strategy.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy.patch
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2735:
--
Attachment: 0001-timestamp-bucketed-compaction-strategy.patch

Highlights of this patch:
- Introduces a timestamp compaction strategy
- Introduces an Expiration Task, with the option to delete expired sstables or move them to an expired folder
- Tests for the timestamp bucketing strategy

This patch depends on https://issues.apache.org/jira/browse/CASSANDRA-2753 being committed.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy.patch
[jira] [Commented] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052834#comment-13052834 ]

Alan Liang commented on CASSANDRA-2735:
---

This compaction strategy is useful for time series data, e.g. when you capture counts for each minute, hour, or day. Ordering and compacting the sstables by column timestamp lets you expire sstables more effectively than the size-tiered approach in trunk. This is because the size-tiered approach can combine an old sstable with a new one, and the resulting sstable then looks new. You would not be able to expire the old data in this case.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Labels: compaction
> Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
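The expiration argument in the comment above can be made concrete. Assuming an sstable can be dropped whole once its max timestamp falls outside the retention window, merging it with a newer sstable pushes the merged max timestamp forward and blocks expiration. `ExpirationSketch` and its parameters are hypothetical, and all values must use the same time unit:

```java
// Hypothetical sketch: an sstable may be dropped whole once its newest column is
// older than the retention window, i.e. maxTimestamp < now - retention.
class ExpirationSketch {
    static boolean isExpired(long maxTimestamp, long now, long retention) {
        return maxTimestamp < now - retention;
    }

    // Merging two sstables yields one whose max timestamp is the larger of the two.
    static long mergedMaxTimestamp(long a, long b) {
        return Math.max(a, b);
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long retention = 60_000L;      // cutoff is now - retention = 940_000
        long oldTable = 900_000L;      // entirely older than the cutoff
        long newTable = 995_000L;
        System.out.println(isExpired(oldTable, now, retention));                                 // true
        System.out.println(isExpired(mergedMaxTimestamp(oldTable, newTable), now, retention));   // false
    }
}
```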
[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2753:
--
Attachment: 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch

2nd attempt based on Jonathan Ellis' comments. Highlights of the patch:
- captures the max column timestamp in the following places: memtable flush, compaction, and rebuilding after streaming
- stores the max timestamp in the stats file, and creates an SSTableMetadata class to encapsulate the stats file
- moves the estimated histograms for column/row counts and the replay position into the stats file
- bumps the version number
- tests

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Attachments:
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch,
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
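Moving the stats into a versioned file means readers must cope with files written before the max timestamp existed. A sketch that keeps all version handling inside the (de)serializer, assuming a hypothetical layout in which version 2 added the max timestamp; `StatsMetadata` and this byte layout are illustrative, not Cassandra's actual stats format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stats layout. The version is written into the bytes only to keep the
// sketch self-contained; a real implementation would more likely take it from the
// sstable descriptor.
class StatsMetadata {
    static final int CURRENT_VERSION = 2; // version 2 introduced maxTimestamp
    final long maxTimestamp;

    StatsMetadata(long maxTimestamp) {
        this.maxTimestamp = maxTimestamp;
    }

    static byte[] serialize(StatsMetadata metadata) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(CURRENT_VERSION);
            out.writeLong(metadata.maxTimestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // All version handling lives here, so callers never branch on the file version.
    static StatsMetadata deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            // Older files carry no max timestamp; Long.MIN_VALUE marks it as unknown.
            long maxTimestamp = version >= 2 ? in.readLong() : Long.MIN_VALUE;
            return new StatsMetadata(maxTimestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(deserialize(serialize(new StatsMetadata(42L))).maxTimestamp);          // 42
        System.out.println(deserialize(new byte[] {0, 0, 0, 1}).maxTimestamp == Long.MIN_VALUE); // true
    }
}
```

`Long.MIN_VALUE` here means "unknown"; code that uses the value for expiration would need to treat it specially rather than as a very old timestamp.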
[jira] [Updated] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command
[ https://issues.apache.org/jira/browse/CASSANDRA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2778:
--
Attachment: (was: 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch)

> Unable to set compaction strategy in cli using create column family command
> ---
>
> Key: CASSANDRA-2778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Attachments:
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch
>
>
> The following command does not set compaction strategy and its options:
> {code}
> create column family Standard1
>     with comparator = BytesType
>     and compaction_strategy = 'org.apache.cassandra.db.compaction.TimestampBucketedCompactionStrategy'
>     and compaction_strategy_options = [{max_sstable_size:504857600, retention_in_seconds:60}];
> {code}
[jira] [Updated] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command
[ https://issues.apache.org/jira/browse/CASSANDRA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2778:
--
Attachment: 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch

> Unable to set compaction strategy in cli using create column family command
> ---
>
> Key: CASSANDRA-2778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Attachments:
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch,
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch
[jira] [Updated] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command
[ https://issues.apache.org/jira/browse/CASSANDRA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Liang updated CASSANDRA-2778:
--
Attachment: 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch

> Unable to set compaction strategy in cli using create column family command
> ---
>
> Key: CASSANDRA-2778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Attachments:
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch
[jira] [Created] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command
Unable to set compaction strategy in cli using create column family command
---

Key: CASSANDRA-2778
URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Alan Liang
Assignee: Alan Liang

The following command does not set compaction strategy and its options:
{code}
create column family Standard1
    with comparator = BytesType
    and compaction_strategy = 'org.apache.cassandra.db.compaction.TimestampBucketedCompactionStrategy'
    and compaction_strategy_options = [{max_sstable_size:504857600, retention_in_seconds:60}];
{code}
[jira] [Commented] (CASSANDRA-2769) Cannot Create Duplicate Compaction Marker
[ https://issues.apache.org/jira/browse/CASSANDRA-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050081#comment-13050081 ]

Alan Liang commented on CASSANDRA-2769:
---

Instead of letting DataTracker#markCompacting modify the subset of sstables to be compacted, I think it would be cleaner if it left the subset alone and relied on the CompactionStrategy to select the correct sstables. We can do this by having the CompactionStrategy get the non-compacting sstables from the DataTracker and generate the buckets from those. The strategy should also be responsible for creating buckets that fit within the min/max thresholds.

#markCompacting would then be changed so that it either accepts or rejects a bucket for compaction, instead of modifying the subset. #markCompacting would also handle the race where the DataTracker's view is out of date: if a bucket is rejected, the strategy moves on to other buckets.

With this, we avoid generating buckets that are already compacting, and the CompactionStrategy gets full control over what is actually compacted. What do you guys think?

> Cannot Create Duplicate Compaction Marker
> ---
>
> Key: CASSANDRA-2769
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2769
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Benjamin Coverston
> Assignee: Sylvain Lebresne
> Fix For: 0.8.2
>
> Attachments:
> 0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch,
> 0001-Do-compact-only-smallerSSTables.patch,
> 0002-Only-compact-what-has-been-succesfully-marked-as-com.patch
>
>
> Concurrent compaction can trigger the following exception when two threads
> compact the same sstable. DataTracker attempts to prevent this but apparently
> not successfully.
> java.io.IOError: java.io.IOException: Unable to create compaction marker
>     at org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:638)
>     at org.apache.cassandra.db.DataTracker.removeOldSSTablesSize(DataTracker.java:321)
>     at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:294)
>     at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:255)
>     at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:932)
>     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:119)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:102)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
> Caused by: java.io.IOException: Unable to create compaction marker
>     at org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:634)
>     ... 12 more
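The accept/reject proposal for `#markCompacting` in the CASSANDRA-2769 comment can be sketched as an all-or-nothing claim on a bucket, with the strategy falling through to the next candidate when a claim fails. `CompactionTracker` and its methods are hypothetical simplifications of `DataTracker` that use sstable names instead of readers:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical simplification of DataTracker's compacting set, keyed by sstable name.
class CompactionTracker {
    private final Set<String> compacting = new HashSet<>();

    // All-or-nothing: mark every sstable in the bucket, or reject the whole bucket.
    synchronized boolean markCompacting(Collection<String> bucket) {
        for (String sstable : bucket)
            if (compacting.contains(sstable))
                return false;
        compacting.addAll(bucket);
        return true;
    }

    synchronized void unmarkCompacting(Collection<String> bucket) {
        compacting.removeAll(bucket);
    }

    // What the strategy would build its buckets from.
    synchronized Set<String> nonCompacting(Collection<String> all) {
        Set<String> result = new LinkedHashSet<>(all);
        result.removeAll(compacting);
        return result;
    }

    // Strategy side: claim the first candidate bucket the tracker accepts. A rejection means
    // another compaction claimed one of its sstables after the buckets were built.
    Optional<List<String>> claimFirst(List<List<String>> candidateBuckets) {
        for (List<String> bucket : candidateBuckets)
            if (markCompacting(bucket))
                return Optional.of(bucket);
        return Optional.empty();
    }

    public static void main(String[] args) {
        CompactionTracker tracker = new CompactionTracker();
        tracker.markCompacting(Arrays.asList("a", "b"));
        List<List<String>> candidates = Arrays.asList(Arrays.asList("a", "c"), Arrays.asList("c", "d"));
        System.out.println(tracker.claimFirst(candidates)); // Optional[[c, d]]
        System.out.println(tracker.nonCompacting(Arrays.asList("a", "b", "c", "d", "e"))); // [e]
    }
}
```

Because a bucket is never trimmed, whatever the strategy guaranteed when building it (for example, staying within the min/max thresholds) still holds for what is actually compacted.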
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049444#comment-13049444 ]

Alan Liang commented on CASSANDRA-2753:
---

There are basically three places where we need to track max timestamps:
1. Memtable flush
2. Compaction (we simply take the max timestamp already recorded for the input sstables)
3. Streamed data (normal columns and counter columns)

The challenge here is capturing the max timestamp for newly streamed data.

For non-counter streamed data, RowIndexer#doIndexing goes through the streamed data files and simply updates the cache for the new rows. It iterates over the column families without deserializing the columns. To capture the max timestamp here, I actually deserialize the columns from disk. This costs more CPU, but since it is already doing disk seeks when calling deserializeFromSSTableNoColumns(), the extra cost is mostly CPU rather than additional seeks.

For counter streamed data, CommutativeRowIndexer#doIndexing actually creates new data files from the streamed data files. It does this by building an AbstractCompactedRow, which can be either a PreCompactedRow or a LazilyCompactedRow. Collecting the max timestamp for a PreCompactedRow is easy since all the columns are in memory. For a LazilyCompactedRow, the only place where I can observe the max timestamp is during the #write method. Capturing the max timestamp inside #write is obviously not ideal since it would introduce a side effect. Alternatively, I could capture the max timestamp by deserializing the entire LazilyCompactedRow again, but this would obviously mean more IO/CPU. So it looks like I have to capture the max timestamp inside #write.

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Attachments:
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
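The single-pass approach for `LazilyCompactedRow` described above can be sketched as a `write` method that reports each column's timestamp to an observer while streaming it out. `LazyRowSketch` and its nested `Column` are hypothetical, and the `StringBuilder` stands in for the output file:

```java
import java.util.Arrays;
import java.util.function.LongConsumer;

// Hypothetical stand-in for a lazily compacted row: columns are only visible while
// write() iterates over them, so the max timestamp has to be observed in that same pass.
class LazyRowSketch {
    static final class Column {
        final String name;
        final long timestamp;

        Column(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // In Cassandra this would be a merge over the input sstables; here it is any Iterable.
    private final Iterable<Column> columns;

    LazyRowSketch(Iterable<Column> columns) {
        this.columns = columns;
    }

    // Writes each column once and reports its timestamp to the observer as a side effect,
    // so no second pass over the data is needed. Returns the number of columns written.
    int write(StringBuilder out, LongConsumer timestampObserver) {
        int written = 0;
        for (Column c : columns) {
            out.append(c.name).append('@').append(c.timestamp).append(';');
            timestampObserver.accept(c.timestamp);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        LazyRowSketch row = new LazyRowSketch(Arrays.asList(
            new Column("a", 5), new Column("b", 12), new Column("c", 7)));
        long[] max = {Long.MIN_VALUE};
        StringBuilder out = new StringBuilder();
        row.write(out, ts -> max[0] = Math.max(max[0], ts));
        System.out.println(out + " max=" + max[0]); // a@5;b@12;c@7; max=12
    }
}
```

Passing the observer in explicitly keeps the side effect visible at the call site, which is one way to soften the concern raised in the comment.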
[jira] [Issue Comment Edited] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049444#comment-13049444 ]

Alan Liang edited comment on CASSANDRA-2753 at 6/14/11 9:40 PM:

There are basically three places where we need to track max timestamps:
1. Memtable flush
2. Compaction (we simply take the max timestamp already recorded for the input sstables)
3. Streamed data (normal columns and counter columns)

The challenge here is capturing the max timestamp for newly streamed data.

For non-counter streamed data, RowIndexer#doIndexing goes through the streamed data files and simply updates the cache for the new rows. It iterates over the column families without deserializing the columns. To capture the max timestamp here, I actually deserialize the columns from disk. This costs more CPU, but since it is already doing disk seeks when calling deserializeFromSSTableNoColumns(), the extra cost is mostly CPU rather than additional seeks.

For counter streamed data, CommutativeRowIndexer#doIndexing actually creates new data files from the streamed data files. It does this by building an AbstractCompactedRow, which can be either a PreCompactedRow or a LazilyCompactedRow. Collecting the max timestamp for a PreCompactedRow is easy since all the columns are in memory. For a LazilyCompactedRow, the only place where I can observe the max timestamp is during the #write method. Capturing the max timestamp inside #write is obviously not ideal since it would introduce a side effect. Alternatively, I could capture the max timestamp by deserializing the entire LazilyCompactedRow again, but this would obviously mean more IO/CPU. So it looks like I have to capture the max timestamp inside #write.

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Attachments:
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049413#comment-13049413 ] Alan Liang commented on CASSANDRA-2753: --- I already have a solution for capturing the max timestamp of non-counter data, as seen in the current patch. So this is really only a problem for streamed counter data. > Capture the max client timestamp for an SSTable > --- > > Key: CASSANDRA-2753 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2753 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Attachments: > 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch > >
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049323#comment-13049323 ] Alan Liang commented on CASSANDRA-2753: --- Makes sense, I'll move the tracking out of the serializer. However, I realized I also missed capturing the max timestamp of counter data streamed over from other nodes. The challenge is where to capture it without doing so inside the AbstractCompactedRow#write method. It seems I have no choice, short of sacrificing performance by iterating over the file a second time to collect the max timestamp: a LazilyCompactedRow keeps only a single column in memory at a time, and that only happens inside the write method. What do you think? > Capture the max client timestamp for an SSTable > --- > > Key: CASSANDRA-2753 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2753 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Attachments: > 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch > >
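One way to keep the single pass without burying the side effect inside #write is to let the caller wrap the column iterator that the row consumes, so each column's timestamp is reported as write() pulls it. This is a sketch that assumes the lazily compacted row could be handed such an iterator; `ObservingIterator` and `MaxTracker` are hypothetical names. The observation still happens during the write pass, but it is owned by, and visible at, the call site.

```java
import java.util.Iterator;
import java.util.function.LongConsumer;
import java.util.function.ToLongFunction;

// Hypothetical decorator: forwards every element unchanged and reports its
// timestamp to a sink, so whoever consumes the iterator (e.g. a write method)
// does not need to know that timestamps are being collected.
final class ObservingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final ToLongFunction<? super T> timestampOf;
    private final LongConsumer sink;

    ObservingIterator(Iterator<T> delegate, ToLongFunction<? super T> timestampOf, LongConsumer sink) {
        this.delegate = delegate;
        this.timestampOf = timestampOf;
        this.sink = sink;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T item = delegate.next();
        sink.accept(timestampOf.applyAsLong(item));
        return item;
    }
}

// Hypothetical sink that keeps only the largest timestamp seen, so memory use
// stays constant no matter how many columns stream past.
final class MaxTracker implements LongConsumer {
    private long max = Long.MIN_VALUE;

    @Override
    public void accept(long timestamp) {
        max = Math.max(max, timestamp);
    }

    long max() {
        return max;
    }
}
```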
[jira] [Commented] (CASSANDRA-2629) Move key reads into SSTableIterators
[ https://issues.apache.org/jira/browse/CASSANDRA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048946#comment-13048946 ] Alan Liang commented on CASSANDRA-2629: ---
CompactionManager.java:
- retrying with the key/length from the index is useful; we should add it back, as you mentioned in your comments above.
- move "long rowSizeFromIndex = nextRowPositionFromIndex - currentRowPositionFromIndex;" into the if block where it is needed.
- in your log warnings, naming the actual sstable will help with debugging.
SSTableNamesIterator.java:
- remove "this.key = key;" from both constructors; that means "public DecoratedKey key;" can still be final.
- the init() method should have a more descriptive name.
- remove the @param key comment from IFilter.java and SSTableSliceIterator.java.
SSTableWriter.java:
- calling close() on an SSTableIdentityIterator to skip to the end doesn't sound right; use a name other than "close()".
- it is safer to call updateCache(iter) AFTER appending to the writer.
> Move key reads into SSTableIterators > > > Key: CASSANDRA-2629 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2629 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Stu Hood >Assignee: Stu Hood > Fix For: 1.0 > > Attachments: > 0001-CASSANDRA-2629-Move-key-and-row-size-reading-into-the-.txt, > 0002-CASSANDRA-2629-Remove-the-retry-with-key-from-index-st.txt > > > All SSTableIterators have a constructor that assumes the key and length has > already been parsed. Moving this logic inside the iterator will improve > symmetry and allow the file format to change without iterator consumers > knowing it.
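The last SSTableWriter point is about ordering. One plausible reading: if the cache is updated first and the append then fails, the cache refers to a row that was never written. A minimal sketch of the safer order follows; `RowAppender` and `AppendThenCache` are hypothetical, and a plain `Map` stands in for the real cache.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical writer interface: appends a row and returns its position in the data file.
interface RowAppender {
    long append(String key, byte[] row) throws IOException;
}

final class AppendThenCache {
    // Append first, then update the cache: if append throws, the cache is never
    // touched, so it cannot point at a row that is missing from the data file.
    static void appendAndCache(RowAppender writer, Map<String, Long> cache,
                               String key, byte[] row) throws IOException {
        long position = writer.append(key, row);
        cache.put(key, position);
    }
}
```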
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-rename-major-minor-to-maximal-background-in-Compacti.patch 0001-pluggable-compaction.patch > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-pluggable-compaction.patch, 0001-pluggable-compaction.patch, > 0001-pluggable-compaction.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, > 0002-rename-major-minor-to-maximal-background-in-Compacti.patch, > 0002-rename-major-minor-to-maximal-background-in-Compacti.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047442#comment-13047442 ] Alan Liang commented on CASSANDRA-1610: --- The only differences are:
public boolean isCompactionDisabled()
public int getMinCompactionThreshold()
public int getMaxCompactionThreshold()
They were there as a convenience, so the strategy implementer has everything in one place. I'll remove them.
> Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-pluggable-compaction.patch, 0001-pluggable-compaction.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, > 0002-rename-major-minor-to-maximal-background-in-Compacti.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Commented] (CASSANDRA-2336) Extract SSTable.Builder/IndexWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047329#comment-13047329 ] Alan Liang commented on CASSANDRA-2336: --- These changes look good. +1 > Extract SSTable.Builder/IndexWriter > --- > > Key: CASSANDRA-2336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2336 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Stu Hood >Assignee: Stu Hood >Priority: Minor > Fix For: 1.0 > > Attachments: 0001-CASSANDRA-2336-Extract-IndexWriter.txt, > 0002-CASSANDRA-2336-Extract-Builder.txt, > 0003-CASSANDRA-2336-Move-statistics-writing-into-IndexWrite.txt > > > The Builder and IndexWriter classes in SSTableWriter are static, and > independently useful. Additionally, we need the ability to subclass them for > CASSANDRA-674 and CASSANDRA-2319.
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046939#comment-13046939 ] Alan Liang commented on CASSANDRA-2753: --- In this patch, I've captured the max timestamp and stored it as part of the stats file. I've encapsulated access to this file in a class called SSTableMetadata. The estimated histograms for row size and column count, as well as the replay position, will also be available through this class. > Capture the max client timestamp for an SSTable > --- > > Key: CASSANDRA-2753 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2753 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Attachments: > 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch > >
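A rough sketch of what such a metadata holder might look like follows, assuming a simple length-prefixed layout written with `DataOutput`. The class name `SSTableMetadataSketch`, the field order, and the encoding are illustrative only and are not the actual stats file format. Histograms are reduced to plain arrays of bucket counts, and the replay position to a segment id plus an offset.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical holder for per-sstable statistics; not the real SSTableMetadata.
final class SSTableMetadataSketch {
    final long[] rowSizeHistogram;      // bucket counts for estimated row sizes
    final long[] columnCountHistogram;  // bucket counts for estimated column counts
    final long replaySegmentId;         // commit log segment of the replay position
    final int replayPosition;           // offset within that segment
    final long maxTimestamp;            // max client-supplied timestamp in the sstable

    SSTableMetadataSketch(long[] rowSizeHistogram, long[] columnCountHistogram,
                          long replaySegmentId, int replayPosition, long maxTimestamp) {
        this.rowSizeHistogram = rowSizeHistogram;
        this.columnCountHistogram = columnCountHistogram;
        this.replaySegmentId = replaySegmentId;
        this.replayPosition = replayPosition;
        this.maxTimestamp = maxTimestamp;
    }

    void serialize(DataOutput out) throws IOException {
        writeLongs(out, rowSizeHistogram);
        writeLongs(out, columnCountHistogram);
        out.writeLong(replaySegmentId);
        out.writeInt(replayPosition);
        out.writeLong(maxTimestamp);
    }

    // Reads the fields back in exactly the order serialize() wrote them.
    static SSTableMetadataSketch deserialize(DataInput in) throws IOException {
        long[] rowSizes = readLongs(in);
        long[] columnCounts = readLongs(in);
        long segmentId = in.readLong();
        int position = in.readInt();
        long maxTs = in.readLong();
        return new SSTableMetadataSketch(rowSizes, columnCounts, segmentId, position, maxTs);
    }

    // Length-prefixed array of longs.
    private static void writeLongs(DataOutput out, long[] values) throws IOException {
        out.writeInt(values.length);
        for (long v : values) {
            out.writeLong(v);
        }
    }

    private static long[] readLongs(DataInput in) throws IOException {
        long[] values = new long[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readLong();
        }
        return values;
    }
}
```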
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: (was: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch) > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: 0004-timestamp-bucketed-compaction-strategy.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: (was: 0002-timestamp-bucketed-compaction-strategy.patch) > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: 0004-timestamp-bucketed-compaction-strategy.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: 0004-timestamp-bucketed-compaction-strategy.patch The new patch contains only the code for the timestamp compaction strategy. > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: 0004-timestamp-bucketed-compaction-strategy.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
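The strategy described in this ticket could be sketched roughly as follows: order sstables by max timestamp, respect a max sstable size and the min/max compaction thresholds, and expire sstables by timestamp. This is not the attached patch. `SSTableInfo` and `TimestampBucketing` are hypothetical, and the sum of the input sizes is used as a stand-in for the size of the compacted output. An sstable counts as expired when even its newest data is older than the cutoff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sstable descriptor.
final class SSTableInfo {
    final String name;
    final long sizeBytes;
    final long maxTimestamp;

    SSTableInfo(String name, long sizeBytes, long maxTimestamp) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.maxTimestamp = maxTimestamp;
    }
}

final class TimestampBucketing {
    // Sstables whose newest column is older than the cutoff can be dropped whole.
    static List<SSTableInfo> expired(List<SSTableInfo> sstables, long expireBefore) {
        List<SSTableInfo> out = new ArrayList<>();
        for (SSTableInfo s : sstables) {
            if (s.maxTimestamp < expireBefore) out.add(s);
        }
        return out;
    }

    // Groups live sstables, in max-timestamp order, into compaction candidates.
    static List<List<SSTableInfo>> buckets(List<SSTableInfo> sstables, long expireBefore,
                                           long maxSSTableSize, int minThreshold, int maxThreshold) {
        List<SSTableInfo> live = new ArrayList<>();
        for (SSTableInfo s : sstables) {
            if (s.maxTimestamp >= expireBefore) live.add(s);
        }
        live.sort(Comparator.comparingLong((SSTableInfo s) -> s.maxTimestamp));

        List<List<SSTableInfo>> candidates = new ArrayList<>();
        List<SSTableInfo> current = new ArrayList<>();
        long currentSize = 0;
        for (SSTableInfo s : live) {
            boolean fits = currentSize + s.sizeBytes <= maxSSTableSize && current.size() < maxThreshold;
            if (!fits) {
                // Close the current bucket; keep it only if it meets the min threshold.
                if (current.size() >= minThreshold) candidates.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += s.sizeBytes;
        }
        if (current.size() >= minThreshold) candidates.add(current);
        return candidates;
    }
}
```

With a max size of 25 and sstables of size 10 at timestamps 100, 200, 300 and 400, this yields two buckets of two sstables each, because a third sstable would push a bucket past the size limit.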
[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2753: -- Attachment: 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch > Capture the max client timestamp for an SSTable > --- > > Key: CASSANDRA-2753 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2753 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Attachments: > 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch > >
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-rename-major-minor-to-maximal-background-in-Compacti.patch 0001-pluggable-compaction.patch The new patch incorporates jbellis's suggestions and also renames minor/major -> background/maximal. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-pluggable-compaction.patch, 0001-pluggable-compaction.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, > 0002-rename-major-minor-to-maximal-background-in-Compacti.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Issue Comment Edited] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046750#comment-13046750 ] Alan Liang edited comment on CASSANDRA-2735 at 6/9/11 8:01 PM: --- Splitting out the capture of the max client-supplied timestamp into a separate ticket (#2753) so that other tickets can benefit. > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: 0002-timestamp-bucketed-compaction-strategy.patch, > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Created] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
Capture the max client timestamp for an SSTable --- Key: CASSANDRA-2753 URL: https://issues.apache.org/jira/browse/CASSANDRA-2753 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Alan Liang Assignee: Alan Liang Priority: Minor
[jira] [Issue Comment Edited] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046199#comment-13046199 ] Alan Liang edited comment on CASSANDRA-1610 at 6/8/11 9:10 PM: ---
bq. I think Ben's selection of methods for the CompactionStrategy is an improvement, but I do like having an abstract class so it's obvious what the contract is for us vs having to inject parameters post-construction.
I agree, I'll go back to the abstract class approach.
bq. I'd like to move away from minor/major terms as too tied to the old compaction internals. Perhaps background/maximal instead?
Sounds good to me.
bq. We should also make user defined compactions part of ACS – for some strategies (e.g. leveldb) we want to be able to reject user requests that would break strategy invariants. Note that this should probably return a single Task, rather than a list. ("Maximal" will also usually return a single task, but it's cleaner to represent "nothing to do" as an empty list, than as null.)
Sounds good to me.
bq. handleInsufficientSpaceForCompaction is a bad encapsulation; it means both it and its caller have to deal with "find a place for an sstable." suggest leaving it up to CT.execute to deal with.
Sounds good to me. That means a strategy that wants to customize how insufficient space is handled would have to implement its own CompactionTask (or override the existing one). What do you think about that? Also, since available disk space is always subject to a race, I could leave it up to the strategy to ensure that the sstables it selects have a reasonable amount of space for compaction.
I'll resubmit a patch with all these suggestions. Thanks!
> Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-pluggable-compaction.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
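The contract discussed in this comment might look roughly like the abstract class below. All names are hypothetical and the signatures are only a sketch of the discussion, not the committed API. Thresholds are passed to the constructor instead of being injected after construction. Background and maximal requests return lists in which an empty list means "nothing to do". A user-defined request produces a single task that a strategy may refuse; refusing by throwing is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical task: just the sstables to compact together.
final class CompactionTask {
    final List<String> sstables;

    CompactionTask(Collection<String> sstables) {
        this.sstables = new ArrayList<>(sstables);
    }
}

abstract class AbstractCompactionStrategy {
    protected final int minThreshold;
    protected final int maxThreshold;

    // Parameters arrive through the constructor, not after construction.
    protected AbstractCompactionStrategy(int minThreshold, int maxThreshold) {
        this.minThreshold = minThreshold;
        this.maxThreshold = maxThreshold;
    }

    // Work to run in the background; an empty list means "nothing to do".
    abstract List<CompactionTask> getBackgroundTasks(List<String> liveSSTables);

    // Compact as much as possible; usually one task, empty when there is nothing to do.
    abstract List<CompactionTask> getMaximalTasks(List<String> liveSSTables);

    // A single task for a user request; subclasses may refuse requests
    // that would break their invariants.
    CompactionTask getUserDefinedTask(Collection<String> requested) {
        return new CompactionTask(requested);
    }
}

// Toy strategy used only to exercise the contract.
final class CompactAllStrategy extends AbstractCompactionStrategy {
    CompactAllStrategy(int minThreshold, int maxThreshold) {
        super(minThreshold, maxThreshold);
    }

    @Override
    List<CompactionTask> getBackgroundTasks(List<String> live) {
        if (live.size() < minThreshold) return Collections.emptyList();
        return Collections.singletonList(
            new CompactionTask(live.subList(0, Math.min(live.size(), maxThreshold))));
    }

    @Override
    List<CompactionTask> getMaximalTasks(List<String> live) {
        if (live.size() < 2) return Collections.emptyList();
        return Collections.singletonList(new CompactionTask(live));
    }

    @Override
    CompactionTask getUserDefinedTask(Collection<String> requested) {
        // Hypothetical invariant, only to show the rejection path.
        if (requested.size() < 2) {
            throw new UnsupportedOperationException("need at least two sstables");
        }
        return super.getUserDefinedTask(requested);
    }
}
```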
[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046199#comment-13046199 ] Alan Liang commented on CASSANDRA-1610: ---
bq. I think Ben's selection of methods for the CompactionStrategy is an improvement, but I do like having an abstract class so it's obvious what the contract is for us vs having to inject parameters post-construction.
I agree, I'll go back to the Abstract class approach.
bq. I'd like to move away from minor/major terms as too tied to the old compaction internals. Perhaps background/maximal instead?
Sounds good to me.
bq. We should also make user defined compactions part of ACS – for some strategies (e.g. leveldb) we want to be able to reject user requests that would break strategy invariants. Note that this should probably return a single Task, rather than a list. ("Maximal" will also usually return a single task, but it's cleaner to represent "nothing to do" as an empty list, than as null.)
Sounds good to me.
bq. handleInsufficientSpaceForCompaction is a bad encapsulation; it means both it and its caller have to deal with "find a place for an sstable." suggest leaving it up to CT.execute to deal with.
Sounds good to me.
I'll resubmit a patch with all these suggestions. Thanks!
> Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-pluggable-compaction.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
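If insufficient-space handling moves into CompactionTask.execute, as suggested in the comment, one possible policy is to shrink the set of inputs until the estimated output fits, and to give up when fewer than two sstables remain. This is only a sketch: `SizedSSTable`, `SpaceAwareCompactionTask`, and the drop-the-largest policy are assumptions, not Cassandra's behavior. The free-space check stays best effort because space can change between the check and the write. A strategy needing different behavior would override `dropOneForSpace`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sstable descriptor: a name and its on-disk size.
final class SizedSSTable {
    final String name;
    final long sizeBytes;

    SizedSSTable(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }
}

class SpaceAwareCompactionTask {
    private final List<SizedSSTable> inputs;
    private final LongSupplier freeBytes;

    SpaceAwareCompactionTask(List<SizedSSTable> inputs, LongSupplier freeBytes) {
        this.inputs = inputs;
        this.freeBytes = freeBytes;
    }

    // Returns the sstables that would be compacted, or an empty list if the task gives up.
    List<SizedSSTable> execute() {
        List<SizedSSTable> selected = new ArrayList<>(inputs);
        selected.sort(Comparator.comparingLong((SizedSSTable s) -> s.sizeBytes));
        // Estimate the output size as the sum of the inputs and shrink until it fits.
        while (selected.size() >= 2 && totalSize(selected) > freeBytes.getAsLong()) {
            dropOneForSpace(selected);
        }
        if (selected.size() < 2) {
            return new ArrayList<>(); // in this sketch, one sstable alone is not worth compacting
        }
        // A real task would merge `selected` into a new sstable here.
        return selected;
    }

    // Default policy: drop the largest input (the list is sorted by size).
    // Subclasses can override this to handle insufficient space differently.
    protected void dropOneForSpace(List<SizedSSTable> sortedBySize) {
        sortedBySize.remove(sortedBySize.size() - 1);
    }

    private static long totalSize(List<SizedSSTable> sstables) {
        long total = 0;
        for (SizedSSTable s : sstables) {
            total += s.sizeBytes;
        }
        return total;
    }
}
```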
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: 0002-timestamp-bucketed-compaction-strategy.patch Rebased again due to the change from AbstractCompactionStrategy to ICompactionStrategy in #1610. > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: 0002-timestamp-bucketed-compaction-strategy.patch, > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0001-pluggable-compaction.patch Removed updateEstimatedCompactions() from the strategy since it is no longer called. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-pluggable-compaction.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Comment: was deleted (was: Removed unused/duplicate imports.) > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: (was: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch) > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch Removed unused/duplicate imports. > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch, > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-pluggable-compaction.patch 0001-move-compaction-code-into-own-package.patch Combed through the files and removed unused/duplicate imports > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: (was: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch) > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch Uploaded the correct patch. > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-pluggable-compaction.patch 0001-move-compaction-code-into-own-package.patch Apologies, I uploaded the wrong diffs; these are the correct ones. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch Rebased. > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: (was: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch) > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-pluggable-compaction.patch 0001-move-compaction-code-into-own-package.patch Rebased, fixed tests, and added documentation to the CLI help. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043867#comment-13043867 ] Alan Liang commented on CASSANDRA-1610: --- Wanted to add a little more context. This ticket now addresses only pluggable compaction; I've moved the implementation of timestamp-based compaction to https://issues.apache.org/jira/browse/CASSANDRA-2735. This patch makes compaction pluggable in the sense that you can implement your own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible for selecting the sstables for minor and major compaction. The strategy returns a list of AbstractCompactionTasks to be executed by the CompactionManager. These tasks can be regular compaction, expiration of sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a list of CompactionTasks. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Issue Comment Edited] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043867#comment-13043867 ] Alan Liang edited comment on CASSANDRA-1610 at 6/3/11 4:56 PM: --- Wanted to add a little more context. This ticket now addresses only pluggable compaction; I've moved the implementation of timestamp-based compaction to https://issues.apache.org/jira/browse/CASSANDRA-2735. This patch makes compaction pluggable in the sense that you can implement your own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible for selecting the sstables for minor and major compaction. The strategy returns a list of AbstractCompactionTasks to be executed by the CompactionManager. These tasks can be regular compaction, expiration of sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a list of CompactionTasks. was (Author: alanliang): Wanted to add a little bit more context. This ticket now only addresses pluggable compaction only, I've moved the implementation of a timestamp based compaction to https://issues.apache.org/jira/browse/CASSANDRA-2735. This patch makes compaction pluggable in the sense that, you can implement your own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible for selecting the sstables for minor and major compaction. The strategy returns a list of AbstractCompactionTasks that are to be executed by the CompactionManager. These tasks can be regular compaction, expiration of sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a list of CompactionTask's.
> Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
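To make the division of labour in the comment above concrete, here is a minimal Java sketch of the design it describes. All names (SSTableStub, Task, MergeTask, Strategy, SmallestFirstStrategy, TaskRunner) are hypothetical and do not reproduce the signatures of Cassandra's AbstractCompactionStrategy, AbstractCompactionTask or CompactionManager; the only point is that the strategy chooses which sstables go into which tasks, while the runner executes whatever list it is handed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an sstable: just a name and an on-disk size.
class SSTableStub {
    final String name;
    final long sizeBytes;

    SSTableStub(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }
}

// One unit of work for the runner: a merge, an expiration, a cleanup, ...
abstract class Task {
    final List<SSTableStub> inputs;

    Task(List<SSTableStub> inputs) {
        this.inputs = inputs;
    }

    abstract String describe();
}

class MergeTask extends Task {
    MergeTask(List<SSTableStub> inputs) {
        super(inputs);
    }

    @Override
    String describe() {
        List<String> names = new ArrayList<>();
        for (SSTableStub t : inputs) {
            names.add(t.name);
        }
        return "merge " + names;
    }
}

// The pluggable part: a strategy only decides which tasks should run.
abstract class Strategy {
    abstract List<Task> minorTasks(List<SSTableStub> live);

    // Default major compaction: a single task merging every live sstable.
    List<Task> majorTasks(List<SSTableStub> live) {
        return Collections.<Task>singletonList(new MergeTask(new ArrayList<>(live)));
    }
}

// Example policy: merge the `threshold` smallest sstables once that many exist.
class SmallestFirstStrategy extends Strategy {
    private final int threshold;

    SmallestFirstStrategy(int threshold) {
        this.threshold = threshold;
    }

    @Override
    List<Task> minorTasks(List<SSTableStub> live) {
        if (live.size() < threshold) {
            return Collections.emptyList();
        }
        List<SSTableStub> sorted = new ArrayList<>(live);
        sorted.sort(Comparator.comparingLong(t -> t.sizeBytes));
        return Collections.<Task>singletonList(
                new MergeTask(new ArrayList<>(sorted.subList(0, threshold))));
    }
}

// The runner never inspects sstables itself; it runs what the strategy returns.
class TaskRunner {
    private final Strategy strategy;

    TaskRunner(Strategy strategy) {
        this.strategy = strategy;
    }

    List<String> runMinor(List<SSTableStub> live) {
        List<String> log = new ArrayList<>();
        for (Task task : strategy.minorTasks(live)) {
            // A real runner would execute the task; this sketch only records it.
            log.add(task.describe());
        }
        return log;
    }
}
```

Because the runner never looks at sstables itself, changing the selection rule, or returning an expiration or cleanup task instead of a merge, only means supplying a different Strategy subclass.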
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-pluggable-compaction.patch 0001-move-compaction-code-into-own-package.patch rebased to trunk > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only.
[jira] [Commented] (CASSANDRA-2576) Rewrite into new file post streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043078#comment-13043078 ] Alan Liang commented on CASSANDRA-2576: --- Looks good, but why are we not adding row sizes and column counts to the estimated histograms in CommutativeRowIndexer#doIndexing? > Rewrite into new file post streaming > > > Key: CASSANDRA-2576 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2576 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Stu Hood >Assignee: Stu Hood > Fix For: 1.0 > > Attachments: > 0001-CASSANDRA-2576-Don-t-depend-on-a-byte-for-byte-match-f.txt, > 0002-CASSANDRA-2576-Rebuild-into-a-new-file-to-minimize-mag.txt > > > Commutative/counter column families use a separate path to rebuild sstables > post streaming, and that path currently rewrites the data within the streamed > file. While this is great for space efficiency, it means a duplicated code > path for writing sstables, which makes it more difficult to make changes like > #674.
[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2735: -- Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > Timestamp Based Compaction Strategy > --- > > Key: CASSANDRA-2735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Alan Liang >Assignee: Alan Liang >Priority: Minor > Attachments: > 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch > > > Compaction strategy implementation based on max timestamp ordering of the > sstables while satisfying max sstable size, min and max compaction > thresholds. It also handles expiration of sstables based on a timestamp.
[jira] [Created] (CASSANDRA-2735) Timestamp Based Compaction Strategy
Timestamp Based Compaction Strategy --- Key: CASSANDRA-2735 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Alan Liang Assignee: Alan Liang Priority: Minor Compaction strategy implementation based on max timestamp ordering of the sstables while satisfying max sstable size, min and max compaction thresholds. It also handles expiration of sstables based on a timestamp.
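One way to read the description above as code is the Java sketch below. It sorts sstables by max timestamp and greedily groups neighbours into candidate compactions, closing a bucket when the next sstable would push its combined size past a cap (treating "max sstable size" as the size of the compacted output) or when it already holds the max threshold of sstables, and keeping only buckets that reach the min threshold. The class names and the greedy grouping rule are assumptions for illustration, not the code in the attached 0003 patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sstable summary: name, on-disk size, and newest cell timestamp.
class TimestampedSSTable {
    final String name;
    final long sizeBytes;
    final long maxTimestamp;

    TimestampedSSTable(String name, long sizeBytes, long maxTimestamp) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.maxTimestamp = maxTimestamp;
    }
}

final class TimestampBuckets {
    private TimestampBuckets() {}

    /**
     * Groups sstables that are adjacent in max-timestamp order into candidate compactions.
     * A bucket closes when adding the next sstable would exceed maxBucketBytes or when it
     * already holds maxThreshold sstables; buckets smaller than minThreshold are dropped.
     */
    static List<List<TimestampedSSTable>> buckets(List<TimestampedSSTable> live,
                                                  long maxBucketBytes,
                                                  int minThreshold,
                                                  int maxThreshold) {
        List<TimestampedSSTable> sorted = new ArrayList<>(live);
        sorted.sort(Comparator.comparingLong(t -> t.maxTimestamp));

        List<List<TimestampedSSTable>> result = new ArrayList<>();
        List<TimestampedSSTable> current = new ArrayList<>();
        long currentBytes = 0;
        for (TimestampedSSTable t : sorted) {
            boolean full = current.size() >= maxThreshold
                    || currentBytes + t.sizeBytes > maxBucketBytes;
            if (full && !current.isEmpty()) {
                if (current.size() >= minThreshold) {
                    result.add(current);
                }
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // An sstable already larger than the cap always ends up alone in its
            // bucket, which is then dropped unless minThreshold is 1.
            current.add(t);
            currentBytes += t.sizeBytes;
        }
        if (current.size() >= minThreshold) {
            result.add(current);
        }
        return result;
    }
}
```

Because each bucket covers a contiguous run in max-timestamp order, older data tends to be compacted together, which is what makes whole-sstable expiration by timestamp practical later on.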
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Description: In CASSANDRA-1608, I proposed some changes on how compaction works. I think it also makes sense to allow the ability to have pluggable compaction per CF. There could be many types of workloads where this makes sense. One example we had at Digg was to completely throw away certain SSTables after N days. This ticket addresses making compaction pluggable only. was: In CASSANDRA-1608, I proposed some changes on how compaction works. I think it also makes sense to allow the ability to have pluggable compaction per CF. There could be many types of workloads where this makes sense. One example we had at Digg was to completely throw away certain SSTables after N days. The goal of this ticket is to make compaction pluggable enough to support compaction based on max timestamp ordering of the sstables while satisfying max sstable size, min and max compaction thresholds. Another goal is to allow expiration of sstables based on a timestamp. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > This ticket addresses making compaction pluggable only. 
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-pluggable-compaction.patch 0001-move-compaction-code-into-own-package.patch Second attempt, after rebasing onto trunk. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch, > 0002-pluggable-compaction.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > The goal of this ticket is to make compaction pluggable enough to support > compaction based on max timestamp ordering of the sstables while satisfying > max sstable size, min and max compaction thresholds. Another goal is to allow > expiration of sstables based on a timestamp.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Description: In CASSANDRA-1608, I proposed some changes on how compaction works. I think it also makes sense to allow the ability to have pluggable compaction per CF. There could be many types of workloads where this makes sense. One example we had at Digg was to completely throw away certain SSTables after N days. The goal of this ticket is to make compaction pluggable enough to support compaction based on max timestamp ordering of the sstables while satisfying max sstable size, min and max compaction thresholds. Another goal is to allow expiration of sstables based on a timestamp. was:In CASSANDRA-1608, I proposed some changes on how compaction works. I think it also makes sense to allow the ability to have pluggable compaction per CF. There could be many types of workloads where this makes sense. One example we had at Digg was to completely throw away certain SSTables after N days. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days. > The goal of this ticket is to make compaction pluggable enough to support > compaction based on max timestamp ordering of the sstables while satisfying > max sstable size, min and max compaction thresholds. 
Another goal is to allow > expiration of sstables based on a timestamp.
[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033258#comment-13033258 ] Alan Liang commented on CASSANDRA-1610: --- "Looking quickly through that code, it looks a good chunk of the code is here to support the expiring of sstables, and it's pretty much hardcoded. Isn't there a way to encapsulate that better ?" You're right, it might make more sense to allow a strategy to define how it should expire the sstables. I'll try to fix the description. But I want to keep the implemented strategies with this ticket because, as Stu pointed out above, they justify why the interfaces are worthwhile. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
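Following the suggestion above that a strategy should define how it expires sstables, here is a minimal sketch of such a hook. ExpirationPolicy, MaxAgePolicy and ExpirableSSTable are hypothetical names, and timestamps are assumed to be epoch milliseconds (Cassandra clients commonly write microseconds, so a real policy would have to match the unit actually stored). The policy drops an sstable only when its max timestamp is older than the cutoff, because that is the only case in which every cell in the file is past the age limit; this matches the Digg example of throwing away SSTables after N days.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sstable summary; timestamps are assumed to be epoch milliseconds.
class ExpirableSSTable {
    final String name;
    final long maxTimestampMillis;

    ExpirableSSTable(String name, long maxTimestampMillis) {
        this.name = name;
        this.maxTimestampMillis = maxTimestampMillis;
    }
}

// Hypothetical hook a compaction strategy could expose to decide what to drop.
interface ExpirationPolicy {
    List<ExpirableSSTable> expired(List<ExpirableSSTable> live, long nowMillis);
}

// Drops an sstable only when even its newest cell is older than the cutoff,
// so no data newer than the cutoff is discarded along with the file.
final class MaxAgePolicy implements ExpirationPolicy {
    private final long maxAgeMillis;

    MaxAgePolicy(long maxAge, TimeUnit unit) {
        this.maxAgeMillis = unit.toMillis(maxAge);
    }

    @Override
    public List<ExpirableSSTable> expired(List<ExpirableSSTable> live, long nowMillis) {
        long cutoff = nowMillis - maxAgeMillis;
        List<ExpirableSSTable> out = new ArrayList<>();
        for (ExpirableSSTable t : live) {
            if (t.maxTimestampMillis < cutoff) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Keeping the rule behind an interface is one way to address the "pretty much hardcoded" concern: a strategy that never expires anything simply returns an empty list.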
[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032235#comment-13032235 ] Alan Liang commented on CASSANDRA-1610: --- Updated patch files. > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0001-move-compaction-code-into-own-package.patch > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: (was: 0002-Pluggable-Compaction-and-Expiration.patch) > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: (was: 0001-move-compaction-code-into-own-package.patch) > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-Pluggable-Compaction-and-Expiration.patch > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032205#comment-13032205 ] Alan Liang commented on CASSANDRA-1610: --- Some TODOs: -add mockito dependency to test build only -determine why DatabaseDescriptorTest#serDe() fails -validation of compaction_strategy_options -more tests for expiration of files > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0002-Pluggable-Compaction-and-Expiration.patch > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch, > 0002-Pluggable-Compaction-and-Expiration.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-1610: -- Attachment: 0001-move-compaction-code-into-own-package.patch > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Labels: compaction > Fix For: 1.0 > > Attachments: 0001-move-compaction-code-into-own-package.patch > > > In CASSANDRA-1608, I proposed some changes on how compaction works. I think > it also makes sense to allow the ability to have pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] [Assigned] (CASSANDRA-1610) Pluggable Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang reassigned CASSANDRA-1610: - Assignee: Alan Liang > Pluggable Compaction > > > Key: CASSANDRA-1610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1610 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Goffinet >Assignee: Alan Liang >Priority: Minor > Fix For: 1.0 > > > In CASSANDRA-1608, I proposed some changes to how compaction works. I think > it also makes sense to allow pluggable compaction per CF. > There could be many types of workloads where this makes sense. One example we > had at Digg was to completely throw away certain SSTables after N days.
[jira] Updated: (CASSANDRA-2288) AES Counter Repair Improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2288: -- Attachment: CASSANDRA-2288-aes_counter_repair_improvements.diff > AES Counter Repair Improvements > --- > > Key: CASSANDRA-2288 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2288 > Project: Cassandra > Issue Type: Improvement > Components: Core >Affects Versions: 0.8 >Reporter: Alan Liang >Assignee: Alan Liang > Attachments: CASSANDRA-2288-aes_counter_repair_improvements.diff > > > A few issues were found in AES Counter Repair in > AESCommutativeRowIndexer#doIndexing: > - sync() is called for each row in the sstable > - because the sstable is rebuilt inline (reads and writes go through the same file), the read and write positions seek back and forth, which causes many flushes > - BufferedRandomAccessFile#setLength does not work with buffers > Fixed: > - defer sync() until the end > - use two BufferedRandomAccessFiles: one for the reader, one for the writer > - cache the length of the reader file > - make BufferedRandomAccessFile#setLength work with the buffer
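The first three fixes (separate reader and writer handles instead of one file whose position seeks back and forth, a single sync at the end, and a cached reader length) can be illustrated with plain `java.io`/`java.nio`. This is a sketch under simplifying assumptions, not the patch: it does not use `BufferedRandomAccessFile`, and rows are hypothetical fixed-width 8-byte `long` values rewritten in place, so the writer can never overtake the reader.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.LongUnaryOperator;

final class InPlaceRowRewriter {
    // Hypothetical fixed row width: every "row" is a single 8-byte long.
    static final int ROW_BYTES = Long.BYTES;

    // Rewrites each row in place through two independent handles:
    // a buffered reader and a buffered writer with separate positions.
    static void rewrite(Path file, LongUnaryOperator transform) {
        try {
            // Cache the reader file's length once instead of querying it per row.
            long length = Files.size(file);
            try (InputStream rawIn = Files.newInputStream(file);
                 RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
                DataInputStream reader = new DataInputStream(new BufferedInputStream(rawIn));
                // Writes start at the channel's position (0) and do not truncate the file.
                DataOutputStream writer = new DataOutputStream(
                        new BufferedOutputStream(Channels.newOutputStream(out.getChannel())));
                for (long pos = 0; pos + ROW_BYTES <= length; pos += ROW_BYTES) {
                    long row = reader.readLong();
                    // Same-width rows: the writer only overwrites bytes the reader already consumed.
                    writer.writeLong(transform.applyAsLong(row));
                }
                writer.flush();
                // A single fsync for the whole rebuild instead of one per row.
                out.getFD().sync();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Fixed-width rows only keep the sketch safe; rows whose size changes during the rebuild would need the writer to stay behind the reader explicitly, or a separate output file.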
[jira] Created: (CASSANDRA-2288) AES Counter Repair Improvements
AES Counter Repair Improvements --- Key: CASSANDRA-2288 URL: https://issues.apache.org/jira/browse/CASSANDRA-2288 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8 Reporter: Alan Liang Assignee: Alan Liang A few issues were found in AES Counter Repair in AESCommutativeRowIndexer#doIndexing: - sync() is called for each row in the sstable - because the sstable is rebuilt inline (reads and writes go through the same file), the read and write positions seek back and forth, which causes many flushes - BufferedRandomAccessFile#setLength does not work with buffers Fixed: - defer sync() until the end - use two BufferedRandomAccessFiles: one for the reader, one for the writer - cache the length of the reader file - make BufferedRandomAccessFile#setLength work with the buffer
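The last fix, making `setLength` work with the buffer, concerns ordering. One way the problem can show up: if bytes are still sitting in an unflushed write buffer when the file is truncated, a later flush writes them past the new end and silently grows the file again. The sketch below shows one way to handle that (flush, then truncate, then clamp the logical position) using a hypothetical `BufferedFileWriter` with a single write buffer; it is not the actual `BufferedRandomAccessFile` code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;

// Hypothetical buffered writer; only what is needed to show setLength is included.
final class BufferedFileWriter implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] buffer;
    private int buffered;      // number of pending bytes in the buffer
    private long bufferStart;  // file offset at which buffer[0] belongs

    BufferedFileWriter(Path path, int bufferSize) {
        try {
            this.file = new RandomAccessFile(path.toFile(), "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.buffer = new byte[bufferSize];
    }

    void write(byte b) {
        if (buffered == buffer.length) {
            flush();
        }
        buffer[buffered++] = b;
    }

    // Logical write position, including bytes not yet flushed.
    long position() {
        return bufferStart + buffered;
    }

    void flush() {
        if (buffered == 0) {
            return;
        }
        try {
            file.seek(bufferStart);
            file.write(buffer, 0, buffered);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        bufferStart += buffered;
        buffered = 0;
    }

    // Buffer-aware setLength: pending bytes go out first, so none can land past the new end later.
    void setLength(long newLength) {
        flush();
        try {
            file.setLength(newLength);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Mirror RandomAccessFile: a position beyond the new length moves back to it.
        if (bufferStart > newLength) {
            bufferStart = newLength;
        }
    }

    @Override
    public void close() {
        flush();
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Flushing bytes that are about to be truncated costs a little extra I/O; discarding the buffered tail beyond `newLength` instead would also work but needs more bookkeeping.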
[jira] Assigned: (CASSANDRA-2171) Record and expose flush rate per CF
[ https://issues.apache.org/jira/browse/CASSANDRA-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang reassigned CASSANDRA-2171: - Assignee: Alan Liang (was: Stu Hood) > Record and expose flush rate per CF > --- > > Key: CASSANDRA-2171 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2171 > Project: Cassandra > Issue Type: Improvement >Affects Versions: 0.8 >Reporter: Stu Hood >Assignee: Alan Liang > Fix For: 0.8 > > Attachments: expose_flush_rate_per_cf_patch.diff > > > In order to automatically throttle compaction to some multiple of the flush > rate, we need to record the flush rate across the system. Since this might be > useful information on a per-CF basis, this ticket will deal with recording > the flush rate in the CFStore object and exposing it via JMX.
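To make "throttle compaction to some multiple of the flush rate" concrete, here is a back-of-the-envelope sketch. The `CompactionThrottle` class, its `multiplier` parameter, and the pause formula are assumptions for illustration, not what Cassandra implements: with a target throughput of k × flushRate bytes/s, after compacting b bytes in t seconds the compactor would pause for max(0, b / (k × flushRate) − t).

```java
// Hypothetical throttle: keeps compaction throughput at or below multiplier * flush rate.
final class CompactionThrottle {
    private final double multiplier;  // hypothetical "some multiple" of the flush rate

    CompactionThrottle(double multiplier) {
        if (multiplier <= 0) {
            throw new IllegalArgumentException("multiplier must be positive");
        }
        this.multiplier = multiplier;
    }

    // Target compaction throughput in bytes/second.
    double targetBytesPerSecond(double flushBytesPerSecond) {
        return multiplier * flushBytesPerSecond;
    }

    // Nanoseconds to pause so that bytesCompacted over elapsedNanos stays within the target.
    long pauseNanos(long bytesCompacted, long elapsedNanos, double flushBytesPerSecond) {
        double target = targetBytesPerSecond(flushBytesPerSecond);
        if (target <= 0) {
            // Assumed policy: with no recorded flush activity, do not throttle at all.
            return 0L;
        }
        double requiredNanos = bytesCompacted / target * 1e9;
        long pause = (long) Math.ceil(requiredNanos) - elapsedNanos;
        return Math.max(0L, pause);
    }
}
```

For example, with a multiplier of 2 and a flush rate of 1,000,000 bytes/s, compacting 4,000,000 bytes in one second yields a one-second pause.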
[jira] Updated: (CASSANDRA-2171) Record and expose flush rate per CF
[ https://issues.apache.org/jira/browse/CASSANDRA-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Liang updated CASSANDRA-2171: -- Attachment: expose_flush_rate_per_cf_patch.diff Attached a patch that records the flush rate per CF and exposes it through JMX. > Record and expose flush rate per CF > --- > > Key: CASSANDRA-2171 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2171 > Project: Cassandra > Issue Type: Improvement >Affects Versions: 0.8 >Reporter: Stu Hood >Assignee: Stu Hood > Fix For: 0.8 > > Attachments: expose_flush_rate_per_cf_patch.diff > > > In order to automatically throttle compaction to some multiple of the flush > rate, we need to record the flush rate across the system. Since this might be > useful information on a per-CF basis, this ticket will deal with recording > the flush rate in the CFStore object and exposing it via JMX.
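Recording a per-CF flush rate and exposing it via JMX can be sketched with the JDK's standard MBean support alone. Everything named here is an assumption for illustration: `FlushRateJmx`, `FlushRateMBean`, the attribute names, defining "rate" as average bytes/second since the tracker was created, and the `example.cassandra` ObjectName domain. The attached patch keeps the rate in the CFStore object, and its actual attribute names may differ.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import javax.management.JMException;
import javax.management.ObjectName;

final class FlushRateJmx {
    // Standard MBean interface: each getX() becomes a read-only JMX attribute "X".
    // JMX requires it to be public; it is nested only to keep the sketch in one file.
    public interface FlushRateMBean {
        long getFlushCount();
        long getBytesFlushed();
        double getFlushRateBytesPerSecond();
    }

    // One hypothetical tracker per column family; "FlushRate" + "MBean" matches the interface name.
    public static final class FlushRate implements FlushRateMBean {
        private final LongSupplier nanoClock;  // injectable clock, e.g. System::nanoTime
        private final long startNanos;
        private final AtomicLong flushCount = new AtomicLong();
        private final AtomicLong bytesFlushed = new AtomicLong();

        public FlushRate(LongSupplier nanoClock) {
            this.nanoClock = nanoClock;
            this.startNanos = nanoClock.getAsLong();
        }

        // Call after each memtable flush for this CF; safe from multiple threads.
        public void recordFlush(long bytes) {
            flushCount.incrementAndGet();
            bytesFlushed.addAndGet(bytes);
        }

        @Override
        public long getFlushCount() {
            return flushCount.get();
        }

        @Override
        public long getBytesFlushed() {
            return bytesFlushed.get();
        }

        // Average bytes/second since this tracker was created (an assumed definition of "rate").
        @Override
        public double getFlushRateBytesPerSecond() {
            long elapsed = nanoClock.getAsLong() - startNanos;
            return elapsed <= 0 ? 0.0 : bytesFlushed.get() / (elapsed / 1e9);
        }
    }

    // Registers a tracker under a hypothetical per-CF ObjectName on the platform MBean server.
    static ObjectName register(String keyspace, String columnFamily, FlushRate tracker) throws JMException {
        ObjectName name = new ObjectName(
                "example.cassandra:type=FlushRate,keyspace=" + keyspace + ",columnfamily=" + columnFamily);
        ManagementFactory.getPlatformMBeanServer().registerMBean(tracker, name);
        return name;
    }
}
```

Once registered, tools such as JConsole would show `FlushCount`, `BytesFlushed`, and `FlushRateBytesPerSecond` under that ObjectName. A windowed or exponentially weighted rate would react faster to load changes than an average since startup.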