[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071172#comment-13071172 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/26/11 4:04 PM:
---
We need to decide whether to do this per CF or at the global level. I don't think that mmap of the compressed file is a good idea, because we won't be able to avoid buffer copies anyway, as we do with uncompressed data (see MappedFileDataInput). Agree with the other arguments not related to mmap mode.

bq. Let's add the 'compression algorithm' in the compressionInfo component. It's fine to hard set it to Snappy for writes and ignore the value on read for now.

We will need that field to be fixed size, or size + value, because just writing a string at the header could potentially be dangerous.

bq. In SSTR and SSTW, we can use the isCompressed SSTable flag instead of 'if (components.contains(Component.COMPRESSION_INFO))'.

I will remove one use of it in the SSTW, but in the SSTR it is used in the static method, where we don't have the isCompressed flag.
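The "size + value" layout suggested above can be sketched in a few lines. This is an illustrative snippet only, not the patch's actual serialization code; the class and method names are hypothetical. A length prefix lets a reader validate or skip the field instead of trusting a bare string at the header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a length-prefixed 'compression algorithm' field.
class CompressionAlgorithmField
{
    static byte[] write(String algorithm)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            byte[] name = algorithm.getBytes(StandardCharsets.UTF_8);
            out.writeShort(name.length); // 2-byte size prefix, then the value
            out.write(name);
            out.flush();
            return bytes.toByteArray();
        }
        catch (IOException e)
        {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
    }

    static String read(byte[] field)
    {
        try
        {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(field));
            byte[] name = new byte[in.readUnsignedShort()]; // read size, then exactly that many bytes
            in.readFully(name);
            return new String(name, StandardCharsets.UTF_8);
        }
        catch (IOException e)
        {
            throw new RuntimeException(e);
        }
    }
}
```

A bounded read like this fails cleanly on a truncated or garbage header, which is the danger the comment is pointing at.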
SSTable compression
---

Key: CASSANDRA-47
URL: https://issues.apache.org/jira/browse/CASSANDRA-47
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Labels: compression
Fix For: 1.0
Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47-v3-rebased.patch, CASSANDRA-47-v3.patch, CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar

We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067781#comment-13067781 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/19/11 3:20 PM:
---
bq. A small detail though is that I would store the chunk offsets instead of the chunk sizes, the reason being that it's more resilient to corruption (typically, with chunk sizes, if the first entry is corrupted you're screwed; with offsets, you only have one or two chunks that are unreadable).

+1 if we go with a separate file. I'm thinking that if we go with a separate file, I will use the same strategy as I did in v1 - store the chunk size at the beginning of the chunk and re-read it, instead of keeping it in memory (lowers memory usage for larger files).

bq. After all, CompressedDataFile is just a BRAF with a fixed buffer size, and a mechanism to translate pre-compaction file position to compressed file position (roughly). So I'm pretty sure it should be possible to have CompressedDataFile extend BRAF with minimum refactoring (of BRAF that is). It would also lift for free the limitation of not having read-write compressed files (not that we use them but ...).

To extend BRAF we will need to split it into Input/Output classes, which will imply refactoring of the skip-cache functionality and other parts of that class. I'd rather create a separate issue to do that after compression is committed, instead of putting all eggs in one basket.

+1 on everything else.
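The resilience argument for offsets over sizes can be made concrete with a small sketch (hypothetical helper, not code from the patch): with per-chunk sizes, locating a chunk requires a running sum, so one corrupt entry shifts every later chunk, while absolute offsets localize the damage to one or two chunks.

```java
// Hypothetical sketch: locating chunk N in the compressed file
// under the two candidate index layouts.
class ChunkIndex
{
    // With sizes, chunk i's position is the sum of sizes 0..i-1:
    // one corrupt size entry corrupts the position of every later chunk.
    static long positionFromSizes(int[] sizes, int chunk)
    {
        long position = 0;
        for (int i = 0; i < chunk; i++)
            position += sizes[i];
        return position;
    }

    // With absolute offsets, each chunk is located independently:
    // a corrupt entry loses only the chunks adjacent to it.
    static long positionFromOffsets(long[] offsets, int chunk)
    {
        return offsets[chunk];
    }
}
```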
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064005#comment-13064005 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/12/11 5:09 PM:
---
Thanks for your report! This will be fixed in the next patch.
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062204#comment-13062204 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/8/11 9:46 PM:
---
The patch introduces CompressedDataFile with Input/Output classes. Snappy is used for compression/decompression because it showed better speeds in tests compared to ning. Files are split into 4-byte + 64KB chunks, where the 4 bytes hold the compressed chunk size; note that the current SSTable file format is preserved and no modifications were made to the index, statistics or filter components. Both Input and Output classes extend RandomAccessFile, so random I/O works as expected. All SSTable files are opened using CompressedDataFile.Input. On startup, when SSTableReader.open gets called, it first checks whether the data file is already compressed, and compresses it if it was not, so users won't have a problem after they update. At the header of the file it reserves 8 bytes for the real data size, so other components of the system that use SSTables, and SSTables themselves, have no idea that the data file is compressed. Streaming of the data file sends decompressed chunks for convenience of maintaining the transfer, and the receiving party compresses all data before writing it to the backing file (see CompressedDataFile.transfer(...) and the CompressedFileReceiver class).

Tests are showing a dramatic performance increase when reading 1 million rows created with 1024-byte random values. Current code takes 1000 secs to read, but with the current patch only 175 secs. Using a 64KB buffer, a 1.7GB file could be compressed into 110MB (data added using ./bin/stress -n 100 -S 1024 -r, where the -r option generates random values). Writes perform a bit better, like 5-10%.
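The chunk layout described above (a 4-byte compressed length followed by up to 64KB of data) implies a simple translation from an uncompressed file position to a chunk. A minimal sketch, assuming fixed 64KB uncompressed chunks; the names are illustrative, not CompressedDataFile's actual API:

```java
// Hypothetical sketch: translate an uncompressed ("logical") position
// into a chunk number and an offset within that chunk's decompressed data.
class ChunkedPosition
{
    static final int CHUNK_SIZE = 64 * 1024; // uncompressed bytes per chunk

    // which chunk holds this uncompressed position
    static int chunkIndex(long uncompressedPosition)
    {
        return (int) (uncompressedPosition / CHUNK_SIZE);
    }

    // offset of the position inside the chunk, once decompressed
    static int offsetInChunk(long uncompressedPosition)
    {
        return (int) (uncompressedPosition % CHUNK_SIZE);
    }
}
```

A seek then means: find the chunk, read its 4-byte compressed length, decompress it, and position the buffer at the in-chunk offset - which is why the index can keep referring to uncompressed positions.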
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062228#comment-13062228 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/8/11 10:21 PM:
---
bq. The -r flag generates random keys: unless you modified stress.java, the values will be the same for every row.

Oh, sorry! I meant -V, not -r. Also used various cardinality 50-250 in the tests.
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062204#comment-13062204 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/8/11 10:24 PM:
---
The patch introduces CompressedDataFile with Input/Output classes. Snappy is used for compression/decompression because it showed better speeds in tests compared to ning. Files are split into 4-byte + 64KB chunks, where the 4 bytes hold the compressed chunk size; note that the current SSTable file format is preserved and no modifications were made to the index, statistics or filter components. Both Input and Output classes extend RandomAccessFile, so random I/O works as expected. All SSTable files are opened using CompressedDataFile.Input. On startup, when SSTableReader.open gets called, it first checks whether the data file is already compressed, and compresses it if it was not, so users won't have a problem after they update. At the header of the file it reserves 8 bytes for the real data size, so other components of the system that use SSTables, and SSTables themselves, have no idea that the data file is compressed. Streaming of the data file sends decompressed chunks for convenience of maintaining the transfer, and the receiving party compresses all data before writing it to the backing file (see CompressedDataFile.transfer(...) and the CompressedFileReceiver class).

Tests are showing a dramatic performance increase when reading 1 million rows created with 1024-byte random values. Current code takes 1000 secs to read, but with the current patch only 175 secs. Using a 64KB buffer, a 1.7GB file could be compressed into 110MB (data added using ./bin/stress -n 100 -S 1024 -V, where the -V option generates average-size values and different cardinality, from 50 (default) to 250). Writes perform a bit better, like 5-10%.
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062240#comment-13062240 ] Pavel Yaskevich edited comment on CASSANDRA-47 at 7/9/11 12:14 AM:
---
It just refers to uncompressed locations; I didn't see a need to change that.
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033467#comment-13033467 ] Terje Marthinussen edited comment on CASSANDRA-47 at 5/14/11 5:45 AM:
---
Just curious if any active work is done or planned in the near future on compressing larger data blocks, or is it all suspended waiting for a new sstable design? Having played with compression of just supercolumns for a while, I am a bit tempted to test out compression of larger blocks of data. At least row-level compression seems reasonably easy to do.

Some experiences so far which may be useful:

- Compression on sstables may actually be helpful on memory pressure, but with my current implementation, non-batched update throughput may drop 50%. I am not 100% sure why, actually.
- Flushing of (compressed) memtables and compactions are clear potential bottlenecks. The obvious trouble makers here is the fact that you keep
- For really high-pressure work, I think it would be useful to only compress tables once they pass a certain size, to reduce the amount of recompression occurring on memtable flushes and when compacting small sstables (which is generally not a big disk problem anyway). This is a bit awkward when doing things like I do in the super columns, as I believe the supercolumn does not know anything about the data it is part of (except that recently, the deserializer has that info through inner. It would anyway probably be cleaner to let the data structures/methods using the SC decide when to compress and not
- Working on an SC level, there seems to be some 10-15% extra compression on this specific data if column names that are highly repetitive in SCs can be extracted into some metadata structure, so you only store references to these in the column names. That is, the final data goes from about 40% compression to 50% compression. I don't think the effect of this will be equally big with larger blocks, but I suspect there should be some effect.
- Total size reduction of the sstables, when using a dictionary for column names as well as timestamps and variable-length length fields, is currently in the 60-65% range. It is however mainly beneficial for those that have supercolumns with at least a handful of columns (400-600 bytes of serialized column data per SC at least).
- Reducing the metadata on columns by building a dictionary of timestamps, as well as variable-length name/value length data (instead of fixed short/int), cuts down another 10% in my test (I have just done a very quick simulation of this with a very quick 10-minute hack on the serializer).
- We may want to look at how we can reuse whole compressed rows on compactions, for instance if the other tables you compact with do not have the same data.
- We may want a new cache on the uncompressed disk chunks. In my supercolumn compression case, I have a cache for the compressed data so I can write that back without recompression if not modified. This also makes calls to get the serialized size cheaper (no need to compress both to find the serialized size and to actually serialize).

If people are interested in adding any of the above to current cassandra, I will try to get time to bring some of this up to a quality where it could be used by the general public. If not, I will wait for new sstables to get a bit more ready and see if I can contribute there instead.
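The column-name dictionary idea above can be sketched roughly as follows. This is a hypothetical illustration of the approach, not Terje's implementation: repeated names are stored once in a side table, and each column serializes only a small integer reference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: dictionary-encode repetitive column names so each
// column stores a small integer id instead of the full name bytes.
class NameDictionary
{
    private final List<String> names = new ArrayList<>();
    private final Map<String, Integer> ids = new HashMap<>();

    // intern a name, returning its stable id (assigned in first-seen order)
    int idFor(String name)
    {
        Integer id = ids.get(name);
        if (id == null)
        {
            id = names.size();
            names.add(name);
            ids.put(name, id);
        }
        return id;
    }

    // reverse lookup used when deserializing
    String nameFor(int id)
    {
        return names.get(id);
    }
}
```

With a handful of columns per supercolumn sharing the same few names, each repeated name collapses to one varint-sized reference, which is where the extra 10-15% the comment reports would come from.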
[jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010197#comment-13010197 ] Brandon Williams edited comment on CASSANDRA-47 at 3/23/11 4:16 PM:
---
I think this idea hits the sweet spot where we currently stand. Compression is a *huge* win for us, and not having to rewrite the entire format simplifies the complexity greatly.