[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569 ]

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:25 AM:

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the Checksum interface, which requires that).

Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm; not that I don't trust Snappy, but...). This is clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data gives a bit more safety; I don't know what our exact gain is, quantitatively, from checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to get others' opinions more than anything else.

The checksum has been moved to the original (uncompressed) data.

bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and check that this is caught at read time (or better, do that multiple times, changing a different bit each time).

A test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of Java's CRC32. In particular, it seems they have been able to nearly double the speed of CRC32, with a solution that seems fairly simple to me. It would be OK to use Java's native CRC32 and leave the improvement to another ticket, but quite frankly, if it is that simple and since the Hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

As decided previously, this will be a matter of a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f).

was (Author: xedin):
bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the Checksum interface, which requires that).

Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm; not that I don't trust Snappy, but...). This is clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data gives a bit more safety; I don't know what our exact gain is, quantitatively, from checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to get others' opinions more than anything else.

The checksum has been moved to the original (uncompressed) data.

bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and check that this is caught at read time (or better, do that multiple times, changing a different bit each time).

A test was added to CompressedRandomAccessReaderTest.

As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of Java's CRC32. In particular, it seems they have been able to nearly double the speed of CRC32, with a solution that seems fairly simple to me. It would be OK to use Java's native CRC32 and leave the improvement to another ticket, but quite frankly, if it is that simple and since the Hadoop guys have done all the hard work for us, I say we start with the efficient version directly. As decided previously, this will be a matter of a separate ticket.
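To make the chosen chunk layout concrete for anyone skimming the thread, here is a rough write-side sketch. It is illustrative only: the real logic lives in CompressedSequentialWriter and uses Snappy, and java.util.zip.Deflater stands in here so the snippet is self-contained. The CRC32 is computed over the uncompressed chunk and appended as a 4-byte int after the compressed bytes.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Sketch only: the real writer is CompressedSequentialWriter; Deflater is a stand-in for Snappy.
public final class ChecksummedChunkWriter
{
    public static byte[] writeChunk(byte[] uncompressed) throws IOException
    {
        // Checksum the *uncompressed* data, as discussed above.
        CRC32 crc = new CRC32();
        crc.update(uncompressed, 0, uncompressed.length);

        // Compress the chunk (stand-in for the Snappy compressor).
        Deflater deflater = new Deflater();
        deflater.setInput(uncompressed);
        deflater.finish();
        byte[] buffer = new byte[uncompressed.length * 2 + 64];
        int compressedLength = deflater.deflate(buffer);
        deflater.end();

        // Chunk layout: [compressed bytes][4-byte CRC of the uncompressed data].
        // CRC32.getValue() returns a long only because of the Checksum interface;
        // the value fits in an int, so 4 bytes are enough on disk.
        ByteArrayOutputStream chunk = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(chunk);
        out.write(buffer, 0, compressedLength);
        out.writeInt((int) crc.getValue());
        out.flush();
        return chunk.toByteArray();
    }
}
{code}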
[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569 ]

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:29 AM:

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the Checksum interface, which requires that).

Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm; not that I don't trust Snappy, but...). This is clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data gives a bit more safety; I don't know what our exact gain is, quantitatively, from checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to get others' opinions more than anything else.

It now checksums the original (non-compressed) data and stores the checksum at the end of the compressed chunk; the reader verifies the checksum after decompression.

bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and check that this is caught at read time (or better, do that multiple times, changing a different bit each time).

A test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of Java's CRC32. In particular, it seems they have been able to nearly double the speed of CRC32, with a solution that seems fairly simple to me. It would be OK to use Java's native CRC32 and leave the improvement to another ticket, but quite frankly, if it is that simple and since the Hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

As decided previously, this will be a matter of a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f).

was (Author: xedin):
bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the Checksum interface, which requires that).

Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm; not that I don't trust Snappy, but...). This is clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data gives a bit more safety; I don't know what our exact gain is, quantitatively, from checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to get others' opinions more than anything else.

The checksum has been moved to the original (uncompressed) data.

bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and check that this is caught at read time (or better, do that multiple times, changing a different bit each time).

A test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of Java's CRC32. In particular, it seems they have been able to nearly double the speed of CRC32, with a solution that seems fairly simple to me. It would be OK to use Java's native CRC32 and leave the improvement to another ticket, but quite frankly, if it is that simple and since the Hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

As decided previously, this will be a matter of a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f).
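A rough sketch of the read-side check and of the bit-flip test idea quoted above follows. The real test lives in CompressedRandomAccessReaderTest; the names here are illustrative only, and the example reuses the writeChunk() sketch from the earlier comment so the two snippets pair up.

{code:java}
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Illustrative read-side verification plus the "flip one bit, expect a failure" test idea.
public final class ChecksummedChunkReadExample
{
    // Decompress a chunk produced by the writer sketch and verify the trailing CRC.
    public static byte[] readChunk(byte[] chunk, int uncompressedLength) throws DataFormatException
    {
        int compressedLength = chunk.length - 4; // the last 4 bytes hold the CRC

        Inflater inflater = new Inflater();
        inflater.setInput(chunk, 0, compressedLength);
        byte[] uncompressed = new byte[uncompressedLength];
        inflater.inflate(uncompressed);
        inflater.end();

        // The CRC covers the *uncompressed* data, so bitrot in the compressed bytes
        // surfaces either as a DataFormatException above or as a mismatch here.
        CRC32 crc = new CRC32();
        crc.update(uncompressed, 0, uncompressedLength);
        int stored = ((chunk[compressedLength] & 0xff) << 24)
                   | ((chunk[compressedLength + 1] & 0xff) << 16)
                   | ((chunk[compressedLength + 2] & 0xff) << 8)
                   |  (chunk[compressedLength + 3] & 0xff);
        if ((int) crc.getValue() != stored)
            throw new IllegalStateException("chunk checksum mismatch: data is corrupted");
        return uncompressed;
    }

    // Bit-flip test idea: write a chunk, flip a single bit, assert the read fails.
    public static void main(String[] args) throws Exception
    {
        byte[] original = "some column data worth protecting from bitrot".getBytes("UTF-8");
        byte[] chunk = ChecksummedChunkWriter.writeChunk(original);
        chunk[chunk.length / 2] ^= 0x04; // simulate bitrot inside the chunk

        try
        {
            readChunk(chunk, original.length);
            throw new AssertionError("corruption was not detected");
        }
        catch (Exception expected)
        {
            System.out.println("corruption detected as expected: " + expected);
        }
    }
}
{code}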
[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079934#comment-13079934 ]

Lior Golan edited comment on CASSANDRA-1717 at 8/5/11 12:31 PM:

Seems like in terms of overhead (which, based on HADOOP-6148, is potentially very significant in both storage and CPU), block-level checksums are much better. I understand you believe block-level checksums are easy in the compressed case but not easy in the non-compressed case. So can't you just implement a no-op compression option that will utilize what you're doing / planning to do for compression in terms of block structure and block-level checksums? That would be easy if you already designed the compression algorithm to be pluggable. And if the compression algorithm is not pluggable yet, adding that would have an obvious side benefit besides an easier implementation of block-level checksums.

was (Author: liorgo2):
Seems like in terms of overhead (which, based on HADOOP-6148, is potentially very significant in both storage and CPU), block-level checksums are much better. I understand you believe block-level checksums are easy in the compressed case but not easy in the non-compressed case. So can't you just implement a no-op compression option that will utilize what you're doing for compression in terms of block structure and block-level checksums? That would be easy if you already designed for the compression algorithm to be pluggable. And if the compression algorithm is not pluggable yet, adding that would have an obvious side benefit besides an easier implementation of block-level checksums.

Cassandra cannot detect corrupt-but-readable column data

Key: CASSANDRA-1717
URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Fix For: 1.0
Attachments: checksums.txt

Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy. But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value it can even resist being overwritten by newer values.
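If the compression layer is (or becomes) pluggable, the no-op option suggested above could look roughly like the sketch below. The interface and class names are hypothetical, not Cassandra's actual compressor API; the point is simply that a pass-through "compressor" lets non-compressed column families reuse the chunked block structure, and therefore the block-level checksums, unchanged.

{code:java}
// Hypothetical pluggable-compressor interface and a pass-through implementation.
// With a no-op compressor, non-compressed column families still go through the
// chunked writer and so get block-level checksums without a separate code path.
interface ChunkCompressor
{
    /** Compresses input[0..inputLength) into output and returns the compressed length. */
    int compress(byte[] input, int inputLength, byte[] output);

    /** Uncompresses input[0..inputLength) into output and returns the uncompressed length. */
    int uncompress(byte[] input, int inputLength, byte[] output);
}

class NoopCompressor implements ChunkCompressor
{
    public int compress(byte[] input, int inputLength, byte[] output)
    {
        System.arraycopy(input, 0, output, 0, inputLength); // bytes pass through unchanged
        return inputLength;
    }

    public int uncompress(byte[] input, int inputLength, byte[] output)
    {
        System.arraycopy(input, 0, output, 0, inputLength);
        return inputLength;
    }
}
{code}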
[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079616#comment-13079616 ]

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/4/11 9:39 PM:

This is a good idea but it has a few complications:
- the buffer length should be stored in order to be used by the reader
- reads should be aligned to that buffer length so we always read a whole checksummed chunk of data, which implies that we will potentially always need to read more data on each request

This seems to be a clear trade-off between using additional space to store a checksum for the index + columns of each row vs. doing more I/O...

was (Author: xedin):
This is a good idea but it has a few complications:
- buffer length should be store in order to be used by reader
- reads should be aligned by that buffer length so we always read a whole checksummed chunk of the data, which implies that we will potentially always need to read more data on each request

This seems to be a clear trade-off between using additional space to store a checksum for the index + columns of each row vs. doing more I/O...
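To illustrate the two complications listed above (illustrative arithmetic only, not the actual reader code, and the 64 KB chunk length is an assumed value): the chunk length has to be persisted so the reader can map a logical offset onto the chunk that contains it, and even a small read has to fetch, decompress, and checksum-verify that whole chunk.

{code:java}
// Illustrative only: mapping a logical (uncompressed) offset onto a checksummed chunk.
// The chunk length must be stored alongside the data for the reader to do this.
public final class ChunkAlignmentExample
{
    public static void main(String[] args)
    {
        long chunkLength = 65536;      // persisted chunk size (assumed value)
        long requestedOffset = 200000; // caller wants bytes starting here

        long chunkIndex = requestedOffset / chunkLength;   // which chunk holds the offset
        long chunkStart = chunkIndex * chunkLength;        // aligned start of that chunk
        long offsetInChunk = requestedOffset - chunkStart; // where the caller's data begins

        // Even a 1-byte request reads, decompresses and checksum-verifies the whole
        // chunk: the extra-I/O side of the trade-off described above.
        System.out.printf("read chunk %d starting at offset %d, then skip %d bytes%n",
                          chunkIndex, chunkStart, offsetInChunk);
    }
}
{code}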
[jira] Issue Comment Edited: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001649#comment-13001649 ]

Benjamin Coverston edited comment on CASSANDRA-1717 at 3/3/11 12:13 AM:

Would we really save that much by waiting until #16 is done? Perhaps we should take a shot at this in 0.7. Right now it's possible to have data corrupted and then replicated, leading to loss of data.

Edit: Apparently #16 is already done.

was (Author: bcoverston):
Would we really save that much by waiting until #16 is done? Perhaps we should take a shot at this in 0.7. Right now it's possible to have data corrupted and then replicated, leading to loss of data.