[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485756#comment-13485756 ]
Ted Yu commented on HBASE-6597:
---
Update on my recent findings.
I came up with a patch for the 0.94 branch. Most data block encoding related tests pass.
TestHFileBlockCompatibility poses a bit of a challenge. There is no embedded checksum feature in the 0.89-fb branch, so this test is unique to 0.94 / trunk. The test contains a copy of the Writer class, which I assume shouldn't be modified, at least not for a point release.
The test reuses some code from TestHFileBlock.java, where there is a change related to the usage of Writer:
{code}
-  static int writeTestKeyValues(OutputStream dos, int seed, boolean includesMemstoreTS)
+  static void writeTestKeyValues(OutputStream dos, Writer hbw, int seed, boolean includesMemstoreTS)
{code}
This is the test failure I am observing now:
{code}
testDataBlockEncoding[0](org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility)  Time elapsed: 0.129 sec  FAILURE!
org.junit.ComparisonFailure: Content mismath for compression NONE, encoding PREFIX, pread false, commonPrefix 2, expected length 1859, actual length 1859 expected:\x00\x00\x0[B\xB8]*\x0A\x00\x00\x0A\x0... but was:\x00\x00\x0[0\x00]*\x0A\x00\x00\x0A\x0...
	at org.junit.Assert.assertEquals(Assert.java:125)
	at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.assertBuffersEqual(TestHFileBlock.java:463)
	at org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility.testDataBlockEncoding(TestHFileBlockCompatibility.java:337)
{code}

Block Encoding Size Estimation
--
Key: HBASE-6597
URL: https://issues.apache.org/jira/browse/HBASE-6597
Project: HBase
Issue Type: Improvement
Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch

Block boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
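The issue description proposes cutting blocks by an estimate of their encoded size instead of their raw size. The following is a minimal Java sketch of that idea, not HBase code: `EncodedSizeBlockWriter` and its one-byte-length prefix encoding are hypothetical stand-ins for the real writer and `DataBlockEncoder` implementations. The writer keeps encoder state (the previous key) and closes a block once the encoded bytes it has produced reach the target.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not HBase code): decides where a block ends by the
 * size of the encoded bytes produced so far, not the raw input size.
 */
class EncodedSizeBlockWriter {
  private final int targetBlockSize;
  private final List<byte[]> finishedBlocks = new ArrayList<>();
  private ByteArrayOutputStream current = new ByteArrayOutputStream();
  private byte[] prevKey; // encoder state: previous key in the current block

  EncodedSizeBlockWriter(int targetBlockSize) {
    this.targetBlockSize = targetBlockSize;
  }

  /** Prefix-encodes the key against the previous key of the same block. */
  void append(byte[] key) {
    if (key.length > 127) {
      throw new IllegalArgumentException("sketch only supports keys up to 127 bytes");
    }
    int common = 0;
    if (prevKey != null) {
      int max = Math.min(prevKey.length, key.length);
      while (common < max && prevKey[common] == key[common]) {
        common++;
      }
    }
    current.write(common);              // one byte: shared prefix length
    current.write(key.length - common); // one byte: suffix length
    current.write(key, common, key.length - common);
    prevKey = key;
    // The boundary check uses the encoded size accumulated so far.
    if (current.size() >= targetBlockSize) {
      finishBlock();
    }
  }

  /** Closes the current block; the next block starts with fresh encoder state. */
  void finishBlock() {
    if (current.size() == 0) {
      return;
    }
    finishedBlocks.add(current.toByteArray());
    current = new ByteArrayOutputStream();
    prevKey = null;
  }

  List<byte[]> blocks() {
    finishBlock();
    return finishedBlocks;
  }
}
```

Because the check runs after every append, a block can overshoot the target by at most one encoded entry. Resetting `prevKey` at each boundary keeps every block decodable on its own.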
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484436#comment-13484436 ]
Andrew Purtell commented on HBASE-6597:
---
[~ted_yu] Are you planning to fix the compilation errors?
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484441#comment-13484441 ]
Ted Yu commented on HBASE-6597:
---
I am planning to continue the work.
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482543#comment-13482543 ]
Phabricator commented on HBASE-6597:
---
mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
Replying to comments inline.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29 Updated the comment.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:34 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:35 Done.
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java:132 Updated the comment. length < Bytes.SIZEOF_INT is correct (this is the length of the part of the array we are allowed to write into).
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:445 Done.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:446 No. The last two parameters are for assertEquals.
REVISION DETAIL
  https://reviews.facebook.net/D5895
To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu
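The ByteBufferUtils.java:132 exchange turns on what `length` means. Per mbautin's reply, it is the size of the region the method may write into, so comparing `length` itself against the int size is enough, whereas `length - offset` would only be right if `length` were an end index. The sketch below illustrates the window-size reading; `BoundedIntWriter.putInt` is a hypothetical helper, since the actual method at line 132 is not shown in this thread.

```java
/**
 * Hypothetical helper, not the real ByteBufferUtils API: writes a big-endian
 * int into array[offset, offset + length), where length is the size of the
 * window the caller may write into (not the end index of that window).
 */
class BoundedIntWriter {
  static final int SIZEOF_INT = 4; // same value as Bytes.SIZEOF_INT

  /** Returns the offset just past the written int. */
  static int putInt(byte[] array, int offset, int length, int value) {
    // length is already the window size, so no "- offset" is needed here.
    if (length < SIZEOF_INT) {
      throw new IllegalArgumentException(
          "Need " + SIZEOF_INT + " bytes but only " + length + " are available");
    }
    for (int i = SIZEOF_INT - 1; i >= 0; i--) {
      array[offset + i] = (byte) value; // lowest remaining byte goes last
      value >>>= 8;
    }
    return offset + SIZEOF_INT;
  }
}
```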
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482737#comment-13482737 ]
Phabricator commented on HBASE-6597:
---
mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
Verified that the effective in-cache block size for encoded blocks is close enough to the configured size during a load test. Committing.
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482818#comment-13482818 ]
Phabricator commented on HBASE-6597:
---
mbautin has abandoned the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
Committed as rHBASEEIGHTNINEFBBRANCH1401499.
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481530#comment-13481530 ]
Phabricator commented on HBASE-6597:
---
Kannan has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29 Regarding the second bullet (in my comment), I suppose it is ok to leave this as is, as it does simplify the calling logic a little bit. We should just add here in comments that:
    * this is used for non-encoded blocks, and
    * it keeps blocks in the old format (without the DBE-specific headers).
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:35 "use integer compression for key, value and prefix (7-bit encoding)" -> "use integer compression for key, value and prefix lengths (7-bit encoding)"
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:34 Ditto (as the other comment): s/prefix/prefix lengths
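Kannan's wording fix clarifies that the 7-bit integer compression applies to the key, value and prefix *lengths*. The sketch below shows a common form of that scheme: seven payload bits per byte, low-order group first, with the high bit flagging that another byte follows. It is an illustration under that assumption and not necessarily byte-for-byte the format the encoders use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Sketch of 7-bit integer compression for non-negative lengths: each byte
 * carries 7 payload bits, low-order group first, and the high bit says
 * whether another byte follows.
 */
class SevenBitInts {
  static void writeVInt(ByteArrayOutputStream out, int value) {
    if (value < 0) {
      throw new IllegalArgumentException("lengths must be non-negative: " + value);
    }
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80); // continuation bit set: more bytes follow
      value >>>= 7;
    }
    out.write(value); // last byte has the high bit clear
  }

  static int readVInt(ByteBuffer in) {
    int result = 0;
    int shift = 0;
    while (true) {
      int b = in.get() & 0xFF;
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
  }
}
```

Lengths below 128 take one byte instead of the four a plain int would use, which is where the savings for short keys and values come from.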
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481531#comment-13481531 ]
Phabricator commented on HBASE-6597:
---
Kannan has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
Mikhail - looks great. Pending comments are very trivial.
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480431#comment-13480431 ]
Phabricator commented on HBASE-6597:
---
Kannan has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
Comments thus far...
INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:445 mismath -> mismatch
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:446 Don't you need two more %s's?
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java:132 Shouldn't this be: if (length - offset < Bytes.SIZEOF_INT)
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29 I think this comment is no longer valid.
    * It gets used for the ENCODING = 'NONE' case now, correct?
    * Wondering now if that was a correct choice, because we seem to have to jump through some hoops to handle this encoder as a separate case (such as not writing the headers, etc.).
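Kannan's "two more %s's" remark matters because `java.util.Formatter` silently ignores arguments that have no matching format specifier, so a short format string drops information without any error. The sketch below rebuilds the assertion message with one specifier per argument; `MismatchMessage.describe` and its parameter names are hypothetical, reconstructed from the `Content mismath ...` failure text quoted at the top of this thread rather than copied from TestHFileBlock.java.

```java
/**
 * Hypothetical reconstruction of the assertion message seen in the failing
 * test output: one format specifier per argument, so nothing is dropped.
 */
class MismatchMessage {
  static String describe(String compression, String encoding, boolean pread,
      int commonPrefix, int expectedLength, int actualLength) {
    return String.format(
        "Content mismatch for compression %s, encoding %s, pread %s, "
            + "commonPrefix %d, expected length %d, actual length %d",
        compression, encoding, pread, commonPrefix, expectedLength, actualLength);
  }
}
```

For comparison, `String.format("%s and %s", "a", "b", "c")` yields `a and b`: the third argument is discarded without a warning.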
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471766#comment-13471766 ]
Phabricator commented on HBASE-6597:
---
mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93 Yes, that's a good catch.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 Done (here and in other encoded writers).
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 Yes, the comment was incorrect. Moved this field to the encoder state.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:173 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:186 Yes, this javadoc was incorrect. Moved this to the encoder state.
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471784#comment-13471784 ]
Phabricator commented on HBASE-6597:
---
mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:76 includesMemstoreTS means that the memstore timestamp is part of both the input and the output. We don't change that aspect of the data format on data block encoding/decoding.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:119 Added an assertion to BufferedEncodedWriter. The code below won't make currentState null if it is not null initially.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447 As you can see, the DataBlockEncoder class does not have a lot of state (unlike the EncodedWriter), so I don't know what else I could include here.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:96 Oops. That was for debugging. Good catch!
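mbautin's answer on BufferedDataBlockEncoder.java:76 is that `includesMemstoreTS` describes the input and the output alike, so when the flag is false there is no timestamp field to skip. The sketch below illustrates that contract on a deliberately simplified, hypothetical entry layout (`KeyValueCopier` is not an HBase class, and the fixed 8-byte timestamp is a simplification; the real on-disk encoding may differ).

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical, simplified entry layout used only for illustration:
 * [keyLength:int][valueLength:int][key][value][memstoreTS:long, optional]
 */
class KeyValueCopier {
  /** Copies one entry; the flag describes both in and out, so nothing is skipped. */
  static void copyEntry(ByteBuffer in, ByteBuffer out, boolean includesMemstoreTS) {
    int keyLength = in.getInt();
    int valueLength = in.getInt();
    out.putInt(keyLength).putInt(valueLength);

    // Bulk-copy key and value bytes through a bounded view of the input.
    ByteBuffer keyValue = in.slice();
    keyValue.limit(keyLength + valueLength);
    out.put(keyValue);
    in.position(in.position() + keyLength + valueLength);

    // Present only when the flag says so; when false there is nothing to advance past.
    if (includesMemstoreTS) {
      out.putLong(in.getLong());
    }
  }
}
```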
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471799#comment-13471799 ]
Phabricator commented on HBASE-6597:
---
tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447 Got you. Then this implementation is fine.
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471330#comment-13471330 ]
Phabricator commented on HBASE-6597:
---
tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
Code is cleaner in patch v2.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:173 This class can be private.
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:186 The Javadoc and the code don't match.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:76 Would memstoreTS always be part of in (the input)? If so, do you need to advance the position inside in when includesMemstoreTS is false?
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:119 Consider adding an assertion.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447 Do you want to include more information in the String representation?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:96 Why skip the case of includesMemstoreTS being true?
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471355#comment-13471355 ]
Phabricator commented on HBASE-6597:
---
tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
My previous comments for PrefixKeyDeltaEncoder.java were for patch v1. Phabricator remembered my unfinished comments and sent them out. FYI
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471358#comment-13471358 ]
Phabricator commented on HBASE-6597:
---
tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
I got some test failures in TestCacheOnWrite:
  testStoreFileCacheOnWrite[2](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
  testStoreFileCacheOnWrite[5](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
  testStoreFileCacheOnWrite[8](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
  testStoreFileCacheOnWrite[11](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
  testStoreFileCacheOnWrite[14](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
  testStoreFileCacheOnWrite[17](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
Here is one of the above:
testStoreFileCacheOnWrite[2](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite)  Time elapsed: 0.295 sec  FAILURE!
org.junit.ComparisonFailure: expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...
	at org.junit.Assert.assertEquals(Assert.java:123)
	at org.junit.Assert.assertEquals(Assert.java:145)
	at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.readStoreFile(TestCacheOnWrite.java:259)
	at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWrite(TestCacheOnWrite.java:203)
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470475#comment-13470475 ]
Phabricator commented on HBASE-6597:
---
tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38 The length of the prefix is returned. Name this method getCommonPrefixLength?
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93 Do we need to consider memstoreTS?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 Check / assert that skipLastBytes is not negative?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 prevKey stores the previous key, right?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 Name this variable negativeDiffTimestamp?
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 This class can be private, right?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 This class can be private, right?
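tedyu's first point is that the method at BufferedDataBlockEncoder.java:38 returns the *length* of the shared prefix, not the prefix itself, hence the suggested name getCommonPrefixLength. A minimal sketch of such a helper follows; the offset/length signature is an assumption, since the patch's actual parameters are not shown in this thread.

```java
/**
 * Sketch of a helper matching the suggested name getCommonPrefixLength:
 * returns how many leading bytes two key ranges share (a length, not bytes).
 */
class CommonPrefix {
  static int getCommonPrefixLength(byte[] left, int leftOffset, int leftLength,
      byte[] right, int rightOffset, int rightLength) {
    int max = Math.min(leftLength, rightLength);
    int length = 0;
    while (length < max && left[leftOffset + length] == right[rightOffset + length]) {
      length++;
    }
    return length;
  }
}
```

In a prefix-style encoder this length is what gets written in place of the shared key bytes, followed by only the differing suffix.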
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469933#comment-13469933 ]
Phabricator commented on HBASE-6597:
---
mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.
The original review (Facebook-internal, just for the record): https://phabricator.fb.com/D554523