[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485756#comment-13485756
 ] 

Ted Yu commented on HBASE-6597:
---

Update on my recent fidings.
I came up with patch for 0.94 branch.
Most data block encoding related tests pass.
TestHFileBlockCompatibility poses a little challenge. There is no embedded 
checksum feature in 0.89-fb branch. So this test is unique to 0.94 / trunk.
In the test, there is a copy of Writer class which I assume shouldn't be 
modified, at least not for a point release.
The test reuses some code from TestHFileBlock.java where there is some change 
related to usage of Writer:
{code}
- static int writeTestKeyValues(OutputStream dos, int seed, boolean 
includesMemstoreTS)
+ static void writeTestKeyValues(OutputStream dos, Writer hbw, int seed, 
boolean includesMemstoreTS)
{code}
This is the test failure I am observing now:
{code}
testDataBlockEncoding[0](org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility)
  Time elapsed: 0.129 sec   FAILURE!
org.junit.ComparisonFailure: Content mismath for compression NONE, encoding 
PREFIX, pread false, commonPrefix 2, expected length 1859, actual length 1859 
expected:\x00\x00\x0[B\xB8]*\x0A\x00\x00\x0A\x0... but 
was:\x00\x00\x0[0\x00]*\x0A\x00\x00\x0A\x0...
  at org.junit.Assert.assertEquals(Assert.java:125)
  at 
org.apache.hadoop.hbase.io.hfile.TestHFileBlock.assertBuffersEqual(TestHFileBlock.java:463)
  at 
org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility.testDataBlockEncoding(TestHFileBlockCompatibility.java:337)
{code}

 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, 
 D5895.3.patch, D5895.4.patch, D5895.5.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13484436#comment-13484436
 ] 

Andrew Purtell commented on HBASE-6597:
---

[~ted_yu] Are you planning to fix the compilation errors?

 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, 
 D5895.3.patch, D5895.4.patch, D5895.5.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13484441#comment-13484441
 ] 

Ted Yu commented on HBASE-6597:
---

I am planning to continue the work.

 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, 
 D5895.3.patch, D5895.4.patch, D5895.5.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-23 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482543#comment-13482543
 ] 

Phabricator commented on HBASE-6597:


mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  Replying to comments inline.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29
 Updated the comment.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:34 
Done.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:35 
Done.
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java:132 Updated 
the comment. length  Bytes.SIZEOF_INT is correct (this is the length of the 
part of the array we are allowed to write into).
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:445 Done.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:446 No. 
The two last parameters are for assertEquals.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, 
 D5895.4.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-23 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482737#comment-13482737
 ] 

Phabricator commented on HBASE-6597:


mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  Verified that the effective in-cache block size for encoded blocks is close 
enough to the configured size during a load test. Committing.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, 
 D5895.4.patch, D5895.5.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-23 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482818#comment-13482818
 ] 

Phabricator commented on HBASE-6597:


mbautin has abandoned the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  Committed as rHBASEEIGHTNINEFBBRANCH1401499.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, 
 D5895.4.patch, D5895.5.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-22 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481530#comment-13481530
 ] 

Phabricator commented on HBASE-6597:


Kannan has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29
 Regarding the second bullet (in my comment), I suppose it is ok to leave this 
as is... as it does simplify the calling logic a little bit. We should just add 
here in comments that

  * this is used for non-encoded blocks.
  * and, keeps blocks in old format (without the DBE specific headers).
  
src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:35 
use integer compression for key, value and prefix (7-bit encoding)
  --
  use integer compression for key, value and prefix lengths (7-bit encoding)

  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:34 
ditto (as other comment)

  s/prefix/prefix lengths

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, 
 D5895.4.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-22 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481531#comment-13481531
 ] 

Phabricator commented on HBASE-6597:


Kannan has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  Mikhail - looks great. Pending comments are very trivial.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, 
 D5895.4.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-19 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480431#comment-13480431
 ] 

Phabricator commented on HBASE-6597:


Kannan has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  comments thus far...

INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:445 
mismath - mismatch
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:446 don't 
you need two more %s's.
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java:132 shouldn't 
this be:

  if (length - offset  Bytes.SIZEOF_INT)

  
src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29
 I think this comment is no longer valid.

  * It gets used for ENCODING = 'NONE' case now correct?
  * Wondering now, if that was a correct choice... because we seem to be having 
to jump through some hoops to handle this encoder as a separate case (such as 
to not write the headers, etc.).


REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, 
 D5895.4.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-08 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471766#comment-13471766
 ] 

Phabricator commented on HBASE-6597:


mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38
 Done.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93
 Yes, that's a good catch.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 
Done.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 
Done (here and in other encoded writers).
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 
Yes, the comment was incorrect. Moved this field to the encoder state.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 
Done.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 
Done.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:173
 Done.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:186
 Yes, this javadoc was incorrect. Moved this to encoder state.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-08 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471784#comment-13471784
 ] 

Phabricator commented on HBASE-6597:


mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:76
 includesMemstoreTS means that we are memstore timestamp is part of both input 
and output. We don't change that aspect of the data format on data block 
encoding/decoding.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:119
 Added an assertion to BufferedEncodedWriter. The code below won't make 
currentState null if it is not null initially.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447
 As you can see, the DataBlockEncoder class does not have a lot of state 
(unlike the EncodedWriter) so I don't know what else I could include here.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:96 Oops. 
That was for debugging. Good catch!

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-08 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471799#comment-13471799
 ] 

Phabricator commented on HBASE-6597:


tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447
 Got you.

  Then this implementation is fine.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471330#comment-13471330
 ] 

Phabricator commented on HBASE-6597:


tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  Code is cleaner in patch v2.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:173
 This class can be private.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:186
 Javadoc and code mismatch.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:76
 Would memstoreTS always be part of in ?
  If so, do you need to advance the position inside in when includesMemstoreTS 
is false ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:119
 Consider adding an assertion.
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447
 Do you want to include more information in String representation ?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:96 Why 
skipping the case of includesMemstoreTS being true ?

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471355#comment-13471355
 ] 

Phabricator commented on HBASE-6597:


tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  My previous comments for PrefixKeyDeltaEncoder.java were for patch v1.
  Phabricator remembered my unfinished comments and sent them out.

  FYI

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471358#comment-13471358
 ] 

Phabricator commented on HBASE-6597:


tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  I got some test failures in TestCacheOnWrite:


testStoreFileCacheOnWrite[2](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite):
 expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but 
was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...

testStoreFileCacheOnWrite[5](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite):
 expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but 
was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...

testStoreFileCacheOnWrite[8](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite):
 expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but 
was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...

testStoreFileCacheOnWrite[11](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite):
 expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but 
was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...

testStoreFileCacheOnWrite[14](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite):
 expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but 
was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...

testStoreFileCacheOnWrite[17](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite):
 expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT... but 
was:{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...

  Here is one of the above:

  
testStoreFileCacheOnWrite[2](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite) 
 Time elapsed: 0.295 sec   FAILURE!
  org.junit.ComparisonFailure: expected:{ENCODED_DATA=9[65, LEAF_INDEX=121], 
BLOOM_CHUNK=9, INT... but was:{ENCODED_DATA=9[91, LEAF_INDEX=124], 
BLOOM_CHUNK=9, INT...
at org.junit.Assert.assertEquals(Assert.java:123)
at org.junit.Assert.assertEquals(Assert.java:145)
at 
org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.readStoreFile(TestCacheOnWrite.java:259)
at 
org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWrite(TestCacheOnWrite.java:203)

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D5895.1.patch, D5895.2.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470475#comment-13470475
 ] 

Phabricator commented on HBASE-6597:


tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38
 Length of prefix is returned.
  Name this method getCommonPrefixLength ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93
 Do we need to consider memstoreTS ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 
Check / assert that skipLastBytes is not negative ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 
prevKey stores the previous key, right ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 
Name this variable negativeDiffTimestamp ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 
This class can be private, right ?
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 
This class can be private, right ?

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Priority: Minor
 Attachments: D5895.1.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

2012-10-04 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469933#comment-13469933
 ] 

Phabricator commented on HBASE-6597:


mbautin has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental 
data block encoding.

  The original review (Facebook-internal, just for the record): 
https://phabricator.fb.com/D554523

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin


 Block Encoding Size Estimation
 --

 Key: HBASE-6597
 URL: https://issues.apache.org/jira/browse/HBASE-6597
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Priority: Minor
 Attachments: D5895.1.patch


 Blocks boundaries as created by current writers are determined by the size of 
 the unencoded data. However, blocks in memory are kept encoded. By using an 
 estimate for the encoded size of the block, we can get greater consistency in 
 size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira