[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-12-07 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164613#comment-13164613
 ] 

Jean-Daniel Cryans commented on HBASE-3857:
---

Any reason why this patch is removing compaction and flush queue sizes?

{code}
-    this.metrics.compactionQueueSize.set(compactSplitThread
-        .getCompactionQueueSize());
-    this.metrics.flushQueueSize.set(cacheFlusher
-        .getFlushQueueSize());
{code}

If it was intentional, there's a bunch of dead code that also needs to be 
removed, such as the methods that were called. If it wasn't, meaning there's 
currently no way in 0.92 to get the compaction queue size, then that would be 
sufficient for me to kill the RC.
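
For readers without the patch at hand, here is a minimal sketch of what the removed lines do: on each metrics refresh they copy the current queue lengths into two gauges. Only the names that appear in the diff above (`metrics`, `compactionQueueSize`, `flushQueueSize`, `compactSplitThread`, `cacheFlusher`, `getCompactionQueueSize`, `getFlushQueueSize`) come from the patch. Every class and interface below, and the `refreshQueueMetrics` method, is a hypothetical stand-in and not the real HBase code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins, NOT the real HBase classes. Only the field and
// method names seen in the removed diff lines are taken from the patch.
class QueueGaugeSketch {

    /** Minimal integer gauge offering the set(...) call used in the diff. */
    static final class IntGauge {
        private final AtomicInteger value = new AtomicInteger();
        void set(int v) { value.set(v); }
        int get() { return value.get(); }
    }

    /** Stand-in for the region server metrics object. */
    static final class Metrics {
        final IntGauge compactionQueueSize = new IntGauge();
        final IntGauge flushQueueSize = new IntGauge();
    }

    /** Stand-ins for the two components that own the queues. */
    interface CompactSplitThread { int getCompactionQueueSize(); }
    interface CacheFlusher { int getFlushQueueSize(); }

    /** Copies the current queue sizes into the gauges, as the removed lines did. */
    static void refreshQueueMetrics(Metrics metrics,
                                    CompactSplitThread compactSplitThread,
                                    CacheFlusher cacheFlusher) {
        metrics.compactionQueueSize.set(compactSplitThread.getCompactionQueueSize());
        metrics.flushQueueSize.set(cacheFlusher.getFlushQueueSize());
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // Fixed queue sizes, chosen only for the demonstration.
        refreshQueueMetrics(metrics, () -> 3, () -> 1);
        System.out.println("compactionQueueSize=" + metrics.compactionQueueSize.get()
            + " flushQueueSize=" + metrics.flushQueueSize.get());
    }
}
```

If the two `set` calls are dropped, as in the diff, the gauges keep their initial value no matter what the queues hold, which is why the getters would then be dead code.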

 Change the HFile Format
 ---

 Key: HBASE-3857
 URL: https://issues.apache.org/jira/browse/HBASE-3857
 Project: HBase
 Issue Type: New Feature
 Affects Versions: 0.90.4
 Reporter: Liyin Tang
 Assignee: Mikhail Bautin
 Attachments: 0001-Adding-release-notes-for-HBASE-3857.patch, 
 0001-Fix-TestHFileBlock.testBlockHeapSize.patch, 
 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 
 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 
 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 
 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 
 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, 
 hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, 
 hfile_format_v2_design_draft_0.4.odt


 In order to support HBASE-3763 and HBASE-3856, we need to change the format 
 of the HFile. The new format proposal is attached here. Thanks to Mikhail 
 Bautin for the documentation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-12-07 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164618#comment-13164618
 ] 

Zhihong Yu commented on HBASE-3857:
---

@J-D:
Nice catch.

We should open another JIRA to deal with queue sizes.




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-12-07 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164619#comment-13164619
 ] 

Jean-Daniel Cryans commented on HBASE-3857:
---

Yeah, I just want to verify first what the situation is; maybe I'm missing 
something.




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078886#comment-13078886
 ] 

Ted Yu commented on HBASE-3857:
---

I got one test failure for 
0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch:
{code}
Failed tests:
  testGzipCompression(org.apache.hadoop.hbase.io.hfile.TestHFileBlock): expected:<...0\x00\x00\x00\x00\x0[0]\xED\xC3\xC1\x11\x00...> but was:<...0\x00\x00\x00\x00\x0[3]\xED\xC3\xC1\x11\x00...>
    at org.junit.Assert.assertEquals(Assert.java:123)
    at org.junit.Assert.assertEquals(Assert.java:145)
    at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.testGzipCompression(TestHFileBlock.java:126)
{code}





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078979#comment-13078979
 ] 

Mikhail Bautin commented on HBASE-3857:
---

TL;DR version: Can we commit this patch?

@Nicolas: you mentioned some last-minute sanity checking you wanted to do.

Here are some recent test results with and without the patch. I am running the 
tests against the stable revision 1153300. All the sporadically failing unit 
tests with the patch applied (TestDistributedLogSplitting, TestHRegion, 
TestServerCustomProtocol) also intermittently fail without the patch. The 
failures look similar for the two versions of code, as shown below. The most 
recent version of the patch is also passing automated randomized read/write 
load testing. Given all of that, I think it is safe to assume that the patch 
does not introduce a regression. 

Note: some of these test failures might be problems with my test setup.

TestServerCustomProtocol:
java.lang.AssertionError: Results should contain region test,,1312378402385.4b887abff65a2e74eab86c859d255c89. for row 'aaa'

TestDistributedLogSplitting:
testThreeRSAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)  Time elapsed: 95.921 sec   FAILURE!
java.lang.AssertionError: 
. . .
at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testThreeRSAbort

TestHRegion:
testWritesWhileGetting(org.apache.hadoop.hbase.regionserver.TestHRegion)  Time elapsed: 0.398 sec   FAILURE!
junit.framework.AssertionFailedError: expected:<\x00\x00\x008> but was:<\x00\x00\x006>
at org.apache.hadoop.hbase.HBaseTestCase.assertEquals(HBaseTestCase.java:691)
at org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:2759)

With the patch:

2011-08-03_02_58_34 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 2, err: 0, skip: 9, time: 4784.9, failed: DistributedLogSplitting, HRegion
2011-08-03_03_24_45 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 0, err: 0, skip: 9, time: 4544.1
2011-08-03_05_00_32 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4735.5, failed: HRegion
2011-08-03_05_21_43 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4485.4, failed: ServerCustomProtocol
2011-08-03_07_00_19 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 0, err: 0, skip: 9, time: 4624.5
2011-08-03_07_14_08 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4687.8, failed: ServerCustomProtocol
2011-08-03_09_08_22 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4711.4, failed: ServerCustomProtocol

Without the patch:

2011-08-02_19_23_26 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4496.4
2011-08-02_21_19_54 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4520.2
2011-08-02_23_15_58 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4438.1
2011-08-03_01_10_12 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4503.5
2011-08-03_03_05_10 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 1, err: 0, skip: 9, time: 4550.1, failed: HRegion
2011-08-03_05_00_26 commit: HBASE-3065 don't prepend MAGIC if d | tests: 836, fail: 1, err: 0, skip: 9, time: 4418.0, failed: ServerCustomProtocol
2011-08-03_07_05_57 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 1, err: 0, skip: 9, time: 4443.9, failed: Replication
2011-08-03_08_59_20 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 2, err: 0, skip: 9, time: 4427.2, failed: CoprocessorEndpoint, DistributedLogSplitting
2011-08-03_10_52_41 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4483.8
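
The comparison argued above can be checked mechanically. The following is a small sketch, not part of the patch or of any HBase tooling: it takes the "failed:" lists from the run summaries above, collects the tests that failed at least once in each series, and reports which tests fail only with the patch. The class name `FlakyTestDiff` and its methods are illustrative names chosen for this example.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch; the data is copied from the run summaries above,
// one inner array per test-suite run (empty array = no failures).
class FlakyTestDiff {

    static final String[][] WITH_PATCH = {
        {"DistributedLogSplitting", "HRegion"},
        {},
        {"HRegion"},
        {"ServerCustomProtocol"},
        {},
        {"ServerCustomProtocol"},
        {"ServerCustomProtocol"},
    };

    static final String[][] WITHOUT_PATCH = {
        {}, {}, {}, {},
        {"HRegion"},
        {"ServerCustomProtocol"},
        {"Replication"},
        {"CoprocessorEndpoint", "DistributedLogSplitting"},
        {},
    };

    /** Returns every test that failed in at least one of the given runs. */
    static Set<String> failedAtLeastOnce(String[][] runs) {
        Set<String> failed = new TreeSet<>();
        for (String[] run : runs) {
            failed.addAll(Arrays.asList(run));
        }
        return failed;
    }

    /** Returns the elements of a that are not in b. */
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> withPatch = failedAtLeastOnce(WITH_PATCH);
        Set<String> withoutPatch = failedAtLeastOnce(WITHOUT_PATCH);
        System.out.println("Fail only with the patch:    " + onlyIn(withPatch, withoutPatch));
        System.out.println("Fail only without the patch: " + onlyIn(withoutPatch, withPatch));
    }
}
```

On this data the first set comes out empty, which is consistent with the claim that every intermittent failure seen with the patch also occurs without it. With only seven and nine runs, this supports the claim rather than proving it.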



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078980#comment-13078980
 ] 

Ted Yu commented on HBASE-3857:
---

Committed to TRUNK.
I moved the binary file to 
src/test/resources/org/apache/hadoop/hbase/8e8ab58dcf39412da19833fcd8f687ac

Thanks for the great work, Mikhail.




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078988#comment-13078988
 ] 

Hudson commented on HBASE-3857:
---

Integrated in HBase-TRUNK #2076 (See 
[https://builds.apache.org/job/HBase-TRUNK/2076/])
HBASE-3857 New files
HBASE-3857 added the binary V1 HFile
HBASE-3857  Change the HFile Format (Mikhail & Liyin)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

tedyu : 
Files : 
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/8e8ab58dcf39412da19833fcd8f687ac

tedyu : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFilePerformance.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/RandomSeek.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCachedBlockQueue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HServerLoad.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/CachedBlock.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Hash.java
* /hbase/trunk/src/main/resources/hbase-default.xml
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestByteBloomFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompressionTest.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/ByteBloomFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079102#comment-13079102
 ] 

Hudson commented on HBASE-3857:
---

Integrated in HBase-TRUNK #2077 (See 
[https://builds.apache.org/job/HBase-TRUNK/2077/])
HBASE-3857 binary HFile should be under io/hfile
HBASE-3857 relocated binary HFile
HBASE-3857 New test classes.
HBASE-3857 New files
HBASE-3857 more new files

tedyu : 
Files : 
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/hfile/8e8ab58dcf39412da19833fcd8f687ac
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/file/8e8ab58dcf39412da19833fcd8f687ac

tedyu : 
Files : 
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/hfile
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/8e8ab58dcf39412da19833fcd8f687ac
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/file/8e8ab58dcf39412da19833fcd8f687ac
* /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/file

tedyu : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestIdLock.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java

tedyu : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/IdLock.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/DynamicByteBloomFilter.java

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilterBase.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DoubleOutputStream.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InlineBlockWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterBase.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilterWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079166#comment-13079166
 ] 

Ted Yu commented on HBASE-3857:
---

@Mikhail:
Test fix committed.

Thanks for the follow-up.




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079197#comment-13079197
 ] 

Hudson commented on HBASE-3857:
---

Integrated in HBase-TRUNK #2078 (See 
[https://builds.apache.org/job/HBase-TRUNK/2078/])
HBASE-3857  Fix TestHFileBlock.testBlockHeapSize test failure (Mikhail)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-01 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073814#comment-13073814
 ] 

Mikhail Bautin commented on HBASE-3857:
---

The new version of the patch is successfully passing the randomized load test 
as well.

I am trying to compare unit test results between r1152122 and the same version 
with the patch applied.

Without the patch:
2011-08-01_10_55_23 commit: HBASE-4144  RS does not abort if th | tests: 812, fail: 4, err: 14, skip: 9, time: 4791.1, failed: FullLogReconstruction, DistributedLogSplitting, SplitTransactionOnCluster, ScannerTimeout, MasterFailover, MultiParallel

With the patch:
2011-08-01_11_34_28 commit: review_hfile-v2-r1152122-2011_08_01 | tests: 845, fail: 5, err: 14, skip: 9, time: 5426.5, failed: FullLogReconstruction, DistributedLogSplitting, ServerCustomProtocol, Replication, SplitTransactionOnCluster, ScannerTimeout, MasterFailover, MultiParallel

Looking at ServerCustomProtocol (the only test that has failures with the 
patch but not without), I see that it is a fairly frequent intermittent test 
failure in my automated runs of HBase trunk tests.

Any advice about what version I should select as the baseline for my unit test 
run? The only trunk version that I managed to get a clean unit test run with 
was r1147350, and my patch against that passed all unit tests as well, but I 
think some changes were introduced since then that made the patch not apply 
cleanly, so I rebased the patch to a more recent version (and got all the test 
failures of that version). 

Notwithstanding all the above, I believe the patch is currently in a stable 
state and can be committed, and I will confirm that once a few more unit test 
runs complete.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073819#comment-13073819
 ] 

Ted Yu commented on HBASE-3857:
---

@Mikhail:
Thanks for your effort in bringing this closer to check-in.

Minor note: Replication was new in your second list of unit tests above.




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-01 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073880#comment-13073880
 ] 

Mikhail Bautin commented on HBASE-3857:
---

@Ted:

Yes, you are right, TestReplication also failed with the patch applied while it 
did not fail without the patch. The failure message was as follows:

java.lang.AssertionError: Waited too much time for queueFailover replication

Looking at my archive of unit test results for the trunk, TestReplication 
failure shows up in 67 cases out of 291 total runs of the test suite, so I 
suspect it is a highly problematic unit test. I will look into this a bit more.

However, my question is the following: with the extremely agile development 
process and a non-trivial number of routine unit test failures in the trunk, 
what is the accepted approach to testing new patches? How do people select the 
right revision to test their patch against? Is it always the trunk, and are the 
existing unit test failures in the trunk ignored (which is bad because this may 
mask new bugs), or is there a different approach?
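
To put the 67-out-of-291 figure in perspective, here is a small sketch of the arithmetic. It computes the historical failure rate and, under the simplifying assumption that runs fail independently at that rate (an assumption made for this example, not something measured in this thread), the chance of seeing at least one such failure in a handful of runs. The class and method names are illustrative.

```java
import java.util.Locale;

// Illustrative arithmetic only. The independence of runs is an assumption
// of this sketch, not a property established anywhere in the thread.
class FlakeRate {

    /** Fraction of suite runs in which the test failed. */
    static double failureRate(int failures, int runs) {
        return (double) failures / runs;
    }

    /** Probability of at least one failure in k independent runs at rate p. */
    static double atLeastOneFailure(double p, int k) {
        return 1.0 - Math.pow(1.0 - p, k);
    }

    public static void main(String[] args) {
        double p = failureRate(67, 291); // figures quoted above for TestReplication
        System.out.println(String.format(Locale.ROOT,
            "historical failure rate: %.1f%%", 100 * p));
        for (int k = 1; k <= 3; k++) {
            System.out.println(String.format(Locale.ROOT,
                "chance of at least one failure in %d run(s): %.1f%%",
                k, 100 * atLeastOneFailure(p, k)));
        }
    }
}
```

At a rate of roughly one failure in four runs, a single failing run with the patch applied is weak evidence of a regression on its own, which is the concern behind the question about choosing a baseline.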





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-01 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073982#comment-13073982
 ] 

Jean-Daniel Cryans commented on HBASE-3857:
---

The reason TestReplication sometimes fails is HBASE-3515; if you see that stack 
trace, you can disregard it. If not, that's a new bug.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-08-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076009#comment-13076009
 ] 

Ted Yu commented on HBASE-3857:
---

@Mikhail:
If the test failures from the 2011-08-01_11_34_28 run match those in TRUNK 
build 2067 (18 failures), that would give us confidence that the patch 
introduces no regression.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072027#comment-13072027
 ] 

stack commented on HBASE-3857:
--

@Mikhail Did you see Todd's comments?  Will you have a chance to work on them?  
Let us know if not and one of us will take a crack at it.  Thanks.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-27 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072033#comment-13072033
 ] 

Mikhail Bautin commented on HBASE-3857:
---

@Todd: thank you for your comments!
@Michael: I will try to make changes and post another version of the patch 
tonight or tomorrow.

By the way, I've been running an automated load test of the trunk with the 
patch applied on a 5-node cluster, and it seems to work fine. Here are some 
stats, FWIW, although without the details of this particular load test the 
following is probably incomprehensible. I guess we will open-source the load 
test later -- need to talk about this with Nicolas and other folks. It's 
basically a bunch of random insertions and random full-row reads.

11/07/27 14:30:13 INFO utils.MultiThreadedAction: [W:20] Keys = 54313718, cols 
= 2B, time = 17:15:04 Overall: [keys/s = 874, latency = 22 ms] Current: [keys/s 
= 1323, latency = 14 ms]
11/07/27 14:30:13 INFO utils.MultiThreadedAction: [R:20] Keys = 100115674, cols 
= 5B, time = 17:15:04 Overall: [keys/s = 1612, latency = 12 ms] Current: 
[keys/s = 410, latency = 49 ms], verified = 50061516
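
The "Overall" keys/s figures in the two log lines above can be cross-checked 
from the key counts and the elapsed time. The sketch below assumes (this is not 
stated in the thread) that the load tester reports total keys divided by elapsed 
seconds, truncated to an integer; the class and method names are made up for 
illustration.

```java
/** Cross-check of the "Overall" throughput figures; names are hypothetical. */
final class LoadTestStatsSketch {

    /** Converts an elapsed time of the form HH:MM:SS into seconds. */
    static long toSeconds(String hhmmss) {
        String[] parts = hhmmss.split(":");
        return Long.parseLong(parts[0]) * 3600
                + Long.parseLong(parts[1]) * 60
                + Long.parseLong(parts[2]);
    }

    /** Assumed definition: total keys / elapsed seconds, truncated. */
    static long keysPerSecond(long keys, String elapsed) {
        return keys / toSeconds(elapsed);
    }
}
```

Under that assumption, 17:15:04 is 62104 seconds, and the writer and reader key 
counts give 874 and 1612 keys/s, which matches the logged "Overall" values.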





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071467#comment-13071467
 ] 

stack commented on HBASE-3857:
--

@Mikhail Thanks for the updated patch.  Do you have answers to a couple of my 
questions above, at 
https://issues.apache.org/jira/browse/HBASE-3857?focusedCommentId=13068151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13068151 ?

Thanks





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-26 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071473#comment-13071473
 ] 

Mikhail Bautin commented on HBASE-3857:
---

@Michael:

1. From my conversations with Nicolas and Jonathan I got a sense that HFile v2 
would be part of 0.94, but if we can get it into 0.92, that would be even 
better, because I think in that case we will have to do less merging further 
down the road. I am also doing some load-testing of the patch on a 5-node 
cluster this week.
2. I will attach the spec document in the OpenOffice format later tonight (as 
soon as I can download OpenOffice and convert the doc).

Thank you!
--Mikhail






[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071474#comment-13071474
 ] 

Ted Yu commented on HBASE-3857:
---

+1 on putting this into 0.92





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069377#comment-13069377
 ] 

stack commented on HBASE-3857:
--

Resolve HBASE-3417 when this goes in.  This subsumes it.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068151#comment-13068151
 ] 

stack commented on HBASE-3857:
--

@Mikhail I'll finish my skim through your quality patch this evening, but in the 
meantime here are a few things:

1. Mind if I run a vote on dev on committing this to trunk?  (My sense is 
that some would rather it go in after the 0.92 branch is cut, but I'm thinking 
it should be included in 0.92, our next major release; it's a radical change, 
but the quality of the contribution, the tests included, the careful 
consideration given to migrating in-place, and the fact that you fellas have 
been running this in-house make me think it is low risk to pull this in now.  
It looks like Andrew agrees with me.)
2. Would you mind attaching the source document for your PDF design?  It looks 
like it's Word.  My thinking is that we could use OpenOffice's save-as-DocBook 
to get a rough cut at a DocBook section on this new format that we could work 
over to include in our site documentation.

This stuff is excellent. 





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-07-14 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065620#comment-13065620
 ] 

Mikhail Bautin commented on HBASE-3857:
---

I have uploaded a diff at https://reviews.apache.org/r/1134/ containing HFile 
v2 changes, multi-level block indexes, and compound Bloom filters. The patch 
can be downloaded at https://reviews.apache.org/r/1134/diff/raw/.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-27 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055794#comment-13055794
 ] 

Mikhail Bautin commented on HBASE-3857:
---

@Stack: there were no significant changes to the spec, just cleanup now that the 
implementation is complete and working. Is it safe to assume that the file 
format itself is OK for production deployment, i.e. that we will not have to 
write migration software from HFile format v2 to an HFile format v2.1 as a 
result of review comments? I will get the patch out for review this week.






[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056195#comment-13056195
 ] 

stack commented on HBASE-3857:
--

@Mikhail When you put up the patch I can see if migration is needed (IIRC, your 
design had it that you could deal with both old and new hfiles).  Good on you.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-27 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056201#comment-13056201
 ] 

Mikhail Bautin commented on HBASE-3857:
---

@Stack: no separate migration is needed to go from HFile v1 to HFile v2, 
because the new code can read both formats. What we are concerned about is that 
if we deploy HFile v2 in production and then get review comments that require a 
format change, we might have to migrate from HFile-v2-as-it-currently-is to 
HFile-v2-final-with-all-comments-addressed.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056202#comment-13056202
 ] 

stack commented on HBASE-3857:
--

bq. What we are concerned about is that if we deploy HFile v2 in production and 
then get review comments that require a format change, we might have to migrate 
from HFile-v2-as-it-currently-is to HFile-v2-final-with-all-comments-addressed.

How can we help with that?  If you post the code before you roll it out, my 
guess is that you'd get some outside reviewers.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-27 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056203#comment-13056203
 ] 

Mikhail Bautin commented on HBASE-3857:
---

I agree. I guess it might be difficult to say that the new format is OK based 
on the spec only. I am working on getting the patch out asap.






[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050893#comment-13050893
 ] 

stack commented on HBASE-3857:
--

@Mikhail Did much change?  Thanks.





[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-06-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042596#comment-13042596
 ] 

Jason Rutherglen commented on HBASE-3857:
-

I think the FST created in LUCENE-2792 could be used to compress the rowids in 
the HFile while simultaneously enabling fast lookup.  



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033270#comment-13033270
 ] 

stack commented on HBASE-3857:
--

Thanks lads for the answers.  Helps.

High level, do you lads think this format is 'brittle'?  Will corruption or a 
logic error writing any piece of the file render the file as a whole, or large 
swaths of it, unreadable?  A corrupt root index makes the file totally 
unreadable, I suppose.  A corrupt intermediate index will render all 
sub-branches unreadable?  So this file format seems more 'brittle' than V1 
because of the chaining between the index parts (root to intermediate, etc.)?  
What do you think?  It's unavoidable, I suppose, if we want the nice feature 
that Liyin describes where we dump out the index as we cross over an index size 
threshold.  (And yes, Mikhail, in V1 there is no code that makes use of the 
'magic' to skip bad bits of the file.  And does 'magic' for a parser to pick up 
the parse again even make sense on a filesystem that is checksummed?  Or, in 
your words, 'I am not sure what are the specific data corruption cases [magic] 
might help fix.')

@Mikhail

I forgot we were vint'ing already.  It's probably not a bad idea having the 
root keep the same format as the old v1 index.






[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-13 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033295#comment-13033295
 ] 

Mikhail Bautin commented on HBASE-3857:
---

Hi St.Ack,

Thank you for all the feedback! 

To scan an HFile in the new format we don't even need the root index. Each 
block is self-sufficient in that the header contains all the information 
necessary to decode the block, except the compression type, which is found in 
the trailer. We could create an HFile fix tool that would rebuild the block 
index if necessary. In HFile format v1, however, if the block index is corrupt, 
we would not be able to read any data blocks at all. So I don't see how HFile 
format v2 is more brittle than v1.
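
The idea that blocks can be walked without any index can be sketched as 
follows. This is a toy model only: the header layout used here (a one-byte type 
tag followed by a four-byte payload length) and all names are assumptions made 
for illustration and do not reproduce the real HFile v2 block header.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Toy sequential scan over self-describing blocks; layout is hypothetical. */
final class BlockScanSketch {
    static final byte DATA = 1;       // hypothetical type tag
    static final byte LEAF_INDEX = 2; // hypothetical type tag

    /** Appends one block: [type:1 byte][payloadLength:int][payload]. */
    static void appendBlock(ByteBuffer file, byte type, byte[] payload) {
        file.put(type).putInt(payload.length).put(payload);
    }

    /**
     * Walks the blocks front to back using only their headers and returns
     * the offsets of the DATA blocks, i.e. what an index rebuild would need.
     */
    static List<Integer> dataBlockOffsets(ByteBuffer file) {
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos < file.limit()) {
            byte type = file.get(pos);
            int payloadLength = file.getInt(pos + 1);
            if (type == DATA) {
                offsets.add(pos);
            }
            pos += 1 + 4 + payloadLength; // skip header and payload
        }
        return offsets;
    }
}
```

A repair tool along the lines suggested above would do the same walk and emit 
one index entry per data block it finds.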

Implementation update: a load test (org.apache.hadoop.hbase.manual.HBaseTest) 
is successfully running on a 5-node cluster, and I see some 2-level indexes 
being created with 5-15 root-level entries so far (with the max index block 
size set to 128K), as well as some compound ROW Bloom filters.

Regards,
--Mikhail




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033322#comment-13033322
 ] 

stack commented on HBASE-3857:
--

@Mikhail Cool.  I like the bit about each block being 'self-sufficient'.



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-12 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032656#comment-13032656
 ] 

Mikhail Bautin commented on HBASE-3857:
---

I will try to answer the rest of the questions:

 + You say "Block type, a sequence of bytes equivalent to version 1's magic 
 records". Is this the case? The magic was supposed to be a sequence you could 
 search to pick up the parse again after hitting a bad patch of corrupted data. 
 You seem to instead start blocks with a type?

In our design, the magic record is a serialized representation of the block 
type. I did not see any logic that searches for a magic record after hitting a 
block of bad data in version 1, so I did not implement it in version 2. I am 
not sure what are the specific data corruption cases this might help fix.


 + How are blocks sized now? Are we still cutting blocks off at first KV 
 boundary after we go past configured hfile block size – e.g. 64k – or is the 
 block cutoff instead determined by fill of the bloom filter array or the 
 index?

The blocks are sized the same way as before. Block cutoff happens 
independently for regular data blocks and for inline blocks (Bloom blocks and 
leaf data index blocks). When a normal data block fills up, we give all 
registered inline block writers a chance to insert their next block into the 
stream. The Bloom filter writer has the ability to queue filled-up blocks until 
its next chance to write them, and the block index writer's chunks can only 
fill up on a data block boundary.
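
The interleaving described in the answer above can be modelled roughly as 
follows. This is a simplified sketch, not the writer from the patch: it tracks 
only a single leaf-index writer, the class and method names are hypothetical, 
and the flush of leftover index entries at file close is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of data-block cutoff with inline blocks; names are hypothetical. */
final class InlineBlockSketch {

    /** Hypothetical inline writer: one leaf-index entry per finished data block. */
    static final class LeafIndexWriter {
        private final int maxEntries;
        private int pendingEntries = 0;

        LeafIndexWriter(int maxEntries) {
            this.maxEntries = maxEntries;
        }

        void dataBlockFinished() {
            pendingEntries++;
        }

        boolean hasBlockReady() {
            return pendingEntries >= maxEntries;
        }

        String takeBlock() {
            String block = "LEAF_INDEX(" + pendingEntries + ")";
            pendingEntries = 0;
            return block;
        }
    }

    /** Returns the block types in the order they would appear in the file. */
    static List<String> layout(int[] kvSizes, int blockSize, LeafIndexWriter index) {
        List<String> stream = new ArrayList<>();
        int currentBlockBytes = 0;
        for (int kvSize : kvSizes) {
            currentBlockBytes += kvSize;
            // Cut at the first KV boundary at or past the configured size.
            if (currentBlockBytes >= blockSize) {
                stream.add("DATA");
                currentBlockBytes = 0;
                index.dataBlockFinished();
                // Inline writers only get a turn between data blocks.
                if (index.hasBlockReady()) {
                    stream.add(index.takeBlock());
                }
            }
        }
        if (currentBlockBytes > 0) {
            stream.add("DATA"); // final, partially filled data block
            index.dataBlockFinished();
        }
        return stream;
    }
}
```

With seven 40-byte KVs, a 100-byte block size, and an index chunk of two 
entries, the layout is two data blocks, one leaf index block, then the final 
data block.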


 + I think I know what the following refers to in the diagram, "Version 2 root 
 index, stored in the data block index section of the file" – it's kept in the 
 'load-on-open section', right?

This should have been "Version 2 root index, stored in the load-on-open 
section of the file". Thanks for catching this. I will fix this in the spec.


 + • Offset (long)
  o For this description: This offset may point to a data block or to a 
    deeper-level index block.
  • On-disk size (int)
  • Key (a serialized byte array)
  o Key (VInt)
  o Key bytes
 You are using a vint to specify the key size. We didn't do that in v1? You 
 have a good implementation? (It was costly, IIRC, using Hadoop's.)

Actually, version 1 already uses VInt to store the block index, because it 
uses Bytes.writeByteArray, which stores the length as a VInt. We decided to 
keep the root-level block index format similar to the version 1 block index 
format, since it gets de-serialized into a byte[][], a long[], and an int[] 
anyway.
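
A root-level entry of the shape listed above (offset, on-disk size, 
length-prefixed key) and its de-serialization into three parallel arrays can be 
sketched like this. The variable-length integer here is a plain 7-bits-per-byte 
encoding chosen for brevity; it is a stand-in and is not byte-compatible with 
Hadoop's VInt. All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Toy root index (de)serialization; the varint is NOT Hadoop's VInt. */
final class RootIndexSketch {

    /** Writes value using 7 payload bits per byte, low bits first. */
    static void putVarInt(ByteBuffer buf, int value) {
        while ((value & ~0x7F) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    static int getVarInt(ByteBuffer buf) {
        int result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    /** The three parallel arrays the root index is loaded into. */
    static final class Parsed {
        final long[] offsets;
        final int[] onDiskSizes;
        final byte[][] keys;

        Parsed(int n) {
            offsets = new long[n];
            onDiskSizes = new int[n];
            keys = new byte[n][];
        }
    }

    /** Each entry: [offset:long][onDiskSize:int][keyLength:varint][key bytes]. */
    static ByteBuffer write(long[] offsets, int[] onDiskSizes, byte[][] keys) {
        int capacity = 0;
        for (byte[] key : keys) {
            capacity += 8 + 4 + 5 + key.length; // 5 = longest varint for an int
        }
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (int i = 0; i < keys.length; i++) {
            buf.putLong(offsets[i]);
            buf.putInt(onDiskSizes[i]);
            putVarInt(buf, keys[i].length);
            buf.put(keys[i]);
        }
        buf.flip();
        return buf;
    }

    static Parsed read(ByteBuffer buf, int numEntries) {
        Parsed parsed = new Parsed(numEntries);
        for (int i = 0; i < numEntries; i++) {
            parsed.offsets[i] = buf.getLong();
            parsed.onDiskSizes[i] = buf.getInt();
            parsed.keys[i] = new byte[getVarInt(buf)];
            buf.get(parsed.keys[i]);
        }
        return parsed;
    }
}
```

Because the whole root level is small and loaded on open, parsing it eagerly 
into arrays like this keeps later lookups free of any decoding work.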

 + Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?

The Root Data Index is one particular instance of a root index block. We use 
the same root index block format for the data index root level, the meta index 
(which is always single-level), and the Bloom index (also single-level). For 
intermediate and leaf-level blocks we use another, non-root index block format 
that allows binary search of the serialized data structure.


 + • entryOffsets: the “secondary index” of offsets of entries in the block, to 
 facilitate a quick binary search on the key (numEntries int values)
 Is this worth the bother? A binary search of in-memory data structure? How 
 many entries are you thinking there will be in these blocks?

After discussing this with Nicolas, we decided not to change the data block 
format, because in our case there are somewhere between 10 and 500 key/value 
pairs per data block, so binary search does not offer much benefit compared to 
the current linear search, and the read time is dominated by input/output 
anyway.
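
For the non-root index blocks, where the "secondary index" of entry offsets is 
kept, a binary search directly over the serialized bytes might look like the 
sketch below. The block layout used here is an assumption made for 
illustration, not the layout from the spec, and the names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy binary search over a serialized block via an offsets array. */
final class SecondaryIndexSketch {

    /**
     * Hypothetical layout:
     * [numEntries:int][entryOffsets:int x numEntries][entries...],
     * each entry = [keyLength:int][key bytes]; offsets are relative to the
     * start of the entries area.
     */
    static ByteBuffer serialize(String[] sortedKeys) {
        int n = sortedKeys.length;
        byte[][] raw = new byte[n][];
        int bodySize = 0;
        for (int i = 0; i < n; i++) {
            raw[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            bodySize += 4 + raw[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * n + bodySize);
        buf.putInt(n);
        int offset = 0;
        for (byte[] key : raw) {
            buf.putInt(offset);
            offset += 4 + key.length;
        }
        for (byte[] key : raw) {
            buf.putInt(key.length);
            buf.put(key);
        }
        buf.flip();
        return buf;
    }

    /** Returns the index of the last entry whose key <= target, or -1. */
    static int search(ByteBuffer block, byte[] target) {
        int n = block.getInt(0);
        int entriesStart = 4 + 4 * n;
        int lo = 0;
        int hi = n - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Jump straight to entry mid through the offsets array.
            int entryPos = entriesStart + block.getInt(4 + 4 * mid);
            int keyLength = block.getInt(entryPos);
            if (compare(block, entryPos + 4, keyLength, target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    /** Unsigned lexicographic comparison of a key in the block with other. */
    static int compare(ByteBuffer block, int pos, int length, byte[] other) {
        int common = Math.min(length, other.length);
        for (int i = 0; i < common; i++) {
            int a = block.get(pos + i) & 0xFF;
            int b = other[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return length - other.length;
    }
}
```

The offsets array is what makes the search possible without first decoding 
every variable-length entry; with only tens to hundreds of entries, as in the 
data blocks discussed above, a linear scan costs about the same.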

Hope this helps. Please let me know if you have any further questions/concerns 
about the HFile format v2.

Thanks!
--Mikhail




[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-10 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031489#comment-13031489
 ] 

Liyin Tang commented on HBASE-3857:
---

Let me try to answer this :)
1) While writing the HFile, the writer holds the current inline block index 
in memory. Once this inline block index grows larger than a threshold (default 
128KB), the writer flushes it to disk. The benefit is that the writer does not 
have to hold the entire block index in memory during the write.

2) If the block index is still smaller than the threshold after all the data 
blocks have been written, there will be only one level of block index, the 
root level, which is the same as before. So the multi-level block index is 
entirely optional and depends on the block index size.

3) After flushing each inline block index to disk, the writer generates the 
parent level of the block index on the fly, which points to the offsets of 
these inline index blocks. This is called the intermediate level. So only the 
leaf level of the block index points to data blocks; every other level points 
to index blocks of its child level. Each level of the block index is also 
partitioned into blocks. Each index block is a chunk of the block index and 
has the same format as a data block.

4) If the intermediate level is still larger than the threshold (128KB), the 
writer generates a parent level for it, and keeps doing so until the current 
level of the block index is smaller than the threshold. That last level is 
called the root-level block index. It works like building a balanced tree 
from the bottom up until the root level is smaller than the threshold.

5) When reading the file, the reader loads the root level of the block index 
and holds it in memory. When the reader calls seekTo on a key, it walks the 
tree from root to leaf to find the data block that contains this key. All the 
index blocks visited are cached during the search.
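
The five steps above can be sketched roughly as follows. This is only an illustrative in-memory model, not the HFile v2 code: the class and method names (MultiLevelIndexSketch, IndexBlock, build, seek) are made up for this example, keys are plain strings, and the 128KB byte threshold is replaced by an assumed maximum number of entries per index block.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a multi-level block index; not HBase's actual classes. */
final class MultiLevelIndexSketch {

    /** One index block: sorted first keys plus either child index blocks or data block ids. */
    static final class IndexBlock {
        final List<String> firstKeys = new ArrayList<>();
        final List<IndexBlock> children = new ArrayList<>();   // empty at the leaf level
        final List<Integer> dataBlockIds = new ArrayList<>();  // used at the leaf level only
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Builds levels bottom-up until the top level fits in a single block (assumed entry-count threshold). */
    static IndexBlock build(List<String> dataBlockFirstKeys, int maxEntries) {
        if (maxEntries < 2) throw new IllegalArgumentException("maxEntries must be >= 2");
        // Leaf level: chunk the data block entries into index blocks.
        List<IndexBlock> level = new ArrayList<>();
        IndexBlock current = new IndexBlock();
        for (int i = 0; i < dataBlockFirstKeys.size(); i++) {
            if (current.firstKeys.size() == maxEntries) { level.add(current); current = new IndexBlock(); }
            current.firstKeys.add(dataBlockFirstKeys.get(i));
            current.dataBlockIds.add(i);
        }
        level.add(current);
        // Keep adding parent levels while the current level needs more than one block.
        while (level.size() > 1) {
            List<IndexBlock> parents = new ArrayList<>();
            IndexBlock parent = new IndexBlock();
            for (IndexBlock child : level) {
                if (parent.firstKeys.size() == maxEntries) { parents.add(parent); parent = new IndexBlock(); }
                parent.firstKeys.add(child.firstKeys.get(0));
                parent.children.add(child);
            }
            parents.add(parent);
            level = parents;
        }
        return level.get(0); // the root; also the only level when everything fits in one block
    }

    /** Walks from root to leaf; returns the data block id that may hold key, or -1 if key sorts before all blocks. */
    static int seek(IndexBlock root, String key) {
        IndexBlock block = root;
        while (true) {
            int slot = floorSlot(block.firstKeys, key);
            if (slot < 0) return -1;
            if (block.isLeaf()) return block.dataBlockIds.get(slot);
            block = block.children.get(slot);
        }
    }

    /** Binary search for the last entry whose key is <= target, or -1 if there is none. */
    static int floorSlot(List<String> keys, String target) {
        int lo = 0, hi = keys.size() - 1, answer = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.get(mid).compareTo(target) <= 0) { answer = mid; lo = mid + 1; } else { hi = mid - 1; }
        }
        return answer;
    }

    /** Number of index levels, counting the root. */
    static int depth(IndexBlock root) {
        int levels = 1;
        for (IndexBlock b = root; !b.isLeaf(); b = b.children.get(0)) levels++;
        return levels;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) keys.add(String.format("k%02d", i));
        IndexBlock root = build(keys, 3);
        System.out.println("levels=" + depth(root) + ", block for k07=" + seek(root, "k07"));
    }
}
```

With a small input that fits in one block, build returns a single leaf block that doubles as the root, which matches point 2: the extra levels only appear once the index outgrows the threshold.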


Please let me know if there are any concerns about the block index :)
Thanks a lot

 Change the HFile Format
 ---

 Key: HBASE-3857
 URL: https://issues.apache.org/jira/browse/HBASE-3857
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Mikhail Bautin
 Attachments: hfile_format_v2_design_draft_0.1.pdf


 In order to support HBASE-3763 and HBASE-3856, we need to change the format 
 of the HFile. The new format proposal is attached here. Thanks for Mikhail 
 Bautin for the documentation. 



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029455#comment-13029455
 ] 

stack commented on HBASE-3857:
--

I don't see the attachment.  Is it just me?

 Change the HFile Format
 ---

 Key: HBASE-3857
 URL: https://issues.apache.org/jira/browse/HBASE-3857
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Mikhail Bautin

 In order to support HBASE-3763 and HBASE-3856, we need to change the format 
 of the HFile. The new format proposal is attached here. Thanks for Mikhail 
 Bautin for the documentation. 



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-05 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029462#comment-13029462
 ] 

Mikhail Bautin commented on HBASE-3857:
---

Liyin initially added and then deleted the attachment, which was just our 
design doc wiki page copied and pasted into Word. I will upload another draft 
of the design doc shortly.

 Change the HFile Format
 ---

 Key: HBASE-3857
 URL: https://issues.apache.org/jira/browse/HBASE-3857
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Mikhail Bautin

 In order to support HBASE-3763 and HBASE-3856, we need to change the format 
 of the HFile. The new format proposal is attached here. Thanks for Mikhail 
 Bautin for the documentation. 



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029615#comment-13029615
 ] 

stack commented on HBASE-3857:
--

Design looks excellent.

A few comments:

+ It looks like it will be self-migrating in that it can read version 1 hfiles.  
That's great.
+ You say "Block type, a sequence of bytes equivalent to version 1's magic 
records".  Is this the case?  The magic was supposed to be a sequence you could 
search for to pick up the parse again after hitting a bad patch of corrupted 
data.  You seem to instead start blocks with a type?
+ How are blocks sized now?  Are we still cutting blocks off at the first KV 
boundary after we go past the configured hfile block size -- e.g. 64k -- or 
is the block cutoff instead determined by the fill of the bloom filter 
array or the index?
+ I think I know what the following refers to in the diagram, "Version 2 root 
index, stored in the data block index section of the file" -- it's kept in the 
'load-on-open section', right?
+ Can we have an example of how root, intermediate and leaf indices interrelate?  
What's in the root, intermediate, and leaf indices?  Are intermediates 
optional?  At what boundary do they cut in?  Are leaf indices optional too?  
What are these? Indices into the data block?
+ For this description:
• Offset (long)
  o This offset may point to a data block or to a deeper-level index block.
• On-disk size (int)
• Key (a serialized byte array)
  o Key (VInt)
  o Key bytes

You are using a vint to specify the key size.  We didn't do that in v1?  You 
have a good implementation?  (It was costly, IIRC, using hadoop's.)
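
To make the entry layout quoted above concrete, here is a minimal sketch of writing and reading one index entry: offset as a long, on-disk size as an int, then a variable-length key size followed by the key bytes. The class name IndexEntrySketch and its methods are hypothetical, and the varint used here is a plain 7-bits-per-byte encoding chosen for illustration; it is an assumption and is not claimed to match Hadoop's VInt encoding or the actual HFile v2 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of one index entry; not the actual HBase class or wire format. */
final class IndexEntrySketch {
    final long offset;     // where the referenced data or index block starts
    final int onDiskSize;  // size of the referenced block on disk
    final byte[] key;      // key associated with the referenced block

    IndexEntrySketch(long offset, int onDiskSize, byte[] key) {
        this.offset = offset;
        this.onDiskSize = onDiskSize;
        this.key = key;
    }

    /** Serializes as: offset (8 bytes), on-disk size (4 bytes), key length (varint), key bytes. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(offset);
            out.writeInt(onDiskSize);
            writeVarInt(out, key.length);
            out.write(key);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back an entry written by toBytes(). */
    static IndexEntrySketch fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long offset = in.readLong();
            int onDiskSize = in.readInt();
            byte[] key = new byte[readVarInt(in)];
            in.readFully(key);
            return new IndexEntrySketch(offset, onDiskSize, key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a non-negative int using 7 payload bits per byte, low bits first. */
    static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.writeByte(value);
    }

    /** Reads an int written by writeVarInt. */
    static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
            if (shift > 28) throw new IOException("varint longer than 5 bytes");
        }
    }

    public static void main(String[] args) {
        byte[] key = "row-0042".getBytes(StandardCharsets.UTF_8);
        byte[] wire = new IndexEntrySketch(65536L, 4096, key).toBytes();
        IndexEntrySketch back = fromBytes(wire);
        System.out.println(wire.length + " bytes, offset=" + back.offset
                + ", key=" + new String(back.key, StandardCharsets.UTF_8));
    }
}
```

In this sketch, keys shorter than 128 bytes spend one byte on the length instead of a fixed four, which is the kind of saving a variable-length size field is meant to buy; whether that is worth the decoding cost is exactly the question raised above.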

+ Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?
+ • entryOffsets: the “secondary index” of offsets of entries in the block, to 
facilitate a quick binary search on the key (numEntries int values)

Is this worth the bother?  A binary search of an in-memory data structure?  How 
many entries are you thinking there will be in these blocks?
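
For context on what the entryOffsets array buys, here is a rough sketch of binary-searching variable-length entries inside a block through an array of start offsets. Everything in it is assumed for illustration: the class name EntryOffsetsSketch, the record layout (a 4-byte length followed by the key bytes), and the unsigned byte-wise key ordering. It is not the proposed on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical block with variable-length entries and a secondary index of entry offsets. */
final class EntryOffsetsSketch {

    /** Packs sorted keys as [int length][key bytes] records and fills entryOffsets with each record's start. */
    static ByteBuffer pack(String[] sortedKeys, int[] entryOffsets) {
        byte[][] raw = new byte[sortedKeys.length][];
        int total = 0;
        for (int i = 0; i < sortedKeys.length; i++) {
            raw[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + raw[i].length;
        }
        ByteBuffer block = ByteBuffer.allocate(total);
        for (int i = 0; i < raw.length; i++) {
            entryOffsets[i] = block.position();
            block.putInt(raw[i].length);
            block.put(raw[i]);
        }
        return block;
    }

    /** Reads the key of entry i directly, without scanning the entries before it. */
    static byte[] keyAt(ByteBuffer block, int[] entryOffsets, int i) {
        int pos = entryOffsets[i];
        byte[] key = new byte[block.getInt(pos)];
        for (int j = 0; j < key.length; j++) key[j] = block.get(pos + 4 + j);
        return key;
    }

    /** Returns the index of the matching entry, or -(insertionPoint + 1) if the key is absent. */
    static int search(ByteBuffer block, int[] entryOffsets, byte[] target) {
        int lo = 0, hi = entryOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(keyAt(block, entryOffsets, mid), target);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }

    /** Lexicographic comparison treating bytes as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry", "date"};
        int[] offsets = new int[keys.length];
        ByteBuffer block = pack(keys, offsets);
        System.out.println(search(block, offsets, "cherry".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Without the offsets array, finding entry i means walking every record before it, because the records have different lengths; with it, each probe of the binary search is a direct jump.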

 
+1



 Change the HFile Format
 ---

 Key: HBASE-3857
 URL: https://issues.apache.org/jira/browse/HBASE-3857
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Mikhail Bautin
 Attachments: hfile_format_v2_design_draft_0.1.pdf


 In order to support HBASE-3763 and HBASE-3856, we need to change the format 
 of the HFile. The new format proposal is attached here. Thanks for Mikhail 
 Bautin for the documentation. 



[jira] [Commented] (HBASE-3857) Change the HFile Format

2011-05-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029739#comment-13029739
 ] 

stack commented on HBASE-3857:
--

bq.  • entryOffsets: the “secondary index” of offsets of entries in the block, 
to facilitate a quick binary search on the key (numEntries int values) Is this 
worth the bother? A binary search of in-memory data structure? How many entries 
are you thinking there will be in these blocks?

nvm on the above.  I see your comment over in HBASE-3855 on how you want to 
binary search seek/reseeking.

 Change the HFile Format
 ---

 Key: HBASE-3857
 URL: https://issues.apache.org/jira/browse/HBASE-3857
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Mikhail Bautin
 Attachments: hfile_format_v2_design_draft_0.1.pdf


 In order to support HBASE-3763 and HBASE-3856, we need to change the format 
 of the HFile. The new format proposal is attached here. Thanks for Mikhail 
 Bautin for the documentation. 
