[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164613#comment-13164613 ] Jean-Daniel Cryans commented on HBASE-3857: --- Any reason why this patch is removing compaction and flush queue sizes? {code} -this.metrics.compactionQueueSize.set(compactSplitThread -.getCompactionQueueSize()); -this.metrics.flushQueueSize.set(cacheFlusher -.getFlushQueueSize()); {code} If it was intentional, there's a bunch of dead code that also needs to be removed like those methods that were called. If it wasn't, meaning there's currently no way in 0.92 to get the compaction queue size, then this would be sufficient for me to kill the RC. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-Adding-release-notes-for-HBASE-3857.patch, 0001-Fix-TestHFileBlock.testBlockHeapSize.patch, 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164618#comment-13164618 ] Zhihong Yu commented on HBASE-3857: --- @J-D: Nice catch. We should open another JIRA to deal with queue sizes. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-Adding-release-notes-for-HBASE-3857.patch, 0001-Fix-TestHFileBlock.testBlockHeapSize.patch, 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164619#comment-13164619 ] Jean-Daniel Cryans commented on HBASE-3857: --- Yeah I just want to verify first what's the situation, maybe I'm missing something. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-Adding-release-notes-for-HBASE-3857.patch, 0001-Fix-TestHFileBlock.testBlockHeapSize.patch, 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078886#comment-13078886 ] Ted Yu commented on HBASE-3857: --- I got one test failure for 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch: {code} Failed tests: testGzipCompression(org.apache.hadoop.hbase.io.hfile.TestHFileBlock): expected:...0\x00\x00\x00\x00\x0[0]\xED\xC3\xC1\x11\x00... but was:...0\x00\x00\x00\x00\x0[3]\xED\xC3\xC1\x11\x00... at org.junit.Assert.assertEquals(Assert.java:123) at org.junit.Assert.assertEquals(Assert.java:145) at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.testGzipCompression(TestHFileBlock.java:126){code} Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078979#comment-13078979 ] Mikhail Bautin commented on HBASE-3857: --- TL;DR version: Can we commit this patch? @Nicolas: you mentioned some last-minute sanity checking you wanted to do. Here are some recent test results with and without the patch. I am running the tests against the stable revision 1153300. All the sporadically failing unit tests with the patch applied (TestDistributedLogSplitting, TestHRegion, TestServerCustomProtocol) also intermittently fail without the patch. The failures look similar for the two versions of code, as shown below. The most recent version of the patch is also passing automated randomized read/write load testing. Given all of that, I think it is safe to assume that the patch does not introduce a regression. Note: some of these test failures might be problems with my test setup. TestServerCustomProtocol: java.lang.AssertionError: Results should contain region test,,1312378402385.4b887abff65a2e74eab86c859d255c89. for row 'aaa' TestDistirbutedLogSplitting: testThreeRSAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) Time elapsed: 95.921 sec FAILURE! java.lang.AssertionError: . . . at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testThreeRSAbort TestHRegion: testWritesWhileGetting(org.apache.hadoop.hbase.regionserver.TestHRegion) Time elapsed: 0.398 sec FAILURE! junit.framework.AssertionFailedError: expected:\x00\x00\x008 but was:\x00\x00\x006 at org.apache.hadoop.hbase.HBaseTestCase.assertEquals(HBaseTestCase.java:691) at org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:2759) With the patch: 2011-08-03_02_58_34 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 2, err: 0, skip: 9, time: 4784.9, failed: DistributedLogSplitting, HRegion 2011-08-03_03_24_45 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 0, err: 0, skip: 9, time: 4544.1 2011-08-03_05_00_32 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4735.5, failed: HRegion 2011-08-03_05_21_43 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4485.4, failed: ServerCustomProtocol 2011-08-03_07_00_19 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 0, err: 0, skip: 9, time: 4624.5 2011-08-03_07_14_08 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4687.8, failed: ServerCustomProtocol 2011-08-03_09_08_22 commit: review_hfile-v2-r1153300-2011_08_02 | tests: 870, fail: 1, err: 0, skip: 9, time: 4711.4, failed: ServerCustomProtocol Without the patch: 2011-08-02_19_23_26 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4496.4 2011-08-02_21_19_54 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4520.2 2011-08-02_23_15_58 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4438.1 2011-08-03_01_10_12 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4503.5 2011-08-03_03_05_10 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 1, err: 0, skip: 9, time: 4550.1, failed: HRegion 2011-08-03_05_00_26 commit: HBASE-3065 don't prepend MAGIC if d | tests: 836, fail: 1, err: 0, skip: 9, time: 4418.0, failed: ServerCustomProtocol 2011-08-03_07_05_57 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 1, err: 0, skip: 9, time: 4443.9, failed: Replication 2011-08-03_08_59_20 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 2, err: 0, skip: 9, time: 4427.2, failed: CoprocessorEndpoint, DistributedLogSplitting 2011-08-03_10_52_41 commit: HBASE-3065 don't prepend MAGIC if d | tests: 837, fail: 0, err: 0, skip: 9, time: 4483.8 Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078980#comment-13078980 ] Ted Yu commented on HBASE-3857: --- Committed to TRUNK. I moved the binary file to src/test/resources/org/apache/hadoop/hbase/8e8ab58dcf39412da19833fcd8f687ac Thanks for the great work, Mikhail. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078988#comment-13078988 ] Hudson commented on HBASE-3857: --- Integrated in HBase-TRUNK #2076 (See [https://builds.apache.org/job/HBase-TRUNK/2076/]) HBASE-3857 New files HBASE-3857 added the binary V1 HFile HBASE-3857 Change the HFile Format (Mikhail Liyin) tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterWriter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java tedyu : Files : * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/8e8ab58dcf39412da19833fcd8f687ac tedyu : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFilePerformance.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/RandomSeek.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCachedBlockQueue.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HServerLoad.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/CachedBlock.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Hash.java * /hbase/trunk/src/main/resources/hbase-default.xml * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestByteBloomFilter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompressionTest.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/ByteBloomFilter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079102#comment-13079102 ] Hudson commented on HBASE-3857: --- Integrated in HBase-TRUNK #2077 (See [https://builds.apache.org/job/HBase-TRUNK/2077/]) HBASE-3857 binary HFile should be under io/hfile HBASE-3857 relocated binary HFile HBASE-3857 New test classes. HBASE-3857 New files HBASE-3857 more new files tedyu : Files : * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/hfile/8e8ab58dcf39412da19833fcd8f687ac * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/file/8e8ab58dcf39412da19833fcd8f687ac tedyu : Files : * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/hfile * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/8e8ab58dcf39412da19833fcd8f687ac * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/file/8e8ab58dcf39412da19833fcd8f687ac * /hbase/trunk/src/test/resources/org/apache/hadoop/hbase/io/file tedyu : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestIdLock.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java tedyu : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/IdLock.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/DynamicByteBloomFilter.java tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilterBase.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DoubleOutputStream.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InlineBlockWriter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterBase.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilterWriter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079166#comment-13079166 ] Ted Yu commented on HBASE-3857: --- @Mikhail: Test fix committed. Thanks for the follow-up. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-Fix-TestHFileBlock.testBlockHeapSize.patch, 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079197#comment-13079197 ] Hudson commented on HBASE-3857: --- Integrated in HBase-TRUNK #2078 (See [https://builds.apache.org/job/HBase-TRUNK/2078/]) HBASE-3857 Fix TestHFileBlock.testBlockHeapSize test failure (Mikhail) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-Fix-TestHFileBlock.testBlockHeapSize.patch, 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_02_19_4.patch, 0001-review_hfile-v2-r1153300-git-1152532-2011_08_03_12_4.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073814#comment-13073814 ] Mikhail Bautin commented on HBASE-3857: --- The new version of the patch is successfully passing the randomized load test as well. I am trying to compare unit test results between r1152122 and the same version with the patch applied. Without the patch: 2011-08-01_10_55_23 commit: HBASE-4144 RS does not abort if th | tests: 812, fail: 4, err: 14, skip: 9, time: 4791.1, failed: FullLogReconstruction, DistributedLogSplitting, SplitTransactionOnCluster, ScannerTimeout, MasterFailover, MultiParallel With the patch: 2011-08-01_11_34_28 commit: review_hfile-v2-r1152122-2011_08_01 | tests: 845, fail: 5, err: 14, skip: 9, time: 5426.5, failed: FullLogReconstruction, DistributedLogSplitting, ServerCustomProtocol, Replication, SplitTransactionOnCluster, ScannerTimeout, MasterFailover, MultiParallel Looking at the ServerCustomProtocol (the only test that has failures with the patch but not without), I see that this is a quite frequent intermittent test failure in my automated runs of HBase trunk tests. Any advice about what version I should select as the baseline for my unit test run? The only trunk version that I managed to get a clean unit test run with was r1147350, and my patch against that passed all unit tests as well, but I think some changes were introduced since then that made the patch not apply cleanly, so I rebased the patch to a more recent version (and got all the test failures of that version). Notwithstanding all the above, I believe the patch is currently in a stable state and can be committed, and I will confirm that once a few more unit test runs complete. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073819#comment-13073819 ] Ted Yu commented on HBASE-3857: --- @Mikhail: Thanks for your effort of bringing this closer to checkin. Minor note: Replication was new in your second list of unit tests above. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073880#comment-13073880 ] Mikhail Bautin commented on HBASE-3857: --- @Ted: Yes, you are right, TestReplication also failed with the patch applied while it did not fail without the patch. The failure message was as follows: java.lang.AssertionError: Waited too much time for queueFailover replication Looking at my archive of unit test results for the trunk, TestReplication failure shows up in 67 cases out of 291 total runs of the test suite, so I suspect it is a highly problematic unit test. I will look into this a bit more. However, my question is the following: with the extremely agile development process and a non-trivial number of routine unit test failures in the trunk, what is the accepted approach of testing new patches? How do people select the right revision to test their patch against? Is it always the trunk, and are the existing unit test failures in the trunk ignored (which is bad because this may mask new bugs), or is there a different approach? Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073982#comment-13073982 ] Jean-Daniel Cryans commented on HBASE-3857: --- The reason TestReplication sometimes fails is HBASE-3515, if you see that stack trace then you can disregard. If not, that's a new bug. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076009#comment-13076009 ] Ted Yu commented on HBASE-3857: --- @Mikhail: If 2011-08-01_11_34_28 test failures match those in TRUNK build 2067 (18 failures), that would give us confidence of not introducing regression. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, 0001-review_hfile-v2-r1152122-2011_08_01_03_18_00.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072027#comment-13072027 ] stack commented on HBASE-3857: -- @Mikhail You see Todd's comments. Will you have a chance to work on them? Let us know if not and someone of us will take a crack at it. Thanks. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072033#comment-13072033 ] Mikhail Bautin commented on HBASE-3857: --- @Todd: thank you for your comments! @Michael: I will try to make changes and post another version of the patch tonight or tomorrow. By the way, I've been running an automated load test of the trunk with the patch applied on a 5-node cluster, and it seems to work fine. Here are some stats, FWIW, although without the details of this particular load test the following is probably incomprehensible. I guess we will open-source the load test later -- need to talk about this with Nicolas and other folks. It's basically a bunch of random insertions and random full-row reads. 11/07/27 14:30:13 INFO utils.MultiThreadedAction: [W:20] Keys = 54313718, cols = 2B, time = 17:15:04 Overall: [keys/s = 874, latency = 22 ms] Current: [keys/s = 1323, latency = 14 ms] 11/07/27 14:30:13 INFO utils.MultiThreadedAction: [R:20] Keys = 100115674, cols = 5B, time = 17:15:04 Overall: [keys/s = 1612, latency = 12 ms] Current: [keys/s = 410, latency = 49 ms], verified = 50061516 Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf, hfile_format_v2_design_draft_0.4.odt In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071467#comment-13071467 ] stack commented on HBASE-3857: -- @Mikhail Thanks for updated patch. Do you have a couple of answers for my questions above at https://issues.apache.org/jira/browse/HBASE-3857?focusedCommentId=13068151page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13068151? Thanks Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071473#comment-13071473 ] Mikhail Bautin commented on HBASE-3857: --- @Michael: 1. From my conversations with Nicolas and Jonathan I got a sense that HFile v2 would be part of 0.94, but if we can get it into 0.92, that would be even better, because I think in that case we will have to do less merging further down the road. I am also doing some load-testing of the patch on a 5-node cluster this week. 2. I will attach the spec document in the OpenOffice format later tonight (as soon as I can download OpenOffice and convert the doc). Thank you! --Mikhail Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071474#comment-13071474 ] Ted Yu commented on HBASE-3857: --- +1 on putting this into 0.92 Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, 0001-review_hfile-v2-r1147350_2011-07-26_11_55_59.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069377#comment-13069377 ] stack commented on HBASE-3857: -- Resolve hbase-3417 when this goes in. This subsumes it. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13068151#comment-13068151 ] stack commented on HBASE-3857: -- @Mikhail I'll finish my skim through your quality patch this evening but meantime here are a few things: 1. Mind if I run a vote up on dev on committing this to trunk? (My sense is that some would rather it go in after 0.92 branch is cut but I'm thinking it should be included in 0.92, our next major release; its a radical change but the quality of the contribution, the tests included, the careful consideration given to migrating in-place, and that you fellas have been running this in house make me think it low risk pulling this in now. It looks like Andrew agrees with me). 2. Would you mind attaching the src document for your pdf design? It looks like its word. My thinking is that we could use the openoffice saveas docbook to get a rough cut at a docbook section on this new format that we could work over to include in our site documentation. This stuff is excellent. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: 0001-review_hfile-v2-r1144693_2011-07-15_11_14_44.patch, hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13065620#comment-13065620 ] Mikhail Bautin commented on HBASE-3857: --- I have uploaded a diff at https://reviews.apache.org/r/1134/ containing HFile v2 changes, multi-level block indexes, and compound Bloom filters. The patch can be downloaded at https://reviews.apache.org/r/1134/diff/raw/. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055794#comment-13055794 ] Mikhail Bautin commented on HBASE-3857: --- @Stack: there were no significant changes to the spec, just cleanup after the implementation is complete and working. Is it safe to assume that the file format itself is OK for production deployment, i.e. that we will not have to write migration software from HFile format v2 to an HFile format v2.1 as a result of review comments? I will get the patch out for review this week. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056195#comment-13056195 ] stack commented on HBASE-3857: -- @Mikhail When you put up the patch I can see if migration is needed (IIRC, your design had it that you could deal with both old and new hfiles). Good on you. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056201#comment-13056201 ] Mikhail Bautin commented on HBASE-3857: --- @Stack: no separate migration is needed to go from HFile v1 to HFile v2, because the new code can read both formats. What we are concerned about is that if we deploy HFile v2 in production and then get review comments that require a format change, we might have to migrate from HFile-v2-as-it-currently-is to HFile-v2-final-with-all-comments-addressed. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056202#comment-13056202 ] stack commented on HBASE-3857: -- bq. What we are concerned about is that if we deploy HFile v2 in production and then get review comments that require a format change, we might have to migrate from HFile-v2-as-it-currently-is to HFile-v2-final-with-all-comments-addressed. How can we help with that? If you post the code before you roll it out, my guess is that you'd get some outside reviewers. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056203#comment-13056203 ] Mikhail Bautin commented on HBASE-3857: --- I agree. I guess it might be difficult to say that the new format is OK based on the spec only. I am working on getting the patch out asap. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050893#comment-13050893 ] stack commented on HBASE-3857: -- @Mikhail Did much change? Thanks. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf, hfile_format_v2_design_draft_0.3.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042596#comment-13042596 ] Jason Rutherglen commented on HBASE-3857: - I think the FST created in LUCENE-2792 could be used to compress the rowids in the HFile while simultaneously enabling fast lookup. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033270#comment-13033270 ] stack commented on HBASE-3857: -- Thanks lads for the answers. Helps. High level, do you lads think this format 'brittle'? Will corruption or an error logic writing any piece of the file render the file as a whole unreadable or large swaths of the file unreadable? A corrupt root index makes the file totally unreadable I suppose. A corrupt intermediate index will render all subbranches unreadable? So, this file format seems more 'brittle' than V1 because of the chaining between the index parts (root to intermediate, etc.)? What do you think? Its unavoidable I suppose if we want the nice feature that Liyin describes where we dump out index as we cross over an index size threshold (And yes Mikhail, in V1, there is not code that makes use of the 'magic' to skip bad bits of the file. And does 'magic' for a parser to pick up the parse again even make sense on a filesystem that is checksummed? Or, in your words ' I am not sure what are the specific data corruption cases [magic] might help fix.') @Mikhail I forgot we were vint'ing already. Its probably not a bad idea having root keep same format as old v1 index. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033295#comment-13033295 ] Mikhail Bautin commented on HBASE-3857: --- Hi St.Ack, Thank you for all the feedback! To scan an HFile in the new format we don't even need the root index. Each block is self-sufficient in that the header contains all the information necessary to decode the block, except the compression type, which is found in the trailer. We could create an HFile fix tool that would rebuild the block index if necessary. In HFile format v1, however, if the block index is corrupt, we would not be able to read any data blocks at all. So I don't see how HFile format v2 is more brittle than v1. Implementation update: a load test (org.apache.hadoop.hbase.manual.HBaseTest) is successfully running on a 5-node cluster, and I see some 2-level indexes being created with 5-15 root-level entries so far (with the max index block size set to 128K), as well as some compound ROW Bloom filters. Regards, --Mikhail Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033322#comment-13033322 ] stack commented on HBASE-3857: -- @Mikhail cool I like the bit about each block being 'self-sufficient'. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032656#comment-13032656 ] Mikhail Bautin commented on HBASE-3857: --- I will try to answer the rest of the questions: + You say Block!type,!a!sequence!of!bytes!equivalent!to!version!1's!magic!records Is this the case? The magic was supposed to be a sequence you could search to pick up the parse again after hitting a bad patch of corrupted data. You seem to instead start blocks with a type? In our design, the magic record a serialized representation of the block type. I did not see any logic that searches for a magic record after hitting a block of bad data in version 1, so I did not implement it in version 2. I am not sure what are the specific data corruption cases this might help fix. + How are blocks sized now? Are we still cutting blocks off at first KV boundary after we go past configured hfile block size – e.g. 64k – or instead, is the block cutoff instead determined by fill of the bloom filter array or the index? The blocks are sized the same way as before. Block cutoff happens independently for regular data blocks and for inline blocks (Bloom blocks and leaf data index blocks). When a normal data block fills up, we give all registered inline block writers a chance to insert their next block into the stream. The Bloom filter writer has an ability to queue filled-up blocks until its next chance to write them, and block index writer's chunks can only fill up on data block boundary. + I think I know what the following refers to in the diagram, Version!2!root index,!stored!in!the!data!block!index!section!of!the!file – its kept in the 'load-on-open section', right? This should have been Version 2 root index, stored in the load-on-open section of the file. Thanks for catching this. I will fix this in the spec. + • Offset!(long)! o For this description This!offset!may!point!to!a!data!block!or!to!a!deeper?level!index!block.! • On?disk!size!(int)! • Key!(a!serialized!byte!array)! o Key!(VInt)! o Key!bytes You are using vint specifying key size. We didn't do that in v1? You have a good implementation (was costly IIRC using hadoops'). Actually, version 1 already uses VInt to store the block index, because it uses Bytes.writeByteArray, which stores the length as a VInt. We decided to keep the root-level block index format similar to the version 1 block index format, since it gets de-serialized into a byte[][], a long[], and an int[] anyway. + Is a '!root!index!bloc' same as a 'Root Data Index' (from the diagram?) The Root Data Index is one particular instance of a root index block. We use the same root index block format for the data index root level, meta index (which is always single-level), and Bloom index (also single-level). For intermediate and leaf-level blocks we use another non-root index block format that allows to do binary search of the serialized data structure. + • entryOffsets:!the!“secondary!index” of!offsets!of!entries!in!the!block,!to! facilitate!a!quick!binary!search!on!the!key!(numEntries-int!values) Is this worth the bother? A binary search of in-memory data structure? How many entries are you thinking there will be in these blocks? After discussing this with Nicolas, we decided not to change the data block format, because in our case there are somewhere between 10-500 key/value pairs per data block, so binary search does not offer much benefit compared to the current linear search, and the read time is dominated by input/output anyway. Hope this helps. Please let me know if you have any further questions/concerns about the HFile format v2. Thanks! --Mikhail Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031489#comment-13031489 ] Liyin Tang commented on HBASE-3857: --- Let me try to answer this:) 1) When writing the HFile, the writer will hold the current inline block index into memory. If this inline block index is larger than a threshold (default 128KB), it will flush the index into disk. The benefit here is to avoid hold all the block index during the write operation. 2) But after writing all the data block and the block index is still smaller a threshold number, there will be only one level of block index, which is the root level and it is the same as before. So multi-level block index is totally optional based on block index size. 3) After flushing the each of inline block index onto disk, the Writer will generate its parent level of block index on the fly, which will point to the offset of these inline block index. It will be called intermediate level. So only the leaf level of block index will point to the data block, other level block index will point to its children level block index. Also for each level of block index, it will be partitioned into blocks as well. Each index block is a chunk of block index, which will have the same format of data block. 4) If the intermediate level of block is still larger than the threshold (128 KB), it will generate its parent level. Keep doing this until the current level of block index is smaller than the threshold. The last one is called root level block index, which is smaller than the threshold. It works like generating a balanced tree from bottom to up until the root level is smaller than the threshold. 5) So when reading the file, it will load back the root level of block index and hold it into memory. When Reader calls seekTo a key, it will lookup the tree from root to leaf to find out the data block contains this key. All the index block will be cached during the searching. Please let me know there is any concerns about the block index:) Thanks a lot Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029455#comment-13029455 ] stack commented on HBASE-3857: -- I don't see the attachment. Is it just me? Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029462#comment-13029462 ] Mikhail Bautin commented on HBASE-3857: --- Liyin initially added and deleted the attachment, which was just our design doc wiki page copy-and-paste into Word. I will upload another draft of the design doc shortly. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029615#comment-13029615 ] stack commented on HBASE-3857: -- Design looks excellent. A few comments: + It looks like it will be self-migrating in that it can read version1 hfiles. Thats great. + You say Block!type,!a!sequence!of!bytes!equivalent!to!version!1's!magic!records Is this the case? The magic was supposed to be a sequence you could search to pick up the parse again after hitting a bad patch of corrupted data. You seem to instead start blocks with a type? + How are blocks sized now? Are we still cutting blocks off at first KV boundary after we go past configured hfile block size -- e.g. 64k -- or instead, is the block cutoff instead determined by fill of the bloom filter array or the index? + I think I know what the following refers to in the diagram, Version!2!root index,!stored!in!the!data!block!index!section!of!the!file -- its kept in the 'load-on-open section', right? + Can we have example of how root, intermediate and leaf indices interrelate? Whats in the root, intermediates, and leaf indices? Are intermediates optional? At what boundary do they cut in? Leaf indices are optional too? What are these? indices into the data block? + • Offset!(long)! o For this description This!offset!may!point!to!a!data!block!or!to!a!deeper?level!index!block.! • On?disk!size!(int)! • Key!(a!serialized!byte!array)! o Key!(VInt)! o Key!bytes You are using vint specifying key size. We didn't do that in v1? You have a good implementation (was costly IIRC using hadoops'). + Is a '!root!index!bloc' same as a 'Root Data Index' (from the diagram?) + • entryOffsets:!the!“secondary!index” of!offsets!of!entries!in!the!block,!to! facilitate!a!quick!binary!search!on!the!key!(numEntries-int!values) Is this worth the bother? A binary search of in-memory data structure? How many entries are you thinking there will be in these blocks? +1 Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3857) Change the HFile Format
[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029739#comment-13029739 ] stack commented on HBASE-3857: -- bq. • entryOffsets:!the!“secondary!index” of!offsets!of!entries!in!the!block,!to! facilitate!a!quick!binary!search!on!the!key!(numEntries-int!values) Is this worth the bother? A binary search of in-memory data structure? How many entries are you thinking there will be in these blocks? nvm on the above. I see your comment over in HBASE-3855 on how you want to binary search seek/reseeking. Change the HFile Format --- Key: HBASE-3857 URL: https://issues.apache.org/jira/browse/HBASE-3857 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Mikhail Bautin Attachments: hfile_format_v2_design_draft_0.1.pdf In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile. The new format proposal is attached here. Thanks for Mikhail Bautin for the documentation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira