[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609514#comment-14609514 ]

Ben Lau commented on HBASE-13991:
---------------------------------

Hi guys, thanks for all the feedback. I think we agree that it may make sense to switch over entirely to the new layout instead of making it optional. Let me get back to you guys; I need to talk with Francis some more about the other suggestions. Thanks guys.

> Hierarchical Layout for Humongous Tables
> ----------------------------------------
>
> Key: HBASE-13991
> URL: https://issues.apache.org/jira/browse/HBASE-13991
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf
>
> Add support for humongous tables via a hierarchical layout for regions on filesystem. Credit for most of this code goes to Huaiyu Zhu.
> Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615813#comment-14615813 ]

Ben Lau commented on HBASE-13991:
---------------------------------

Hi guys, hope you had a happy 4th of July. We would like to do something akin to Lars’ last idea. That is, we will have code to support both the old layout and the new layout, but the choice will be made per HBase cluster. You will be able to migrate a cluster entirely to the hierarchical layout or leave it on the old layout.

This approach has the following pros:
- If HBase users do not need/want the new layout, they will not have to do an offline upgrade in order to use new HBase code. The alternative is to make an online upgrade for the hierarchical layout, but this would require some very messy changes to the codebase and would also be tricky to test fully.
- HBase code will not have to ‘detect’ whether tables/paths/regions are hierarchical or not. The master or region server can simply look at the root table at startup and use that to determine whether the cluster has migrated to the hierarchical layout. This single source of truth would make the code less ugly, since you don’t need to do in-context per-region/path checks in different parts of the codebase.

What do you guys think about this approach?
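To make the "single source of truth" point in the comment above concrete, here is a minimal Java sketch. It is not taken from the patch: the class `LayoutSketch`, the boolean `clusterMigrated` flag, and the two-character bucket directory are all hypothetical stand-ins, since the thread does not spell out what the hierarchical layout looks like on disk. The only idea it illustrates is that the layout is decided once at startup and every path computation consults that one decision instead of inspecting individual regions or paths.

```java
class LayoutSketch {
    enum Layout { FLAT, HIERARCHICAL }

    // Decided once, e.g. at master or region server startup (hypothetical flag).
    private final Layout layout;

    LayoutSketch(boolean clusterMigrated) {
        this.layout = clusterMigrated ? Layout.HIERARCHICAL : Layout.FLAT;
    }

    Layout layout() {
        return layout;
    }

    /**
     * Directory of a region under its table directory.
     * ASSUMPTION: the hierarchical layout is modelled here as one extra
     * "bucket" level named after the first two characters of the encoded
     * region name. The real patch may use a different scheme.
     */
    String regionDir(String tableDir, String encodedRegionName) {
        if (layout == Layout.HIERARCHICAL) {
            String bucket = encodedRegionName.substring(0, 2);
            return tableDir + "/" + bucket + "/" + encodedRegionName;
        }
        return tableDir + "/" + encodedRegionName;
    }

    public static void main(String[] args) {
        String tableDir = "/hbase/data/default/t1";
        String region = "a54c47c5";
        System.out.println(new LayoutSketch(false).regionDir(tableDir, region));
        System.out.println(new LayoutSketch(true).regionDir(tableDir, region));
    }
}
```

Callers never ask whether a particular path is hierarchical; they only ask the one object that was configured at startup, which is the property the comment argues for.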
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606793#comment-14606793 ]

Ben Lau commented on HBASE-13991:
---------------------------------

Sure. Created a reviewboard request here: https://reviews.apache.org/r/36029/
[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-13991:
---------------------------
Attachment: HumongousTableDoc.pdf
[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-13991:
---------------------------
Attachment: HBASE-13991-master.patch
[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-13991:
---------------------------
Description:
Add support for humongous tables via a hierarchical layout for regions on filesystem.
Credit for most of this code goes to Huaiyu Zhu.

was:
Add support for humongous tables via a hierarchical layout for regions on filesystem.
[jira] [Created] (HBASE-13991) Hierarchical Layout for Humongous Tables
Ben Lau created HBASE-13991:
----------------------------

Summary: Hierarchical Layout for Humongous Tables
Key: HBASE-13991
URL: https://issues.apache.org/jira/browse/HBASE-13991
Project: HBase
Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau

Add support for humongous tables via a hierarchical layout for regions on filesystem.
[jira] [Created] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
Ben Lau created HBASE-14283:
----------------------------

Summary: Reverse scan doesn’t work with HFile inline index/bloom blocks
Key: HBASE-14283
URL: https://issues.apache.org/jira/browse/HBASE-14283
Project: HBase
Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
Attachments: hfile-seek-before.patch

Reverse scans do not work if an HFile contains inline bloom blocks or leaf-level index blocks. This is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous, which is not the case in HFile V2 and beyond.

Attached is a first-cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
(1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks
(2) a proposed fix for inline index blocks that does not require a new HFile version, but is only performant for 1- and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance.

This patch does not fix the bloom filter blocks bug, but the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is that I want to confirm that you guys would be fine with me revising the HFile.Reader interface.

Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices); it only constructs the IO streams, so it has no way to know where the bloom blocks are in the HFile. The HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured”, but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property.

We would like to do 3 things with community approval:
(1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams
(2) Merge the fixes for index blocks and bloom blocks into open source
(3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can be not only correct but also performant in all cases.
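The contiguity assumption described in this report can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not HBase code: `SeekBeforeSketch`, `Block`, `layout`, `sizeAssumingContiguous` and `recordedSize` are hypothetical names, and the block sizes are made up. It compares a size derived from offsets (which is only right when the previous data block ends exactly where the current block starts) with the size the previous data block actually has, which is the kind of value a stored field such as the proposed ‘prevBlockSize’ could supply directly.

```java
import java.util.ArrayList;
import java.util.List;

class SeekBeforeSketch {
    enum BlockType { DATA, LEAF_INDEX, BLOOM_CHUNK }

    /** One block as it sits in the file; onDiskSize includes the block header. */
    static final class Block {
        final BlockType type;
        final long offset;
        final int onDiskSize;

        Block(BlockType type, long offset, int onDiskSize) {
            this.type = type;
            this.offset = offset;
            this.onDiskSize = onDiskSize;
        }
    }

    /** Lays blocks out back to back, the way a writer appending to a file would. */
    static List<Block> layout(BlockType[] types, int[] sizes) {
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < types.length; i++) {
            blocks.add(new Block(types[i], offset, sizes[i]));
            offset += sizes[i];
        }
        return blocks;
    }

    /** Index of the nearest DATA block before position i, or -1 if there is none. */
    static int previousDataBlock(List<Block> blocks, int i) {
        for (int j = i - 1; j >= 0; j--) {
            if (blocks.get(j).type == BlockType.DATA) {
                return j;
            }
        }
        return -1;
    }

    /** Size guess that assumes nothing sits between the two data blocks. */
    static long sizeAssumingContiguous(List<Block> blocks, int i) {
        Block prev = blocks.get(previousDataBlock(blocks, i));
        return blocks.get(i).offset - prev.offset;
    }

    /** The size the previous data block really has on disk. */
    static long recordedSize(List<Block> blocks, int i) {
        return blocks.get(previousDataBlock(blocks, i)).onDiskSize;
    }

    public static void main(String[] args) {
        // Hypothetical file: two data blocks, an inline leaf index block, one more data block.
        List<Block> file = layout(
            new BlockType[] {BlockType.DATA, BlockType.DATA, BlockType.LEAF_INDEX, BlockType.DATA},
            new int[] {100, 120, 40, 90});

        // Block 1 directly follows block 0, so the guess is right: 100 vs 100.
        System.out.println(sizeAssumingContiguous(file, 1) + " vs " + recordedSize(file, 1));
        // Block 3 follows an inline index block, so the guess is too large: 160 vs 120.
        System.out.println(sizeAssumingContiguous(file, 3) + " vs " + recordedSize(file, 3));
    }
}
```

In this toy model the second comparison shows a guessed size that includes the inline block, a disagreement of the same general shape as the "On-disk size without header provided is ..., but block header contains ..." error quoted in HBASE-13830, although whether that report has the same cause is not confirmed in the thread.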
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707096#comment-14707096 ]

Ben Lau commented on HBASE-14283:
---------------------------------

In the patch I also added an extra unit test to TestFromClientSide that tests reverse scan. It is unrelated to this bug; it is just an additional test to strengthen the test suite.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707088#comment-14707088 ]

Ben Lau commented on HBASE-14283:
---------------------------------

We suspect the person in HBASE-13830 also ran into this bug, based on his similar exception, but can’t know for sure without more information from him.
[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-14283:
---------------------------
Attachment: hfile-seek-before.patch
[jira] [Commented] (HBASE-13830) Hbase REVERSED may throw Exception sometimes
[ https://issues.apache.org/jira/browse/HBASE-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707111#comment-14707111 ]

Ben Lau commented on HBASE-13830:
---------------------------------

Possibly the same bug as in HBASE-14283, or related.

> Hbase REVERSED may throw Exception sometimes
> --------------------------------------------
>
> Key: HBASE-13830
> URL: https://issues.apache.org/jira/browse/HBASE-13830
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.1
> Reporter: ryan.jin
>
> run a scan at hbase shell command.
> {code}
> scan 'analytics_access',{ENDROW=>'9223370603647713262-flume01.hadoop-10.32.117.111-373563509',LIMIT=>10,REVERSED=>true}
> {code}
> will throw exception
> {code}
> java.io.IOException: java.io.IOException: Could not seekToPreviousRow StoreFileScanner[HFileScanner for reader reader=hdfs://nameservice1/hbase/data/default/analytics_access/a54c47c568c00dd07f9d92cfab1accc7/cf/2e3a107e9fec4930859e992b61fb22f6, compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=9223370603542781142-flume01.hadoop-10.32.117.111-378180911/cf:key/1433311994702/Put, lastKey=9223370603715515112-flume01.hadoop-10.32.117.111-370923552/cf:timestamp/1433139261951/Put, avgKeyLen=80, avgValueLen=115, entries=43544340, length=1409247455, cur=9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0] to key 9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:448)
>   at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.seekToPreviousRow(ReversedKeyValueHeap.java:88)
>   at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToPreviousRow(ReversedStoreScanner.java:128)
>   at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToNextRow(ReversedStoreScanner.java:88)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:503)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3866)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3946)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3814)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3805)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3136)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: On-disk size without header provided is 47701, but block header contains 10134. Block offset: -1, data starts with: DATABLK*\x00\x00'\x96\x00\x01\x00\x04\x00\x00\x00\x005\x96^\xD2\x01\x00\x00@\x00\x00\x00'
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:451)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$400(HFileBlock.java:87)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1466)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:569)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:413)
>   ... 17 more
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.6.0_65]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) ~[na:1.6.0_65]
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) ~[na:1.6.0_65]
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513) ~[na:1.6.0_65]
>   at
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606521#comment-14606521 ]

Ben Lau commented on HBASE-13991:
---------------------------------

The HBase master doesn't write much to disk compared to region servers/data nodes. Yes, this change is mostly to the filesystem layout. We may use the 'humongous' flag for other things in the future but currently it is only used to determine the layout.
[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-13991:
---------------------------
Description:
Add support for humongous tables via a hierarchical layout for regions on filesystem.
Credit for most of this code goes to Huaiyu Zhu.
Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/

was:
Add support for humongous tables via a hierarchical layout for regions on filesystem.
Credit for most of this code goes to Huaiyu Zhu.
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606940#comment-14606940 ]

Ben Lau commented on HBASE-13991:
---------------------------------

I don't think we had one planned, but if there were enough parties interested in a tool we could probably make one. It would probably not be too hard (unless I'm missing something) for a table that can be taken offline for a short period of time.

On a side note, there were some new conflicts in master since I created the patch, so I have fixed them and uploaded a new patch to the reviewboard. I'll treat the reviewboard as the holder of the current version of the patch and leave the attachment in this ticket as the 1st draft submission.
[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-13991:
---------------------------
Description:
Add support for humongous tables via a hierarchical layout for regions on filesystem.
Credit for most of this code goes to Huaiyu Zhu.
Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/
Known limitation of this patch: It does not deal with HFileLinks, which means that humongous tables, as implemented in this patch, would not support snapshots or timeline consistent region replicas feature in their current form. This could be addressed later but we had decided not to implement HFileLink support for the first version.

was:
Add support for humongous tables via a hierarchical layout for regions on filesystem.
Credit for most of this code goes to Huaiyu Zhu.
Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/
[jira] [Commented] (HBASE-13830) Hbase REVERSED may throw Exception sometimes
[ https://issues.apache.org/jira/browse/HBASE-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652873#comment-14652873 ]

Ben Lau commented on HBASE-13830:
---------------------------------

Hey Ryan, do you have more information on this bug? We are interested in using the reverse scan feature at Yahoo and would like to clear up any known bugs before internal users take it up for production use. If you have, for example, an independent program and/or data that could be used to reproduce this issue, we would like to see it. If you cannot reproduce the bug anymore, we'd like to know anything else you remember, such as the version of HDFS, any custom patches you had on your version of HBase, the table schema at the time (e.g. any particular block encodings), etc.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976020#comment-14976020 ] Ben Lau commented on HBASE-14283: - Reran the failures that looked relevant on the various branches, seems the tests are just unstable. When I get the time I'll try to restart the discussion about updating the HFile serialization for more efficient reverse scans in HBASE-14576. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. 
The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
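The size miscalculation described in the ticket can be illustrated with a small model. This is only a sketch under stated assumptions, not HBase code: the `Block` class, the `layout` helper, and the block kinds and sizes are all hypothetical, and the only thing taken from the ticket is the idea that the previous data block's size is guessed from offsets as if data blocks were contiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    kind: str    # "DATA" or "LEAF_INDEX" (illustrative labels, not HBase constants)
    offset: int  # byte offset of the block within the file
    size: int    # on-disk size of the block in bytes


def layout(kinds_and_sizes):
    """Lay blocks out back to back and return them with their offsets."""
    blocks, offset = [], 0
    for kind, size in kinds_and_sizes:
        blocks.append(Block(kind, offset, size))
        offset += size
    return blocks


def prev_data_block(blocks, cur):
    """Return the closest DATA block located before `cur`."""
    before = [b for b in blocks if b.kind == "DATA" and b.offset < cur.offset]
    return before[-1]


def guessed_prev_size(blocks, cur):
    """Guess the previous data block's size, assuming data blocks are contiguous."""
    prev = prev_data_block(blocks, cur)
    return cur.offset - prev.offset


blocks = layout([
    ("DATA", 100),
    ("DATA", 120),
    ("LEAF_INDEX", 40),  # inline block written between two data blocks
    ("DATA", 90),
])
data = [b for b in blocks if b.kind == "DATA"]

contiguous_guess = guessed_prev_size(blocks, data[1])
broken_guess = guessed_prev_size(blocks, data[2])
print(contiguous_guess, data[0].size)  # guess matches the real size
print(broken_guess, data[1].size)      # guess also counts the inline block
```

In this model the guess is exact only when nothing sits between two data blocks. Once an inline block is interleaved, the guessed size is too large by the inline block's size, which is the kind of mismatch a size validation on read would reject.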
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975101#comment-14975101 ] Ben Lau commented on HBASE-14283: - Thanks Andrew. Can we merge these patches or is there still something that needs to be done or reviewed? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975489#comment-14975489 ] Ben Lau commented on HBASE-14283: - That's odd, something went wrong with the patch for 1.1 branch. That compiled fine for me but perhaps I overlooked something, or the patch became stale. I will take a look later tonight. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975584#comment-14975584 ] Ben Lau commented on HBASE-14283: - OK, thanks. I just wanted to confirm that the other branches are fine as far as we know, e.g. that we didn't swap the 1.1 patch with the 0.98 patch and now need to fix/update the 0.98 branch. Thanks. It looks like there are some test failures but I think they are not related to the patch. I will rerun the failing tests locally tonight after they appear in this ticket. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975570#comment-14975570 ] Ben Lau commented on HBASE-14283: - [~andrew.purt...@gmail.com] Was the wrong patch applied to 1.1 or am I misunderstanding something? So I looked at https://github.com/apache/hbase/commits/branch-1.1 specifically https://github.com/apache/hbase/commit/0db04a1705e5e8cc04cc9c010ddfc5612f60cfec and it is missing the CellUtil method that I had in my HBASE-14283-branch-1.1.patch attached. It's fine if this is fixed as an addendum but there's nothing else amiss right (i.e. the other branches have the right patches applied?) > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974587#comment-14974587 ] Ben Lau commented on HBASE-14283: - Still there [~andrew.purt...@gmail.com]? Anyone else have comments/questions? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956228#comment-14956228 ] Ben Lau commented on HBASE-14283: - So anything I should change in the patches? How many +1's are needed? Does someone else need to +1? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956541#comment-14956541 ] Ben Lau commented on HBASE-14283: - [~lhofhansl] and [~anoop.hbase], I probably should’ve called it out explicitly, but yes, technically it would affect forward scans (among other things), though I don’t think in any noticeable way. The patched code is called by HalfStoreFileReader.getLastKey(). This getLastKey() method is needed to figure out the split point of a region, e.g. when splitting a region. There is an extra IO op there, but given that region splits don’t happen frequently, nor at a very high rate when they do, I think it is fine. getLastKey() is also used by StoreFile.Reader for the passesKeyRangeFilter() check, which determines whether a store is applicable to a scan, both forward and reverse. It affects the initial scan creation as well as later next() RPC calls, I think. Although this is unfortunate, I don’t think it really matters in practice, because the block containing the last key will get cached in the BlockCache the first time we need to know the last key for that halfstore. So repeated calls later in the region server will not incur any overhead. Let me know if this addresses your concerns or not. Incidentally, I think that caching is one of the primary reasons why there seem to be a decent number of people using reverse scan but almost no one reporting this bug: caching often hides it, either completely or nondeterministically (requests succeeding on later retries). We had some problems initially reproducing this bug reliably because if we ran forward scans in a table concurrently with the reverse scans, it would cause the blocks to become cached, and so certain key ranges that previously would’ve caused reverse scan to fail suddenly started working just fine.
Re: attaching the same patch: let me know if I'm doing this wrong, but it sounds like all I should have to do is upload the same patch file for master under a different name and the QA tests will pick it up. I will do that. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch
[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-14283: Attachment: HBASE-14283-reupload-master.patch Attached a new patch for master, same as the previous patch but with 'reupload' in the name. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948049#comment-14948049 ] Ben Lau commented on HBASE-14283: - Hmmm, can't seem to edit Jira comments. Anyways, I attached short term patches to the ticket per discussion for all versions of HBase from 0.98 to master. The patches are mostly the same other than 0.98. There were a couple of utility methods that were missing/private in earlier versions of HBase 1.X that I backported or changed to public to mirror the master version of the patch. (Let me know if that isn't kosher for some reason.) I probably won't have time to update the patches this week but I'll look at feedback and make appropriate changes when I get the chance next week. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947873#comment-14947873 ]

Ben Lau commented on HBASE-14283:
---------------------------------

Alright I'll work on that. I created HBASE-14576 for the longer term fix, so this ticket will just be to implement the short term fix we described.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> ---------------------------------------------------------------
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, hfile-seek-before.patch
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond.
>
> Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance.
>
> This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface.
>
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property.
>
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
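As an illustration of the size miscalculation described in the issue text above, here is a minimal sketch in Java. It is a toy model, not HBase code: the `Block` class, the block sizes and the `sizeByOffsetArithmetic` helper are all made up for the example. It only shows why subtracting offsets overestimates the previous data block's size once an inline block sits between two data blocks.

```java
/** Toy model of an HFile scannable section; hypothetical names, not HBase code. */
final class SeekBeforeSizeDemo {

    /** A block as laid out on disk: kind, start offset, on-disk size (header included). */
    static final class Block {
        final String kind;
        final long offset;
        final int onDiskSize;

        Block(String kind, long offset, int onDiskSize) {
            this.kind = kind;
            this.offset = offset;
            this.onDiskSize = onDiskSize;
        }
    }

    /** Size guessed by offset arithmetic; only right if nothing sits between the two data blocks. */
    static long sizeByOffsetArithmetic(Block prevData, Block current) {
        return current.offset - prevData.offset;
    }

    public static void main(String[] args) {
        // Assumed layout: DATA (100 bytes) | LEAF_INDEX (40 bytes) | DATA (100 bytes)
        Block prev = new Block("DATA", 0, 100);
        Block inline = new Block("LEAF_INDEX", 100, 40);
        Block cur = new Block("DATA", 140, 100);

        long guessed = sizeByOffsetArithmetic(prev, cur);
        System.out.println("actual size of previous data block: " + prev.onDiskSize);
        System.out.println("size from offset arithmetic:        " + guessed);
        System.out.println("overshoot caused by " + inline.kind + ": " + (guessed - prev.onDiskSize));
    }
}
```

In this made-up layout the guess covers the inline index block as well, so it comes out 40 bytes larger than the real data block.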
[jira] [Created] (HBASE-14576) New HFile version for optimized reverse scans
Ben Lau created HBASE-14576:
----------------------------

Summary: New HFile version for optimized reverse scans
Key: HBASE-14576
URL: https://issues.apache.org/jira/browse/HBASE-14576
Project: HBase
Issue Type: Improvement
Reporter: Ben Lau
Assignee: Ben Lau

A new ticket to finish the work from HBASE-14283, which will fix the HFileReader seekBefore() previous block size calculation bug but make the resulting reverse scan take more I/O than a forward scan. Fixing the bug in the long term requires an HFile version bump, either major or minor. We will put the previous block's size in the HFileBlock header instead of trying to calculate it directly using block offset arithmetic.

Per [~anoop.hbase]'s suggestion, I created this ticket so that we can separate the issue of fixing the bug (the responsibility of HBASE-14283) and the issue of getting reverse scans to run quickly (the responsibility of this ticket). It is also unlikely that this ticket will be backported to old versions of HBase, e.g. 0.98, whereas HBASE-14283 can be.
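To make the idea of storing the previous block's size in the block header concrete, here is a rough sketch in Java. The field set, field order and the `ToyBlockHeader` name are assumptions for illustration only; the real HFileBlock header has a different layout, and the ticket does not specify how the new field would be encoded.

```java
import java.nio.ByteBuffer;

/** Hypothetical, simplified block header carrying the previous block's on-disk size. */
final class ToyBlockHeader {
    /** Assumed layout: int onDiskSize, long prevBlockOffset, int prevBlockOnDiskSize. */
    static final int SIZE = Integer.BYTES + Long.BYTES + Integer.BYTES;

    final int onDiskSize;
    final long prevBlockOffset;
    final int prevBlockOnDiskSize; // the proposed extra field

    ToyBlockHeader(int onDiskSize, long prevBlockOffset, int prevBlockOnDiskSize) {
        this.onDiskSize = onDiskSize;
        this.prevBlockOffset = prevBlockOffset;
        this.prevBlockOnDiskSize = prevBlockOnDiskSize;
    }

    /** Serializes the header in the assumed field order (big-endian, ByteBuffer default). */
    byte[] toBytes() {
        return ByteBuffer.allocate(SIZE)
                .putInt(onDiskSize)
                .putLong(prevBlockOffset)
                .putInt(prevBlockOnDiskSize)
                .array();
    }

    /** Parses a header written by toBytes(). */
    static ToyBlockHeader fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new ToyBlockHeader(buf.getInt(), buf.getLong(), buf.getInt());
    }

    public static void main(String[] args) {
        ToyBlockHeader current = new ToyBlockHeader(100, 0L, 100);
        ToyBlockHeader parsed = fromBytes(current.toBytes());
        // A backward seek could read exactly this many bytes at prevBlockOffset.
        System.out.println("read " + parsed.prevBlockOnDiskSize + " bytes at offset " + parsed.prevBlockOffset);
    }
}
```

With such a field a reverse scan would know both where the previous block starts and how many bytes to read, without offset arithmetic and without an extra header read.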
[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-14283:

Attachment:
    HBASE-14283-master.patch
    HBASE-14283-branch-1.patch
    HBASE-14283-branch-1.2.patch
    HBASE-14283-branch-1.1.patch
    HBASE-14283-branch-1.0.patch
    HBASE-14283-0.98.patch

Short term patches for this bug per discussion.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959998#comment-14959998 ]

Ben Lau commented on HBASE-14283:

[~andrew.purt...@gmail.com] Let me know if I'm missing something, but I think there is more than 1 scan in play here. I think you're talking about an external hbase client scan. I'm talking about an internal hbase scan opened up by the regionserver and which we know for a fact caches the block. See the implementation of HalfStoreFileReader.getLastKey(), it is creating an internal scanner that does cache. Furthermore, the results of the method aren't cached in the Reader class (eg as a variable) and since the method is called repeatedly in the codebase it seems likely that the author's expectation was that the block cache would work correctly and make an internal cache for the file reader redundant. So not only does this scenario happen only for newly split regions but it only happens for the first time.

I can add more comments to the patches if it is really necessary but there is already a comment in the code indicating that this fix is not performant and is meant to be updated by a later ticket whose jira # is listed.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965794#comment-14965794 ]

Ben Lau commented on HBASE-14283:

[~andrew.purt...@gmail.com] Is the above explanation agreeable?
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946197#comment-14946197 ]

Ben Lau commented on HBASE-14283:

Hey guys, sorry, I should be able to get back to this soon. Finishing up an unrelated project right now.

I didn't know that minor versions in HFiles were also non-backwards compatible. That's one less reason then to make this a major version bump. If anyone has a strong preference for this fix to go into a V3.X I can change the patch to use minor version (eg for header size calculation) when I have time to do it. If not I'll leave it as V4 since it's a little simpler in the code as a major version bump. My original intention btw if it wasn't clear was that this wouldn't be the only change in a V4, just the first change that would go into a V4, whose format/contents is not yet meant to be final even when this patch is committed, i.e. V4 would be essentially a WIP with more changes suggested and implemented in other tickets and eventually released in HBase 2.0.

[~anoop.hbase] I'm down for committing a short-term read-the-header-always fix for now and then discussing the longer term solution second. Which branches do you want the patch for?
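The short-term "read-the-header-always" idea mentioned in the comment above can be pictured with a small Java sketch. This is an assumption-laden toy, not the actual patch: the file is an in-memory byte array, each block starts with a made-up 4-byte header holding its own on-disk size, and `readBlockAt` is a hypothetical helper. The point is only that reading the header first yields the exact block size, at the cost of one extra read per backward seek.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Toy "read the header first" block reader; hypothetical format, not HBase code. */
final class ReadHeaderFirstDemo {
    /** Assumed header: a single int holding the block's on-disk size, header included. */
    static final int HEADER_BYTES = Integer.BYTES;

    /** Builds a toy block: size header followed by payloadLen copies of fill. */
    static byte[] block(byte fill, int payloadLen) {
        int total = HEADER_BYTES + payloadLen;
        ByteBuffer buf = ByteBuffer.allocate(total).putInt(total);
        for (int i = 0; i < payloadLen; i++) {
            buf.put(fill);
        }
        return buf.array();
    }

    /** Concatenates blocks into one toy "file". */
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) {
            total += p.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (byte[] p : parts) {
            out.put(p);
        }
        return out.array();
    }

    /** Read 1: the header at offset, to learn the size. Read 2: exactly that many bytes. */
    static byte[] readBlockAt(byte[] file, int offset) {
        int onDiskSize = ByteBuffer.wrap(file, offset, HEADER_BYTES).getInt();
        return Arrays.copyOfRange(file, offset, offset + onDiskSize);
    }

    public static void main(String[] args) {
        // Assumed layout: data block | inline index block | data block
        byte[] file = concat(block((byte) 1, 10), block((byte) 2, 6), block((byte) 1, 10));
        int prevDataOffset = 0;
        int currentDataOffset = 24; // 14 bytes of data block + 10 bytes of inline block

        System.out.println("offset arithmetic would read: " + (currentDataOffset - prevDataOffset));
        System.out.println("header-first read returns:    " + readBlockAt(file, prevDataOffset).length);
    }
}
```

The extra header read is what makes this variant correct but more I/O-heavy than a forward scan, which is the trade-off the follow-up ticket is meant to remove.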
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621502#comment-14621502 ]

Ben Lau commented on HBASE-13991:

Hi stack, we did consider doing an online migration from the old format to the new one, but it would require messy changes to the codebase and be tricky to test fully. That's because whenever you access a region's contents you would have to test for both the humongous and non-humongous path contents instead of just using what you know it to be. Also there's a lot more going on during an online migration, regions can be moving, splitting, recovering from normal cluster operation and testing that an online migration works in all cases would be tricky. This can be ameliorated to some extent by making the migration 'mostly online', i.e. offlining regions, migrating them, then re-opening them. For Yahoo’s use case, an online migration is not necessary but if the community really needs it we could look into it.

[~toffer] can comment more, but I believe we would prefer to insert the buckets under the table directory for now and gradually transition later to reworking meta to be the source of table/region association information, creating a uniform/non-table oriented data directory, etc.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715147#comment-14715147 ]

Ben Lau commented on HBASE-14283:

Hi Anoop. The problem isn't that we read a previous block and see that the block is not the expected type. prevBlockOffset guarantees that we can seek to the previous block of the same type as the current one. See the comments on HFileBlock.getPrevBlockOffset(). We are always seeking to the previous data block; we are simply not calculating how much to read correctly once we have seeked to that previous data block, because our prev data block size calculation can include other blocks due to the layout of the scannable section in HFileV2+. We need a way of knowing a priori what the size of the previous data block is.

The method you describe is used in HFileReaderImpl.readNextDataBlock(). Note that the reason this method works is because it can use curBlock.getNextBlockOnDiskSizeWithHeader(). We need something similar to that when seeking backwards in order to achieve optimal performance.

Let me know if I misunderstood what you meant.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717095#comment-14717095 ]

Ben Lau commented on HBASE-14283:

Thanks ramkrishna. I will take a crack at fixing the calculation in the presence of bloom filters later and see how an interface update looks.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710583#comment-14710583 ]

Ben Lau commented on HBASE-14283:

Anyone have any comments on this bug and the fix? Perhaps [~zjushch] (from HBASE-4811 which implemented reverse scan) or [~mikhail] (from HBASE-3857 which implemented HFileV2) or [~liyintang] (from HBASE-4532 which added the bloom filter method(s) I have a question about)? Or maybe one of them can tell me the person(s) who would be best suited to examine the bug and proposed fixes?
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711760#comment-14711760 ]

Ben Lau commented on HBASE-14283:

Thanks [~ramkrishna.s.vasude...@gmail.com] for showing me the conventions for the patch process. The tests look like they passed, with 1 minor style comment. I can fix that but want to address the bloom filter blocks issue first. Who would be best for me to ask about it, how about you? Is it reasonable to change the HFile.Reader interface so that the HFile reader (instead of the higher level StoreFile reader) is in charge of deserializing and holding the bloom filter data structures? Can't see why not but maybe I'm missing something.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724040#comment-14724040 ]

Ben Lau commented on HBASE-14283:

Correct me if I'm wrong but I think I can't do a generalBloomFilter == null check because there are 3 states not 2: (1) Tried to load filter and it exists (2) Tried to load filter and it does not exist (3) Have not tried to load filter yet. If we rely on the generalBloomFilter == null check we can't distinguish between (2) and (3) which means we would end up trying to reload the filter unnecessarily.

{quote} Can information from FileInfo be included in the exception message to facilitate debugging ? {quote}

What information from FileInfo should be provided? The point of the message (maybe it needs to be revised for clarity) is that we found an unexpected BloomFilter in the HFile-- unexpected because the HFile FileInfo metadata claims there is no bloom filter (type = NONE). I will clarify the msg a bit.

I'll fix the other issues, thanks.
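The three-state argument in the comment above (loaded and present, loaded and absent, not yet loaded) can be sketched in Java as follows. `LoadOnceNullable` and its members are hypothetical names invented for this example and are not taken from the patch; the sketch is also not thread-safe. It shows how a separate "load attempted" flag prevents repeated load attempts when the loader legitimately returns null.

```java
import java.util.function.Supplier;

/** Hypothetical helper: caches a nullable value and remembers whether loading was attempted. */
final class LoadOnceNullable<T> {
    private final Supplier<T> loader; // may legitimately return null ("does not exist")
    private boolean loadAttempted;    // separates "not tried yet" from "tried, found nothing"
    private T value;

    LoadOnceNullable(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Runs the loader at most once, even when its result is null. */
    T get() {
        if (!loadAttempted) {
            value = loader.get();
            loadAttempted = true;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        // Simulates a file without a bloom filter: every load attempt finds nothing.
        LoadOnceNullable<String> filter = new LoadOnceNullable<>(() -> {
            loads[0]++;
            return null;
        });
        filter.get();
        filter.get();
        filter.get();
        System.out.println("load attempts: " + loads[0]);
    }
}
```

A plain `value == null` check in `get()` would instead call the loader on every access in this scenario, which is the unnecessary reload the comment describes.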
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724104#comment-14724104 ]

Ben Lau commented on HBASE-14283:
---------------------------------

The other parts of the code consider BloomFilter to be a nullable type, not just here. I don't know if it makes sense to change that in this patch. It is a bit overkill to use a null object here and seems to increase complexity more than it eliminates currently (new class to eliminate a load flag), since unlike in some common use cases of having a null object, we can't avoid checking for the null object here.
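The load flag mentioned in the comments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: `LoadState`, `LazyFilter`, and the loader callback are hypothetical names. The point is that a nullable field alone cannot tell "tried and found nothing" apart from "not tried yet", so a separate state is kept.

```python
from enum import Enum
from typing import Callable, Optional


class LoadState(Enum):
    NOT_ATTEMPTED = 1  # have not tried to load the filter yet
    LOADED = 2         # tried to load, and the filter exists
    ABSENT = 3         # tried to load, and the filter does not exist


class LazyFilter:
    """Loads a filter at most once, even when the result is 'no filter'."""

    def __init__(self, loader: Callable[[], Optional[object]]):
        self._loader = loader
        self._state = LoadState.NOT_ATTEMPTED
        self._filter = None

    def get(self) -> Optional[object]:
        if self._state is LoadState.NOT_ATTEMPTED:
            self._filter = self._loader()
            self._state = (
                LoadState.LOADED if self._filter is not None else LoadState.ABSENT
            )
        return self._filter


calls = []


def load_missing():
    """Hypothetical loader for a file that has no filter."""
    calls.append("load")
    return None


lazy = LazyFilter(load_missing)
lazy.get()
lazy.get()
print(len(calls))  # the loader ran only once
```

With only a `None` check, the second `get()` would call the loader again, which is the unnecessary reload the first comment describes.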
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724107#comment-14724107 ]

Ben Lau commented on HBASE-14283:
---------------------------------

Alright I'll put it on reviewboard, thanks.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724357#comment-14724357 ]

Ben Lau commented on HBASE-14283:
---------------------------------

Reviewboard link with updated version of patch (v3): https://reviews.apache.org/r/37971/
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736093#comment-14736093 ]

Ben Lau commented on HBASE-14283:
---------------------------------

Yep, we talked with Anoop and agree that the patch adds a lot of complexity for a fix that doesn't fix the issue 100%. The portion of the patch that is required to fix the bug for bloom filters is especially long. We thought to aim for a longer term fix later, but based on our discussion with Anoop it sounds like a backwards compatible, complete fix that adds the necessary metadata to HFile should not be too complicated/much work (eg does not involve creating new HFileReader implementation or other infrastructure). We will submit a new patch later with a final fix. We will keep the unit tests from the 1st patch since they are still applicable.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746484#comment-14746484 ]

Ben Lau commented on HBASE-14283:
---------------------------------

After talking to some committers in HBase, it seems that unless there is a very strong case / no viable alternative, all new patches to HBase should not require a full cluster restart. Hence, we will be going with the 2-rolling-restart approach as described above. It requires the cluster operator to do 2 rolling restarts and set a new config but that should not be too burdensome for a major upgrade. This rolling-restart-compatible approach is a bit more messy/complicated code-wise so let us look a bit into the best way to do this.
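The 2-rolling-restart approach can be sketched with a toy block format. Everything here is an assumption made for illustration: the byte layout, the version numbers, and the `write_new_format` flag are hypothetical and do not reflect the real HFile block header or any actual HBase config key. Restart 1 deploys a reader that understands both header sizes; restart 2 turns on the writer flag.

```python
import struct

OLD_HEADER = struct.Struct(">I")   # payload length only
NEW_HEADER = struct.Struct(">II")  # payload length + previous block size


def write_block(payload, prev_block_size, write_new_format=False):
    """Return (format_version, block_bytes); the new format is opt-in."""
    if write_new_format:
        return 2, NEW_HEADER.pack(len(payload), prev_block_size) + payload
    return 1, OLD_HEADER.pack(len(payload)) + payload


def read_block_old(block):
    """Reader from before restart 1: the header size is hardcoded."""
    (length,) = OLD_HEADER.unpack_from(block)
    return block[OLD_HEADER.size:OLD_HEADER.size + length]


def read_block_new(version, block):
    """Reader deployed in restart 1: picks the header size by version."""
    header = NEW_HEADER if version == 2 else OLD_HEADER
    length = header.unpack_from(block)[0]
    return block[header.size:header.size + length]


# Before restart 2 the flag is off, so every reader still works.
version, block = write_block(b"row", 64)
print(read_block_old(block) == b"row", read_block_new(version, block) == b"row")

# After restart 2 the flag is on; only the updated reader parses correctly.
version, block = write_block(b"row", 64, write_new_format=True)
print(read_block_old(block) == b"row", read_block_new(version, block) == b"row")
```

The second pair of results shows why the flag must stay off until every region server runs the new reader: an old reader silently treats part of the larger header as payload.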
[jira] [Updated] (HBASE-14439) New/Improved Filesystem Abstractions
[ https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-14439: Description: Ticket for work in progress on new FileSystem abstractions. Previously, we (Yahoo) submitted a ticket that would add support for humongous (1 million region+) tables via a hierarchical layout (HBASE-13991). However open source is moving in a similar but not identical direction in the future and so the patch will not be merged into open source. We will be working on a different patch now with folks from open source. It will create/add to 2 layers-- a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FsUtils (and in the patch new classes like AFsLayout). The use oriented abstraction layer is epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (eg creating regions) and less on the gritty details like the paths. This work on abstracting and isolating the paths from the use cases will help Yahoo not diverge too much from open source with its internal 'Humongous' table hierarchical layout, while also helping open source move further towards the eventual goal of redoing the FS layout in a similar (but different) hierarchical layout later that focuses on data directory uniformity (unlike the humongous patch) and storing hierarchy in the meta table instead which enables new optimizations (see HBASE-14090.) Attached to this ticket is some work we've done at Yahoo so far that will be put into an open source HBase branch for further collaboration. The patch is not meant to be complete yet and is a work in progress. (Please wait on patch comments/reviews.) It also includes some Yahoo-specific 'humongous' layout code that will be removed before submission in open source. was: Ticket for work in progress on new FileSystem abstractions. 
Previously, we (Yahoo) submitted a ticket that would add support for humongous (1 million region+) tables via a hierarchical layout (HBASE-13991). However open source is moving in a similar but not identical direction in the future and so the patch will not be merged into open source. We will be working with Cloudera on a different patch now. It will create/add to 2 layers-- a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FsUtils (and in the patch new classes like AFsLayout). The use oriented abstraction layer is epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (eg creating regions) and less on the gritty details like the paths. This work on abstracting and isolating the paths from the use cases will help Yahoo not diverge too much from open source with its internal 'Humongous' table hierarchical layout, while also helping open source move further towards the eventual goal of redoing the FS layout in a similar (but different) hierarchical layout later that focuses on data directory uniformity (unlike the humongous patch) and storing hierarchy in the meta table instead which enables new optimizations (see HBASE-14090.) Attached to this ticket is some work we've done at Yahoo so far that will be put into an open source HBase branch for further collaboration. The patch is not meant to be complete yet and is a work in progress. (Please wait on patch comments/reviews.) It also includes some Yahoo-specific 'humongous' layout code that will be removed before submission in open source. 
> New/Improved Filesystem Abstractions
> ------------------------------------
>
>                 Key: HBASE-14439
>                 URL: https://issues.apache.org/jira/browse/HBASE-14439
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Ben Lau
>            Assignee: Matteo Bertozzi
>         Attachments: abstraction.patch
[jira] [Created] (HBASE-14439) New Filesystem Abstraction Layer
Ben Lau created HBASE-14439:
----------------------------

             Summary: New Filesystem Abstraction Layer
                 Key: HBASE-14439
                 URL: https://issues.apache.org/jira/browse/HBASE-14439
             Project: HBase
          Issue Type: New Feature
            Reporter: Ben Lau

Ticket for work in progress on new FileSystem abstractions. Previously, we (Yahoo) submitted a ticket that would add support for humongous (1 million region+) tables via a hierarchical layout (HBASE-13991). However open source is moving in a similar but not identical direction in the future and so the patch will not be merged.

We will be working with Cloudera on a different patch now. It will create/add to 2 layers-- a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FsUtils (and in the patch new classes like AFsLayout). The use oriented abstraction layer is epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (eg creating regions) and less on the gritty details like the paths.

This work on abstracting and isolating the paths from the use cases will help Yahoo not diverge too much from open source with its internal 'Humongous' table hierarchical layout, while also helping open source move further towards the eventual goal of redoing the FS layout in a similar (but different) hierarchical layout later that focuses on data directory uniformity and storing hierarchy in the meta table instead (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be put into an open source HBase branch for further collaboration. The patch is not meant to be complete yet and is a work in progress. (Please wait on patch comments/reviews.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
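The split between a path abstraction layer and a use-oriented layer described in this ticket can be sketched roughly as below. This is a sketch under stated assumptions and does not mirror the attached patch: `Layout`, `FlatLayout`, `HierarchicalLayout`, and `RegionStore` are hypothetical names, and bucketing regions by the first two characters of their name is an invented scheme used only to make the two layouts differ.

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class Layout(ABC):
    """Path abstraction layer: the only place that knows directory shapes."""

    @abstractmethod
    def region_dir(self, root, table, region):
        """Return the directory that holds the given region."""


class FlatLayout(Layout):
    def region_dir(self, root, table, region):
        return root / table / region


class HierarchicalLayout(Layout):
    def region_dir(self, root, table, region):
        # Assumed bucketing scheme: group regions by a two-character prefix.
        return root / table / region[:2] / region


class RegionStore:
    """Use-oriented layer: 'does things' without building paths itself."""

    def __init__(self, root, layout):
        self.root = PurePosixPath(root)
        self.layout = layout
        self.created = set()  # stands in for real filesystem calls

    def create_region(self, table, region):
        path = self.layout.region_dir(self.root, table, region)
        self.created.add(path)
        return path


flat = RegionStore("/hbase/data", FlatLayout())
deep = RegionStore("/hbase/data", HierarchicalLayout())
print(flat.create_region("t1", "abcdef"))
print(deep.create_region("t1", "abcdef"))
```

Because `RegionStore` never assembles a path on its own, swapping the layout changes where regions live without touching the code that creates them, which is the isolation the ticket is after.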
[jira] [Updated] (HBASE-14439) New Filesystem Abstraction Layer
[ https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-14439:
----------------------------
    Attachment: abstraction.patch

> New Filesystem Abstraction Layer
> --------------------------------
>
>                 Key: HBASE-14439
>                 URL: https://issues.apache.org/jira/browse/HBASE-14439
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Ben Lau
>         Attachments: abstraction.patch
[jira] [Updated] (HBASE-14439) New/Improved Filesystem Abstractions
[ https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-14439: Description: Ticket for work in progress on new FileSystem abstractions. Previously, we (Yahoo) submitted a ticket that would add support for humongous (1 million region+) tables via a hierarchical layout (HBASE-13991). However open source is moving in a similar but not identical direction in the future and so the patch will not be merged into open source. We will be working with Cloudera on a different patch now. It will create/add to 2 layers-- a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FsUtils (and in the patch new classes like AFsLayout). The use oriented abstraction layer is epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (eg creating regions) and less on the gritty details like the paths. This work on abstracting and isolating the paths from the use cases will help Yahoo not diverge too much from open source with its internal 'Humongous' table hierarchical layout, while also helping open source move further towards the eventual goal of redoing the FS layout in a similar (but different) hierarchical layout later that focuses on data directory uniformity (unlike the humongous patch) and storing hierarchy in the meta table instead which enables new optimizations (see HBASE-14090.) Attached to this ticket is some work we've done at Yahoo so far that will be put into an open source HBase branch for further collaboration. The patch is not meant to be complete yet and is a work in progress. (Please wait on patch comments/reviews.) was: Ticket for work in progress on new FileSystem abstractions. 
Previously, we (Yahoo) submitted a ticket that would add support for humongous (1 million region+) tables via a hierarchical layout (HBASE-13991). However open source is moving in a similar but not identical direction in the future and so the patch will not be merged. We will be working with Cloudera on a different patch now. It will create/add to 2 layers-- a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FsUtils (and in the patch new classes like AFsLayout). The use oriented abstraction layer is epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (eg creating regions) and less on the gritty details like the paths. This work on abstracting and isolating the paths from the use cases will help Yahoo not diverge too much from open source with its internal 'Humongous' table hierarchical layout, while also helping open source move further towards the eventual goal of redoing the FS layout in a similar (but different) hierarchical layout later that focuses on data directory uniformity and storing hierarchy in the meta table instead (see HBASE-14090.) Attached to this ticket is some work we've done at Yahoo so far that will be put into an open source HBase branch for further collaboration. The patch is not meant to be complete yet and is a work in progress. (Please wait on patch comments/reviews.)

    Summary: New/Improved Filesystem Abstractions  (was: New Filesystem Abstraction Layer)

> New/Improved Filesystem Abstractions
> ------------------------------------
>
>                 Key: HBASE-14439
>                 URL: https://issues.apache.org/jira/browse/HBASE-14439
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Ben Lau
>         Attachments: abstraction.patch
[jira] [Updated] (HBASE-14439) New/Improved Filesystem Abstractions
[ https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-14439:

Description:
Ticket for work in progress on new FileSystem abstractions. Previously, we (Yahoo) submitted a ticket that would add support for humongous (1 million region+) tables via a hierarchical layout (HBASE-13991). However open source is moving in a similar but not identical direction in the future and so the patch will not be merged into open source.

We will be working with Cloudera on a different patch now. It will create/add to 2 layers-- a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FsUtils (and in the patch new classes like AFsLayout). The use oriented abstraction layer is epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (eg creating regions) and less on the gritty details like the paths.

This work on abstracting and isolating the paths from the use cases will help Yahoo not diverge too much from open source with its internal 'Humongous' table hierarchical layout, while also helping open source move further towards the eventual goal of redoing the FS layout in a similar (but different) hierarchical layout later that focuses on data directory uniformity (unlike the humongous patch) and storing hierarchy in the meta table instead which enables new optimizations (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be put into an open source HBase branch for further collaboration. The patch is not meant to be complete yet and is a work in progress. (Please wait on patch comments/reviews.) It also includes some Yahoo-specific 'humongous' layout code that will be removed before submission in open source.

> New/Improved Filesystem Abstractions
>
> Key: HBASE-14439
> URL: https://issues.apache.org/jira/browse/HBASE-14439
> Project: HBase
> Issue Type: New Feature
> Reporter: Ben Lau
> Assignee: Matteo Bertozzi
> Attachments: abstraction.patch
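The split between a path abstraction layer and a use-oriented layer described in HBASE-14439 can be pictured with a small sketch. Everything here is hypothetical: `LayoutSketch`, `Layout`, `RegionOps`, the simplified `/hbase/data/<table>/<region>` paths and the two-character bucketing rule are made up for illustration and are not taken from the patch or from classes such as AFsLayout.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of the two layers; not code from the patch. */
class LayoutSketch {
    /** Path abstraction layer: the only place that knows how paths are built. */
    interface Layout {
        String regionDir(String table, String region);
    }

    /** Flat layout: every region directory sits directly under the table directory. */
    static final Layout FLAT = (table, region) -> "/hbase/data/" + table + "/" + region;

    /** Made-up hierarchical layout: regions are bucketed by their first two characters. */
    static final Layout HIERARCHICAL =
        (table, region) -> "/hbase/data/" + table + "/" + region.substring(0, 2) + "/" + region;

    /** Use-oriented layer: 'does things' (creates regions) without building paths itself. */
    static final class RegionOps {
        private final Layout layout;
        private final Set<String> createdDirs = new TreeSet<>();

        RegionOps(Layout layout) {
            this.layout = layout;
        }

        String createRegion(String table, String region) {
            String dir = layout.regionDir(table, region);
            createdDirs.add(dir); // stands in for a real mkdir
            return dir;
        }
    }

    public static void main(String[] args) {
        System.out.println(new RegionOps(FLAT).createRegion("t1", "abcdef"));
        System.out.println(new RegionOps(HIERARCHICAL).createRegion("t1", "abcdef"));
    }
}
```

In this sketch, switching a cluster between layouts only means handing a different `Layout` to `RegionOps`; the code that creates regions does not change.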
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744844#comment-14744844 ]

Ben Lau commented on HBASE-14283:

Hey guys, I started looking into updating the HFile serialization to support reverse scans per previous comments. One thing that immediately struck me as being a possible problem is that the header sizes appear to be hardcoded into HConstants.java (HConstants.HFILEBLOCK_HEADER_SIZE), rather than being read from the HFile block header or HFile metadata itself. This seems to imply that if I add more fields to the header and then do a rolling restart to update all region servers to have my code, any old region server that hasn't updated yet and is processing the new HFiles will not realize the header is bigger now and that there is stuff they need to skip / ignore.

This might necessitate a 2-step restart process with 2 rolling restarts. Restart 1 to update all RS to have the appropriate new reading code. Restart 2 will enable writes by setting an HBase config option (false by default) to start writing the new HFiles.

Am I missing something and this 2-step rolling restart is not necessary for some reason? It seems unlikely people would find this process palatable but is there a better alternative? Alternatively I can turn this into a non-backwards compatible major version update instead of a minor version update and require a full cluster restart but that is kind of harsh in its own way. Opinions/thoughts?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, hfile-seek-before.patch
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond.
>
> Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance.
>
> This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface.
>
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property.
>
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases.
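The concern in the comment above (a hardcoded header size versus a size derived from the file, plus a write switch that stays off until every reader is upgraded) can be sketched roughly as follows. This is a toy model under stated assumptions: the class `TwoPhaseUpgradeSketch`, the two- and three-int header layouts, and the version numbers 3 and 4 are illustrative and do not reproduce HBase's real block header or configuration keys.

```java
import java.nio.ByteBuffer;

/** Illustrative only: field layout and sizes are made up, not HBase's real header. */
class TwoPhaseUpgradeSketch {
    static final int OLD_HEADER_SIZE = 2 * Integer.BYTES; // blockSize, prevBlockOffset
    static final int NEW_HEADER_SIZE = 3 * Integer.BYTES; // ... plus prevBlockSize

    /** Phase 1 (reader upgrade): derive the header size from the file's format version. */
    static int headerSize(int formatVersion) {
        return formatVersion >= 4 ? NEW_HEADER_SIZE : OLD_HEADER_SIZE;
    }

    /** Phase 2 (write switch): only emit the new format once the operator opts in. */
    static int versionToWrite(boolean newFormatEnabled) {
        return newFormatEnabled ? 4 : 3;
    }

    /** Serializes one block: header fields followed by a single int of payload. */
    static ByteBuffer writeBlock(int formatVersion, int prevBlockOffset, int prevBlockSize, int payload) {
        int size = headerSize(formatVersion) + Integer.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(size).putInt(prevBlockOffset);
        if (formatVersion >= 4) {
            buf.putInt(prevBlockSize);
        }
        buf.putInt(payload);
        buf.flip();
        return buf;
    }

    /** Reads the payload, trusting whatever header size the caller assumes. */
    static int readPayload(ByteBuffer block, int assumedHeaderSize) {
        return block.getInt(assumedHeaderSize);
    }

    public static void main(String[] args) {
        ByteBuffer block = writeBlock(versionToWrite(true), 0, 99, 42);
        System.out.println(readPayload(block, headerSize(4)));   // upgraded reader
        System.out.println(readPayload(block, OLD_HEADER_SIZE)); // reader with a stale constant
    }
}
```

In this toy layout the upgraded reader returns the payload (42), while a reader that still assumes the old constant returns the new header field (99) as if it were data. That is the failure a rolling restart would risk if new-format writes were enabled before all readers were upgraded.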
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737171#comment-14737171 ] Ben Lau commented on HBASE-14283: - Yes we would be changing the serialization for HFileBlock header. It would have a new field for the previous data block size, for block of the same type (same semantics as prevBlockOffset now). Any objections?
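A small worked example may help show why the contiguity assumption in seekBefore() breaks and what a stored previous-block size would change. The names (`SeekBeforeSketch`, `Block`, `prevDataBlockSize`) and the offsets are hypothetical and only model the idea from the ticket; they are not HBase classes.

```java
/** Toy model of the seekBefore() size calculation; not HBase code. */
class SeekBeforeSketch {
    /** Minimal block descriptor; prevDataBlockSize models the proposed header field. */
    static final class Block {
        final long offset;
        final int size;
        final int prevDataBlockSize;

        Block(long offset, int size, int prevDataBlockSize) {
            this.offset = offset;
            this.size = size;
            this.prevDataBlockSize = prevDataBlockSize;
        }
    }

    /** Contiguity assumption: the previous data block ends exactly where this one starts. */
    static long guessedPrevSize(long prevDataBlockOffset, Block current) {
        return current.offset - prevDataBlockOffset;
    }

    /** With the proposed field, the size is read from the header instead of inferred. */
    static long storedPrevSize(Block current) {
        return current.prevDataBlockSize;
    }

    public static void main(String[] args) {
        // Layout: data block at 0 (100 bytes), inline index block at 100 (40 bytes),
        // then the current data block at 140.
        Block current = new Block(140, 100, 100);
        System.out.println(guessedPrevSize(0, current)); // 140: includes the inline block
        System.out.println(storedPrevSize(current));     // 100: the real data block size
    }
}
```

With an inline block between the two data blocks, the inferred size (140) overshoots the real one (100) by exactly the size of the inline block, which is the mismatch the reverse scan trips over.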
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906704#comment-14906704 ] Ben Lau commented on HBASE-14283: - Hi guys, I have posted a new patch on review board. See https://reviews.apache.org/r/38720/. The patch adds support for HFileV4 and uses it to fix/optimize reverse scan. The patch is designed to be rolling-restartable in 2 phases, as discussed in an above comment on Sep 14. As currently posted, I think the patch has to go into HBase 2.0 since it changes HConstants which is marked @Stable. It turns out that assumptions about the header size and contents are hardcoded in several places, primarily HConstants.java. Let me know what you guys think of the patch. More eyes would be better because the changes are somewhat farther reaching than they sounded initially and the HFile format has a long history that I’m not as familiar with as most people around here. I also need to reach out to Facebook later for feedback since this change seems like it will affect their external Memcached block cache. I have marked TODO in the patch in several places where I will need to talk with Facebook. Also, since we are adding a new HFileV4, now would be a good time to fix anything else that is broken or add additional metadata for future optimizations that we missed out on in HFileV3. Someone could add more metadata after the patch stands on its own as a complete fix for reverse scans.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909063#comment-14909063 ] Ben Lau commented on HBASE-14283: - As Matteo mentions, we discussed this issue briefly with him and Stack about a week ago or so. [~mbertozzi] I thought we had asked if updating the major version would be fine and the answer was in the affirmative but I might've misunderstood or misunderstood the degree of affirmation. (IIRC the reason was something along the lines of cluster operators updating HFile.FORMAT_VERSION_KEY in their config being a desirable property of the 2nd rolling upgrade or something.) In any case though the reason we initially were going to do a 3.x but moved to a 4.0 was because (1) major versions (but not in HFiles?) generally denote level of backward compatibility and the new HFiles produced in this patch cannot be read by an HFileV3.X reader (2) the patch requires enough changes to assumptions in the serialization code (eg regarding header size or block cache) that it doesn't seem appropriate as a minor version change and (3) if we are following the rules we've set for ourselves, the changes in HConstants alone (annotated as Stable) mean this patch should be going into HBase 2.0 (admittedly rules can always be bent). Updating only the minor version because the format change currently fixes only 1 bug (as opposed to 10 bugs or adding a new feature) seems to be the wrong way to think of versioning, IMO. If our concern is that we would like a fix for this bug in 1.3 and not wait until 2.0, we could also commit a shorter term fix for 1.3 that just always reads the block header or does an optimistic read and falls back on reading the block header if the read fails from block size expectations (configurable, optimistic off by default). 
In combination with an expected size correction for index blocks (perhaps not for bloom filter blocks since that fix is a messy addition and also violates some API abstraction layers in StoreFile) it might be fine in most scenarios, especially if the cluster operator is allowed to change the inline block chunking config for the cluster that needs to do reverse scans. Within Yahoo internally we will probably go with a bandaid fix like this for now so that users can use reverse scans and still get 'ok' even if not max performance. (Also sub-100% performance is better than getting exceptions about block sizes ;) ) If people would be okay with dividing it up this way-- short term fix (no HFile changes) for 1.3 and a longer term fix (HFileV4) for HBase 2.0 I could create a separate ticket with the newest patch as a starting point for HFileV4 and submit another patch for this ticket that implements a configurable 'choose your tradeoff' fix as described in the previous paragraph.
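The short-term option described in this comment (either always read the block header first, or read optimistically with an expected size and fall back to the header when the expectation is wrong) might look roughly like the sketch below. It is a simulation under stated assumptions: `OptimisticReadSketch`, the one-int header holding the block size, and the `reads` counter are invented for illustration and do not reflect HBase's reader code.

```java
import java.nio.ByteBuffer;

/** Simulated block reader that counts I/O calls; not HBase code. */
class OptimisticReadSketch {
    final ByteBuffer file;
    final boolean optimistic; // models the config switch, which the comment suggests defaults to off
    int reads = 0;            // number of simulated I/O calls issued so far

    OptimisticReadSketch(ByteBuffer file, boolean optimistic) {
        this.file = file;
        this.optimistic = optimistic;
    }

    /** Builds a fake file: each block starts with an int holding its own total size. */
    static ByteBuffer fileOf(int... blockSizes) {
        int total = 0;
        for (int size : blockSizes) {
            total += size;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        int offset = 0;
        for (int size : blockSizes) {
            buf.putInt(offset, size);
            offset += size;
        }
        return buf;
    }

    /** Returns the real size of the block at offset, given a possibly wrong expected size. */
    int readBlock(int offset, int expectedSize) {
        if (optimistic) {
            reads++; // one read of expectedSize bytes, header included
            int actual = file.getInt(offset);
            if (actual == expectedSize) {
                return actual; // the guess was right, no further I/O needed
            }
        }
        reads++; // read only the header to learn the real size
        int actual = file.getInt(offset);
        reads++; // read the block again with its real size
        return actual;
    }

    public static void main(String[] args) {
        OptimisticReadSketch reader = new OptimisticReadSketch(fileOf(100, 40, 100), true);
        System.out.println(reader.readBlock(0, 140) + " bytes after " + reader.reads + " reads");
    }
}
```

In this model a correct guess costs one read, a wrong guess costs three, and the non-optimistic path always costs two, which is the tradeoff a 'choose your tradeoff' switch would expose to the cluster operator.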
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906728#comment-14906728 ] Ben Lau commented on HBASE-14283: - Hi Nick, that sounds like a good idea, I will do that.
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720844#comment-14720844 ] Ben Lau commented on HBASE-14283: - Here's a V2 of the patch that handles bloom filter blocks. It requires some interface changes that blur the line a bit between the StoreFile reader and the HFile reader which is not ideal but there isn't really any other way to fix this currently in a performant way for bloom filters. Let me know what you guys think. I have attached the patch to the ticket for review/feedback.
[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-14283: Attachment: HBASE-14283-v2.patch
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-16052:

Attachment: HBASE-16052-master.patch

> Improve HBaseFsck Scalability
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
> Issue Type: Improvement
> Components: hbck
> Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch
>
> There are some problems with HBaseFsck that make it unnecessarily slow especially for large tables or clusters with many regions.
>
> This patch tries to fix the biggest bottlenecks and also include a couple of bug fixes for some of the race conditions caused by gathering and holding state about a live cluster that is no longer true by the time you use that state in Fsck processing. These race conditions cause Fsck to crash and become unusable on large clusters with lots of region splits/merges.
>
> Here are some scalability/performance problems in HBaseFsck and the changes the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and then discarding everything but the Paths, then passing the Paths to a PathFilter, and then having the filter look up the (previously discarded) FileStatuses of the paths again. This is actually worse than double I/O because the first lookup obtains a batch of FileStatuses while all the other lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent things like snapshots, hfile archival, etc. I didn't have time to look too deep into other things affected and didn't want to increase the scope of this ticket so I focus mostly on Fsck and make only a few improvements to other codepaths. The changes in this patch though should make it fairly easy to fix other code paths in later jiras if we feel there are some other features strongly impacted by this problem.
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of Fsck runtime) and the running time scales with the number of store files, yet the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big bottleneck if you have 1 large cluster with 1 very large table that has nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of table-level parallelism for operations.
>
> The changes benefit all clusters but are especially noticeable for large clusters with a few very large tables. On our version of 0.98 with the original patch we had a moderately sized production cluster with 2 (user) tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
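The double lookup described in the first bullet of the HBASE-16052 description, and the effect of filtering directly on statuses, can be sketched with a fake filesystem that counts simulated RPCs. This is an assumption-laden model: `FileStatusFilterSketch`, `Status` and `FakeFs` are stand-ins written for this example, not Hadoop's `FileSystem`/`FileStatus` API or the FileStatusFilter added by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of path-based versus status-based filtering; not Hadoop or HBase code. */
class FileStatusFilterSketch {
    /** Minimal stand-in for a file status: a path plus an is-directory flag. */
    static final class Status {
        final String path;
        final boolean isDir;

        Status(String path, boolean isDir) {
            this.path = path;
            this.isDir = isDir;
        }
    }

    /** Fake filesystem that counts every simulated RPC. */
    static final class FakeFs {
        final List<Status> entries = new ArrayList<>();
        int rpcs = 0;

        /** One batched call returns all statuses. */
        List<Status> listStatus() {
            rpcs++;
            return new ArrayList<>(entries);
        }

        /** One call per path. */
        Status getFileStatus(String path) {
            rpcs++;
            for (Status s : entries) {
                if (s.path.equals(path)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("no such path: " + path);
        }
    }

    /** Old pattern: keep only the paths, then re-fetch each status inside the filter. */
    static List<String> dirsViaPathFilter(FakeFs fs) {
        List<String> dirs = new ArrayList<>();
        for (Status listed : fs.listStatus()) {
            String path = listed.path;            // the status is discarded here
            if (fs.getFileStatus(path).isDir) {   // one extra sequential RPC per path
                dirs.add(path);
            }
        }
        return dirs;
    }

    /** New pattern: filter on the statuses the listing already returned. */
    static List<String> dirsViaStatusFilter(FakeFs fs, Predicate<Status> filter) {
        List<String> dirs = new ArrayList<>();
        for (Status listed : fs.listStatus()) {
            if (filter.test(listed)) {
                dirs.add(listed.path);
            }
        }
        return dirs;
    }
}
```

For a directory with N entries, the path-based variant in this model issues N + 1 calls while the status-based variant issues one, with both returning the same result.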
[jira] [Created] (HBASE-16052) Improve HBaseFsck Scalability
Ben Lau created HBASE-16052: --- Summary: Improve HBaseFsck Scalability Key: HBASE-16052 URL: https://issues.apache.org/jira/browse/HBASE-16052 Project: HBase Issue Type: Improvement Components: hbck Reporter: Ben Lau
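The move from table-level to region-level parallelism proposed for loadHdfsRegionDirs() in the HBASE-16052 description can be illustrated as below. This is only a sketch of the scheduling idea: `RegionLevelParallelismSketch`, `loadOne` and the string results are hypothetical placeholders, and the real method works against HDFS rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of region-level task submission; not HBaseFsck code. */
class RegionLevelParallelismSketch {
    /** Placeholder for the per-region work (for example, reading one region directory). */
    static String loadOne(String table, String region) {
        return table + "/" + region;
    }

    /**
     * Submits one task per region rather than one per table, so a single huge
     * table can keep every worker thread busy.
     */
    static List<String> loadRegionDirs(Map<String, List<String>> regionsByTable, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : regionsByTable.entrySet()) {
                final String table = entry.getKey();
                for (final String region : entry.getValue()) {
                    Callable<String> task = () -> loadOne(table, region);
                    futures.add(pool.submit(task));
                }
            }
            List<String> loaded = new ArrayList<>();
            for (Future<String> future : futures) {
                loaded.add(future.get()); // results come back in submission order
            }
            return loaded;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("region load failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With table-level tasks, a cluster holding one very large table and one small table would offer at most two units of work to the pool; with region-level tasks the number of work units equals the number of regions.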
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342282#comment-15342282 ]

Ben Lau commented on HBASE-16052:
---------------------------------

[~jmhsieh] and [~jxiang] Any comments on this patch? Just checking since you guys are listed as the owners for the HBaseFsck component on https://issues.apache.org/jira/browse/HBASE/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel.

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>         Attachments: HBASE-16052-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow,
> especially for large tables or clusters with many regions.
> This patch tries to fix the biggest bottlenecks and also includes a couple of
> bug fixes for some of the race conditions caused by gathering and holding
> state about a live cluster that is no longer true by the time you use that
> state in Fsck processing. These race conditions cause Fsck to crash and
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and
> then discarding everything but the Paths, then passing the Paths to a
> PathFilter, and then having the filter look up the (previously discarded)
> FileStatuses of the paths again. This is actually worse than double I/O
> because the first lookup obtains a batch of FileStatuses while all the other
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen
> directly on FileStatuses
> -- This performance bug affects more than Fsck; to some extent it also affects
> things like snapshots, hfile archival, etc. I didn't have time to look too
> deeply into other things affected and didn't want to increase the scope of this
> ticket, so I focus mostly on Fsck and make only a few improvements to other
> codepaths. The changes in this patch, though, should make it fairly easy to
> fix other code paths in later jiras if we feel there are some other features
> strongly impacted by this problem.
> - offlineReferenceFileRepair is the most expensive part of Fsck (often 50% of
> Fsck runtime) and the running time scales with the number of store files, yet
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - loadHdfsRegionDirs() uses table-level concurrency, which is a big
> bottleneck if you have 1 large cluster with 1 very large table that has
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large
> clusters with a few very large tables. On our version of 0.98 with the
> original patch we had a moderately sized production cluster with 2 (user)
> tables and ~160k regions where HBaseFsck went from taking 18 minutes to 5 minutes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
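The move from table-level to region-level parallelism described in the issue above can be illustrated with a small, self-contained sketch. This is not code from the patch: the class `RegionLevelParallelismSketch`, the methods `loadByTable` and `loadByRegion`, and the placeholder `loadRegionDir` are all hypothetical names, and the "work" is just string concatenation standing in for reading a region directory. With one task per table, a single very large table is processed by one thread; with one task per region, the same work can spread across the whole pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from HBaseFsck.
class RegionLevelParallelismSketch {

    // Placeholder for the per-region work (assumed to be the expensive part).
    static String loadRegionDir(String table, String region) {
        return table + "/" + region;
    }

    // Table-level: one task per table, so one huge table runs on a single thread.
    static List<String> loadByTable(Map<String, List<String>> tables, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : tables.entrySet()) {
            Callable<List<String>> task = () -> {
                List<String> loaded = new ArrayList<>();
                for (String region : entry.getValue()) {
                    loaded.add(loadRegionDir(entry.getKey(), region));
                }
                return loaded;
            };
            futures.add(pool.submit(task));
        }
        List<String> all = new ArrayList<>();
        for (Future<List<String>> future : futures) {
            all.addAll(future.get());
        }
        return all;
    }

    // Region-level: one task per region, so the pool stays busy even with one big table.
    static List<String> loadByRegion(Map<String, List<String>> tables, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : tables.entrySet()) {
            for (String region : entry.getValue()) {
                Callable<String> task = () -> loadRegionDir(entry.getKey(), region);
                futures.add(pool.submit(task));
            }
        }
        List<String> all = new ArrayList<>();
        for (Future<String> future : futures) {
            all.add(future.get());
        }
        return all;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> tables = new LinkedHashMap<>();
        tables.put("big", Arrays.asList("r1", "r2", "r3", "r4"));
        tables.put("small", Arrays.asList("r1"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(loadByTable(tables, pool));
            System.out.println(loadByRegion(tables, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same results in the same order, because futures are collected in submission order; only the granularity of the submitted tasks differs. The same per-item task pattern is one plausible way to parallelize a serial loop over store files, although the thread does not say how the patch actually does it.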
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345499#comment-15345499 ]

Ben Lau commented on HBASE-16052:
---------------------------------

Added a comment about a refactor that may clean up / unify the FSUtils FileStatus filtering interface a bit at the expense of some minor object overhead. Let me know what you guys think on the reviewboard.

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>         Attachments: HBASE-16052-master.patch
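For readers following the FileStatus filtering discussion in this thread, the difference between filtering on paths and filtering on statuses can be sketched as below. This is a simplified model, not the Hadoop or HBase API: `Status`, `FakeFs`, `dirsViaPathFilter`, and `dirsViaStatusFilter` are hypothetical stand-ins, and `singleLookups` merely counts the calls that would each be a separate RPC against a real filesystem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-ins only; these are not Hadoop's FileStatus or FileSystem classes.
class StatusFilterSketch {

    static final class Status {
        final String path;
        final boolean isDirectory;

        Status(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }
    }

    static final class FakeFs {
        private final Map<String, Status> entries = new LinkedHashMap<>();
        int singleLookups = 0; // each one models an individual, sequential RPC

        void add(String path, boolean isDirectory) {
            entries.put(path, new Status(path, isDirectory));
        }

        // Batch listing: every status comes back in one call.
        List<Status> listStatus() {
            return new ArrayList<>(entries.values());
        }

        // Per-path lookup: one call per path.
        Status getStatus(String path) {
            singleLookups++;
            return entries.get(path);
        }
    }

    // Path-based pattern: keep only the path, then look the status up again.
    static List<String> dirsViaPathFilter(FakeFs fs) {
        List<String> kept = new ArrayList<>();
        for (Status status : fs.listStatus()) {
            String path = status.path;               // status is discarded here
            if (fs.getStatus(path).isDirectory) {     // and fetched again per path
                kept.add(path);
            }
        }
        return kept;
    }

    // Status-based pattern: filter on what the batch listing already returned.
    static List<String> dirsViaStatusFilter(FakeFs fs, Predicate<Status> filter) {
        List<String> kept = new ArrayList<>();
        for (Status status : fs.listStatus()) {
            if (filter.test(status)) {
                kept.add(status.path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs();
        fs.add("/t/region1", true);
        fs.add("/t/region2", true);
        fs.add("/t/.tableinfo", false);
        System.out.println(dirsViaPathFilter(fs) + ", single lookups: " + fs.singleLookups);
        int before = fs.singleLookups;
        System.out.println(dirsViaStatusFilter(fs, s -> s.isDirectory)
                + ", single lookups: " + (fs.singleLookups - before));
    }
}
```

In this model the path-based version performs one extra lookup per directory entry, while the status-based version performs none, which is the cost the issue description attributes to the PathFilter approach.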
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340148#comment-15340148 ]

Ben Lau commented on HBASE-16052:
---------------------------------

Reviewboard link: https://reviews.apache.org/r/48959/

Added annotation and javadoc for the AbstractFileStatusFilter. Fixed the whitespaces mentioned in the Hadoop QA bot's comment. Also re: Hadoop QA bot comment: I didn't add any new tests since it seems existing unit tests are sufficient as this patch optimizes common codepaths already exercised by the Fsck tests.

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>         Attachments: HBASE-16052-master.patch
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351422#comment-15351422 ]

Ben Lau commented on HBASE-16052:
---------------------------------

Thanks guys. The patch doesn't apply cleanly to branch-1 so I will upload a new patch for branch-1.

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>         Attachments: HBASE-16052-master.patch, HBASE-16052-v3-master.patch
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351586#comment-15351586 ]

Ben Lau commented on HBASE-16052:
---------------------------------

Uploaded the patch for branch-1. Re-ran the Fsck unit tests, which passed.

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>         Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch,
> HBASE-16052-v3-master.patch
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-16052:
----------------------------
    Attachment: HBASE-16052-v3-branch-1.patch

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>         Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch,
> HBASE-16052-v3-master.patch
[jira] [Commented] (HBASE-14576) New HFile version for optimized reverse scans
[ https://issues.apache.org/jira/browse/HBASE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290115#comment-15290115 ]

Ben Lau commented on HBASE-14576:
---------------------------------

Hi [~syuanjiang], I'm not currently working on the hfile format revision. The (to my knowledge) mostly complete/correct patch I included before in HBASE-14283 is still what I have. I haven't had time to work on open source since then. I may have time this quarter, though, to come back to the patch and fix conflicts/address any issues people raise. Is this ticket blocking anything for you?

> New HFile version for optimized reverse scans
> ---------------------------------------------
>
>                 Key: HBASE-14576
>                 URL: https://issues.apache.org/jira/browse/HBASE-14576
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>
> A new ticket to finish the work from HBASE-14283, which will fix the
> HFileReader seekBefore() previous block size calculation bug but make the
> resulting reverse scan take more I/O than a forward scan.
> Fixing the bug in the long term requires an HFile version bump, either major
> or minor. We will put the previous block's size in the HFileBlock header
> instead of trying to calculate it directly using block offset arithmetic.
> Per [~anoop.hbase]'s suggestion, I created this ticket so that we can
> separate the issue of fixing the bug (the responsibility of HBASE-14283) and
> the issue of getting reverse scans to run quickly (the responsibility of this
> ticket). It is also unlikely that this ticket will be backported to old
> versions of HBase, e.g. 0.98, whereas HBASE-14283 can be.
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-16052:
----------------------------
    Attachment: HBASE-16052-0.98.v3-amendment.patch

Seems it should just be this method call. [~tedyu] can you take a look?

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>             Fix For: 2.0.0, 1.4.0, 0.98.21
>
>         Attachments: HBASE-16052-0.98.v3-amendment.patch,
> HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch,
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch,
> HBASE-16052-v3-master.patch
[jira] [Comment Edited] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386493#comment-15386493 ]

Ben Lau edited comment on HBASE-16052 at 7/20/16 7:42 PM:
----------------------------------------------------------

Seems it should just be this method call. [~tedyu] can you take a look at the amendment patch I just attached?

was (Author: benlau):
Seems it should just be this method call. [~tedyu] can you take a look?

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>             Fix For: 2.0.0, 1.4.0, 0.98.21
>
>         Attachments: HBASE-16052-0.98.v3-amendment.patch,
> HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch,
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch,
> HBASE-16052-v3-master.patch
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386372#comment-15386372 ]

Ben Lau commented on HBASE-16052:
---------------------------------

Looks like some of the methods don't exist on Hadoop 1.1. When I ran tests locally it was against Hadoop 2.X. Will fix.

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>             Fix For: 2.0.0, 1.4.0, 0.98.21
>
>         Attachments: HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch,
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch,
> HBASE-16052-v3-master.patch
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-16052:
----------------------------
    Status: Patch Available  (was: In Progress)

> Improve HBaseFsck Scalability
> -----------------------------
>
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch,
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382689#comment-15382689 ]

Ben Lau commented on HBASE-16052:

[~busbey] Thanks!
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382638#comment-15382638 ]

Ben Lau commented on HBASE-16052:

[~tedyu] / [~syuanjiang] How does the 0.98 patch look? Is there a way for me to trigger the tests? Does uploading a file after the ticket is closed not trigger tests anymore?
[jira] [Work started] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-16052 started by Ben Lau.
[jira] [Reopened] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau reopened HBASE-16052:

Re-opening to trigger test on 0.98 patch.
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-16052:
    Attachment: HBASE-16052-0.98.v3.patch

Thanks [~tedyu] -- done.
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384626#comment-15384626 ]

Ben Lau commented on HBASE-16052:

Test failure looks unrelated to me. Reran locally and passed. Let me know if I missed anything but I don't think there are new Javadoc/findbugs warnings in the patch.
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-16052:
    Attachment: HBASE-16052-v3-0.98.patch

Attached patch for 0.98. Let me know if this looks good [~te...@apache.org] and [~syuanjiang].
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365175#comment-15365175 ]

Ben Lau commented on HBASE-16052:

Ah okay, good to know, thanks.
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365169#comment-15365169 ]

Ben Lau commented on HBASE-16052:

Hi Ted, I can probably get to it on Monday. Just to make sure I understand the process you guys were discussing, since this is an enhancement (not bug fix), this means it should go into 0.98? I thought 0.98 was mostly in a "bug fixes only" state?
[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359848#comment-15359848 ] Ben Lau commented on HBASE-16052: - Okay let me know if you guys have a consensus for what other versions should be patched. Re: 0.98-- yes we tested the original version of this patch in 0.98. However if we want to patch 0.98 it probably makes more sense to apply a backport of the trunk patch than our original 0.98 patch. The trunk patch has some improvements that make the code cleaner on trunk but weren't that important on 0.98 originally (eg there are a lot more PathFilter classes in trunk so adding an abstract class to avoid too much code duplication became a no brainer). To keep the code from diverging too much it would make sense to backport from trunk if we decide to patch 0.98. I'll add a release note later based on the Jira description. Feel free to expand/shorten it. > Improve HBaseFsck Scalability > - > > Key: HBASE-16052 > URL: https://issues.apache.org/jira/browse/HBASE-16052 > Project: HBase > Issue Type: Improvement > Components: hbck >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, > HBASE-16052-v3-master.patch > > > There are some problems with HBaseFsck that make it unnecessarily slow > especially for large tables or clusters with many regions. > This patch tries to fix the biggest bottlenecks and also include a couple of > bug fixes for some of the race conditions caused by gathering and holding > state about a live cluster that is no longer true by the time you use that > state in Fsck processing. These race conditions cause Fsck to crash and > become unusable on large clusters with lots of region splits/merges. 
> Here are some scalability/performance problems in HBaseFsck and the changes > the patch makes: > - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and > then discarding everything but the Paths, then passing the Paths to a > PathFilter, and then having the filter look up the (previously discarded) > FileStatuses of the paths again. This is actually worse than double I/O > because the first lookup obtains a batch of FileStatuses while all the other > lookups are individual RPCs performed sequentially. > -- Avoid this by adding a FileStatusFilter so that filtering can happen > directly on FileStatuses > -- This performance bug affects more than Fsck, but also to some extent > things like snapshots, hfile archival, etc. I didn't have time to look too > deep into other things affected and didn't want to increase the scope of this > ticket so I focus mostly on Fsck and make only a few improvements to other > codepaths. The changes in this patch though should make it fairly easy to > fix other code paths in later jiras if we feel there are some other features > strongly impacted by this problem. > - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of > Fsck runtime) and the running time scales with the number of store files, yet > the function is completely serial > -- Make offlineReferenceFileRepair multithreaded > - LoadHdfsRegionDirs() uses table-level concurrency, which is a big > bottleneck if you have 1 large cluster with 1 very large table that has > nearly all the regions > -- Change loadHdfsRegionDirs() to region-level parallelism instead of > table-level parallelism for operations. > The changes benefit all clusters but are especially noticeable for large > clusters with a few very large tables. On our version of 0.98 with the > original patch we had a moderately sized production cluster with 2 (user) > tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes. 
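The first bottleneck in the description above (filtering on Paths and then re-fetching each FileStatus) can be illustrated with a small, self-contained sketch. It does not use the Hadoop API: `Status`, `CountingFs`, and both filter methods are hypothetical stand-ins, and the call counter only approximates the idea of one RPC per lookup. It is meant to show the shape of the problem, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FileStatusFilterSketch {
    /** Hypothetical stand-in for a file status: a path plus its metadata. */
    static final class Status {
        final String path;
        final boolean isDir;

        Status(String path, boolean isDir) {
            this.path = path;
            this.isDir = isDir;
        }
    }

    /** Hypothetical stand-in for a remote filesystem that counts round trips. */
    static final class CountingFs {
        private final List<Status> entries;
        int rpcs = 0;

        CountingFs(List<Status> entries) {
            this.entries = entries;
        }

        /** One batched call returns the status of every entry. */
        List<Status> listStatus() {
            rpcs++;
            return new ArrayList<>(entries);
        }

        /** One call per path. */
        Status getStatus(String path) {
            rpcs++;
            for (Status s : entries) {
                if (s.path.equals(path)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("no such path: " + path);
        }
    }

    /** Old pattern: keep only the path, then look the status up again. */
    static List<String> dirsViaPathFilter(CountingFs fs) {
        List<String> dirs = new ArrayList<>();
        for (Status s : fs.listStatus()) {
            String path = s.path;              // the status is discarded here
            if (fs.getStatus(path).isDir) {    // one extra call per path
                dirs.add(path);
            }
        }
        return dirs;
    }

    /** New pattern: filter on the statuses the batched call already returned. */
    static List<String> dirsViaStatusFilter(CountingFs fs) {
        List<String> dirs = new ArrayList<>();
        for (Status s : fs.listStatus()) {
            if (s.isDir) {
                dirs.add(s.path);
            }
        }
        return dirs;
    }

    /** Made-up directory listing used for the demonstration. */
    static List<Status> sample() {
        return Arrays.asList(
            new Status("region-a", true),
            new Status("region-b", true),
            new Status("notes.txt", false));
    }

    public static void main(String[] args) {
        CountingFs before = new CountingFs(sample());
        CountingFs after = new CountingFs(sample());
        System.out.println(dirsViaPathFilter(before) + " in " + before.rpcs + " calls");
        System.out.println(dirsViaStatusFilter(after) + " in " + after.rpcs + " calls");
    }
}
```

With the three made-up entries, the path-based variant makes four calls (one batch plus one per entry) and the status-based variant makes one, so the gap grows linearly with the number of entries.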
[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability
[ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-16052: Release Note: HBASE-16052 improves the performance and scalability of HBaseFsck, especially for large clusters with a small number of large tables. Searching for lingering reference files is now a multi-threaded operation. Loading HDFS region directory information is now multi-threaded at the region-level instead of the table-level to maximize concurrency. A performance bug in HBaseFsck that resulted in redundant I/O and RPCs was fixed by introducing a FileStatusFilter that filters FileStatus objects directly. > Improve HBaseFsck Scalability > - > > Key: HBASE-16052 > URL: https://issues.apache.org/jira/browse/HBASE-16052 > Project: HBase > Issue Type: Improvement > Components: hbck >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, > HBASE-16052-v3-master.patch
[jira] [Updated] (HBASE-16662) Fix open POODLE vulnerabilities
[ https://issues.apache.org/jira/browse/HBASE-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-16662: Status: Patch Available (was: Open) Formally submit patch file (not sure if I did this right). > Fix open POODLE vulnerabilities > --- > > Key: HBASE-16662 > URL: https://issues.apache.org/jira/browse/HBASE-16662 > Project: HBase > Issue Type: Bug > Components: REST, Thrift >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-16662-master.patch
[jira] [Created] (HBASE-16662) Fix open POODLE vulnerabilities
Ben Lau created HBASE-16662: --- Summary: Fix open POODLE vulnerabilities Key: HBASE-16662 URL: https://issues.apache.org/jira/browse/HBASE-16662 Project: HBase Issue Type: Bug Components: REST, Thrift Reporter: Ben Lau Assignee: Ben Lau We recently found a security issue in our HBase REST servers. The issue is a variant of the POODLE vulnerability (https://en.wikipedia.org/wiki/POODLE) and is present in the HBase Thrift server as well. It also appears to affect the JMXListener coprocessor. The vulnerabilities probably affect all versions of HBase that have the affected services. (If you don't use the affected services with SSL then this ticket probably doesn't affect you). Included is a patch to fix the known POODLE vulnerabilities in master. Let us know if we missed any. From our end we only personally encountered the HBase REST vulnerability. We do not use the Thrift server or JMXListener coprocessor but discovered those problems after discussing the issue with some of the HBase PMCs. Coincidentally, Hadoop recently committed a SslSelectChannelConnectorSecure which is more or less the same as one of the fixes in this patch. Hadoop wasn't originally affected by the vulnerability in the SslSelectChannelConnector, but about a month ago they committed HADOOP-12765 which does use that class, so they added a SslSelectChannelConnectorSecure class similar to this patch. Since this class is present in Hadoop 2.7.4+ which hasn't been released yet, we will for now just include our own version instead of depending on the Hadoop version. After the patch is approved for master we can backport as necessary to older versions of HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16662) Fix open POODLE vulnerabilities
[ https://issues.apache.org/jira/browse/HBASE-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-16662: Attachment: HBASE-16662-master.patch > Fix open POODLE vulnerabilities > --- > > Key: HBASE-16662 > URL: https://issues.apache.org/jira/browse/HBASE-16662 > Project: HBase > Issue Type: Bug > Components: REST, Thrift >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-16662-master.patch
[jira] [Resolved] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure
[ https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau resolved HBASE-17720. - Resolution: Duplicate > Possible bug in FlushSnapshotSubprocedure > - > > Key: HBASE-17720 > URL: https://issues.apache.org/jira/browse/HBASE-17720 > Project: HBase > Issue Type: Bug > Components: dataloss, snapshots >Reporter: Ben Lau
[jira] [Commented] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure
[ https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892973#comment-15892973 ] Ben Lau commented on HBASE-17720: - Ah, thanks Jerry. It looks like the fix was added in HBASE-13877 which was committed after the version of 0.98 we use. The scenario (failure in ITBLL) is exactly what I ran into. I had checked master to make sure this wasn't fixed, but I checked FlushSnapshotSubprocedure for the missing try/catch, not RegionServerSnapshotManager. So this is a dup of HBASE-13877. Will close, thanks Jerry. > Possible bug in FlushSnapshotSubprocedure > - > > Key: HBASE-17720 > URL: https://issues.apache.org/jira/browse/HBASE-17720 > Project: HBase > Issue Type: Bug > Components: dataloss, snapshots >Reporter: Ben Lau
[jira] [Created] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure
Ben Lau created HBASE-17720: --- Summary: Possible bug in FlushSnapshotSubprocedure Key: HBASE-17720 URL: https://issues.apache.org/jira/browse/HBASE-17720 Project: HBase Issue Type: Bug Components: dataloss, snapshots Reporter: Ben Lau I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that it does not appear to explicitly handle a DroppedSnapshotException. In the primary codepath when flushing memstores, (see MemStoreFlusher.flushRegion()), there is a try/catch for DroppedSnapshotException that will abort the regionserver to replay WALs to avoid data loss. I don't see this in FlushSnapshotSubProcedure. Is this an accidental omission or is there a reason this isn't present? I'm not too familiar with procedure V1 or V2. I assume it is the case that if a participant dies that all other participants will terminate any outstanding operations for the procedure? If so and if this lack of RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed naively otherwise I assume a failed flush on 1 region server could cause a cascade of RS abortions on the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
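The try/catch that the report describes for the primary flush path can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not HBase code: `Region`, `Server`, `flushRegion`, and the `DroppedSnapshotException` class defined here are hypothetical stand-ins that only mirror the names used in the report.

```java
import java.io.IOException;

class FlushAbortSketch {
    /** Hypothetical stand-in for the exception named in the report. */
    static class DroppedSnapshotException extends IOException {
        DroppedSnapshotException(String message) {
            super(message);
        }
    }

    /** Hypothetical region whose flush may fail. */
    interface Region {
        void flush() throws IOException;
    }

    /** Hypothetical server handle that records whether an abort was requested. */
    static final class Server {
        boolean aborted = false;
        String reason = null;

        void abort(String why, Throwable cause) {
            aborted = true;
            reason = why + ": " + cause.getMessage();
        }
    }

    /**
     * Flushes one region. A dropped snapshot leaves the in-memory state
     * unreliable, so the server is aborted and the WAL is replayed on recovery;
     * other I/O failures are reported to the caller without aborting.
     */
    static boolean flushRegion(Region region, Server server) {
        try {
            region.flush();
            return true;
        } catch (DroppedSnapshotException e) {
            server.abort("Dropped snapshot, WAL replay required", e);
            return false;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Region failing = () -> {
            throw new DroppedSnapshotException("flush failed midway");
        };
        System.out.println("flushed=" + flushRegion(failing, server)
            + " aborted=" + server.aborted);
    }
}
```

The concern raised in the report still applies to any such handler inside a distributed procedure: if an abort on one participant cancels the procedure on the others, the abort path has to be designed so that it cannot cascade.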
[jira] [Created] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
Ben Lau created HBASE-19989: --- Summary: READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly Key: HBASE-19989 URL: https://issues.apache.org/jira/browse/HBASE-19989 Project: HBase Issue Type: Bug Affects Versions: 1.4.1, 1.3.1 Reporter: Ben Lau Assignee: Ben Lau Region state transitions do not work correctly for READY_TO_MERGE/SPLIT. [~thiruvel] and I noticed this is due to break statements being in the wrong place in AssignmentManager. This allows a race condition for example in which one of the regions being merged could be moved concurrently, resulting in the merge transaction failing and then double assignment and/or dataloss. This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not branch-2 as the relevant code in AM has since been rewritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
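The class of bug described above, a `break` that ends a case before the shared handling further down can run, looks roughly like this in isolation. The sketch is a generic illustration, not the AssignmentManager code: the enum values and the recorded action strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MisplacedBreakSketch {
    /** Hypothetical transition codes, loosely named after those in the report. */
    enum Code { READY_TO_SPLIT, SPLIT_PONR, SPLIT }

    /** Buggy shape: the break ends READY_TO_SPLIT before the shared update. */
    static List<String> buggy(Code code) {
        List<String> actions = new ArrayList<>();
        switch (code) {
            case READY_TO_SPLIT:
                actions.add("pre-check");
                break;                     // the shared update below never runs
            case SPLIT_PONR:
            case SPLIT:
                actions.add("update region state");
                break;
        }
        return actions;
    }

    /** Fixed shape: READY_TO_SPLIT falls through into the shared update. */
    static List<String> fixed(Code code) {
        List<String> actions = new ArrayList<>();
        switch (code) {
            case READY_TO_SPLIT:
                actions.add("pre-check");
                // intentional fall-through
            case SPLIT_PONR:
            case SPLIT:
                actions.add("update region state");
                break;
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + buggy(Code.READY_TO_SPLIT));
        System.out.println("fixed: " + fixed(Code.READY_TO_SPLIT));
    }
}
```

In the buggy variant the region state is never updated for READY_TO_SPLIT, which is the kind of gap that would let a concurrent operation such as a move proceed against a region that is about to be split or merged.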
[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-19989: Attachment: HBASE-19989.patch > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch
[jira] [Commented] (HBASE-15911) NPE in AssignmentManager.onRegionTransition after Master restart
[ https://issues.apache.org/jira/browse/HBASE-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363161#comment-16363161 ]

Ben Lau commented on HBASE-15911:
---------------------------------

[~pankaj2461] [~mantonov] We recently ran into this and had to fix this as it was preventing our master from starting up. We would like to submit a suggested fix and test case if you guys do not have a patch yet.

> NPE in AssignmentManager.onRegionTransition after Master restart
> ----------------------------------------------------------------
>
> Key: HBASE-15911
> URL: https://issues.apache.org/jira/browse/HBASE-15911
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 1.3.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Priority: Major
>
> 16/05/27 17:49:18 ERROR ipc.RpcServer: Unexpected throwable object
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.master.AssignmentManager.onRegionTransition(AssignmentManager.java:4364)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.reportRegionStateTransition(MasterRpcServices.java:1421)
>   at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8623)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2239)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:116)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112)
>   at java.lang.Thread.run(Thread.java:745)
> I'm pretty sure I've seen it before and more than once, but never got to dig in.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363081#comment-16363081 ] Ben Lau commented on HBASE-19989: - Hi Ted, thanks for the feedback, I'm not sure a comment will be helpful since it comes down to 'if the break is here the code below doesn't run, so the break is not here' but I have added a comment anyway and re-added the ZKLess split/merge tests that were removed in branch-1. Let me know your thoughts, thanks. > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch, HBASE-19989.patch
[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-19989: Attachment: (was: HBASE-19989.patch) > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch
[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-19989: Attachment: HBASE-19989.patch > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch, HBASE-19989.patch
[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363168#comment-16363168 ]

Ben Lau commented on HBASE-18282:
---------------------------------

Hi guys, this ticket has been open for a while. Do you mind if we submit an internal patch + test we have for this?

> ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
> -------------------------------------------------------------------------------------
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
> Reporter: Ashu Pachauri
> Assignee: Ashu Pachauri
> Priority: Critical
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a KeeperException and returns null in such a case. ReplicationLogCleaner just assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
>   List result = null;
>   try {
>     result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
>   } catch (KeeperException e) {
>     this.abortable.abort("Failed to get list of replicators", e);
>   }
>   return result;
> }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
>   for (int retry = 0; ; retry++) {
>     int v0 = replicationQueues.getQueuesZNodeCversion();
>     List rss = replicationQueues.getListOfReplicators();
>     if (rss == null) {
>       LOG.debug("Didn't find any region server that replicates, won't prevent any deletions.");
>       return ImmutableSet.of();
>     }
>     ...
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
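The failure mode quoted above, where a swallowed lookup error is indistinguishable from "nothing is queued", can be reproduced in a few lines. The sketch below is only an illustration of that idea and of one possible fail-closed alternative; it is not the attached patch. `Queues`, both `deletable*` methods, and the `KeeperException` class defined here are hypothetical stand-ins rather than the HBase or ZooKeeper types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReplicatorLookupSketch {
    /** Hypothetical stand-in for ZooKeeper's KeeperException. */
    static class KeeperException extends Exception {
        KeeperException(String message) {
            super(message);
        }
    }

    /** Hypothetical view of the replication queues: WALs still waiting to ship. */
    interface Queues {
        Set<String> walsInQueues() throws KeeperException;
    }

    /** Bug shape: a failed lookup is treated like "nothing is queued". */
    static List<String> deletableSwallowing(Queues queues, List<String> candidates) {
        Set<String> queued;
        try {
            queued = queues.walsInQueues();
        } catch (KeeperException e) {
            queued = new HashSet<>();   // error is lost; every WAL looks deletable
        }
        return notIn(queued, candidates);
    }

    /** Safer shape: a failed lookup means "delete nothing this round". */
    static List<String> deletableFailClosed(Queues queues, List<String> candidates) {
        try {
            return notIn(queues.walsInQueues(), candidates);
        } catch (KeeperException e) {
            return new ArrayList<>();
        }
    }

    /** Returns the candidates that are not referenced by any queue. */
    private static List<String> notIn(Set<String> queued, List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String wal : candidates) {
            if (!queued.contains(wal)) {
                result.add(wal);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Queues broken = () -> {
            throw new KeeperException("session expired");
        };
        List<String> candidates = Arrays.asList("wal.1", "wal.2");
        System.out.println("swallowing:  " + deletableSwallowing(broken, candidates));
        System.out.println("fail-closed: " + deletableFailClosed(broken, candidates));
    }
}
```

When the lookup succeeds the two variants behave identically; they differ only on the error path, which is where unreplicated WALs can be lost.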
[jira] [Created] (HBASE-19995) Current Jetty 9 version in HBase master branch can memory leak under high traffic
Ben Lau created HBASE-19995: --- Summary: Current Jetty 9 version in HBase master branch can memory leak under high traffic Key: HBASE-19995 URL: https://issues.apache.org/jira/browse/HBASE-19995 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0 Reporter: Ben Lau There is a memory-leak in Jetty 9 that manifests whenever you hit the call queue limit in HBase REST. The memory-leak leaks both on-heap and off-heap objects permanently. It happens because whenever the call queue for Jetty server overflows, the task that is rejected runs a 'reject' method if it is a Rejectable to do any cleanup. This clean up is necessary to for example close the connection, deallocate any buffers, etc. Unfortunately, in Jetty 9, they implemented the 'reject' / cleanup method of the SelectChannelEndpoint as a non-blocking call that is not guaranteed to run. This was later fixed in Jetty 9.4 and later backported however the version of Jetty 9 pulled in HBase for REST comes before this fix. See [https://github.com/eclipse/jetty.project/issues/1804] and [https://github.com/apache/hbase/blob/master/pom.xml#L1416.] If we want to stay on 9.3.X we could update to [9.3.22.v20171030|https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-server/9.3.22.v20171030] which is the latest version of 9.3. Thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
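The role of the 'reject' cleanup described above can be shown with a plain `java.util.concurrent` thread pool. This is a sketch under stated assumptions and does not use or model the Jetty classes: `Rejectable` and `ConnectionTask` are hypothetical stand-ins, and a counter plays the part of the buffers a connection would hold.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RejectCleanupSketch {
    /** Hypothetical stand-in for a task that owns resources needing cleanup. */
    interface Rejectable extends Runnable {
        void reject();
    }

    /** Hypothetical connection task holding one "buffer" until it completes. */
    static final class ConnectionTask implements Rejectable {
        private final CountDownLatch release;
        private final AtomicInteger openBuffers;

        ConnectionTask(CountDownLatch release, AtomicInteger openBuffers) {
            this.release = release;
            this.openBuffers = openBuffers;
            openBuffers.incrementAndGet();   // allocated when the connection is accepted
        }

        @Override
        public void run() {
            try {
                release.await();             // simulate a slow request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                openBuffers.decrementAndGet();
            }
        }

        @Override
        public void reject() {
            openBuffers.decrementAndGet();   // cleanup for a task that never ran
        }
    }

    /** Returns the number of buffers still allocated once the pool has drained. */
    static int leakedBuffers(boolean cleanupOnReject) {
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger openBuffers = new AtomicInteger();
        // One worker and one queue slot, so the third task overflows the queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1),
            (task, executor) -> {
                if (cleanupOnReject && task instanceof Rejectable) {
                    ((Rejectable) task).reject();
                }
            });
        for (int i = 0; i < 3; i++) {
            pool.execute(new ConnectionTask(release, openBuffers));
        }
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining pool", e);
        }
        return openBuffers.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup on reject: leaked=" + leakedBuffers(true));
        System.out.println("no cleanup:        leaked=" + leakedBuffers(false));
    }
}
```

If the cleanup for a rejected task is skipped, whatever that task allocated stays allocated, and under sustained overload that adds up to the kind of permanent leak the report describes.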
[jira] [Updated] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-18282: Attachment: HBASE-18282-branch-2.patch HBASE-18282-branch-1.patch > ReplicationLogCleaner can delete WALs not yet replicated in case of a > KeeperException > - > > Key: HBASE-18282 > URL: https://issues.apache.org/jira/browse/HBASE-18282 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1 >Reporter: Ashu Pachauri >Priority: Critical > Attachments: HBASE-18282-branch-1.patch, HBASE-18282-branch-2.patch
[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364666#comment-16364666 ] Ben Lau commented on HBASE-18282: - Thanks [~apurtell]. It looks like this code is rewritten in master now so the bug is present only in branch-2 and branch-1. I will attach forward ports for those branches. > ReplicationLogCleaner can delete WALs not yet replicated in case of a > KeeperException > - > > Key: HBASE-18282 > URL: https://issues.apache.org/jira/browse/HBASE-18282 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1 >Reporter: Ashu Pachauri >Priority: Critical
[jira] [Updated] (HBASE-19995) Current Jetty 9 version in HBase master branch can memory leak under high traffic
[ https://issues.apache.org/jira/browse/HBASE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-19995: Attachment: HBASE-19995.patch > Current Jetty 9 version in HBase master branch can memory leak under high > traffic > - > > Key: HBASE-19995 > URL: https://issues.apache.org/jira/browse/HBASE-19995 > Project: HBase > Issue Type: Bug > Components: REST >Affects Versions: 2.0 >Reporter: Ben Lau >Priority: Major > Attachments: HBASE-19995.patch
[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365001#comment-16365001 ] Ben Lau commented on HBASE-18282: - [~yuzhih...@gmail.com] I wasn't sure when that sleep can be interrupted (probably never in practice?) so I just made it keep retrying but it probably makes more sense to just assume if an interrupt happens it's intended to stop the thread so I will interrupt and break out as suggested. > ReplicationLogCleaner can delete WALs not yet replicated in case of a > KeeperException > - > > Key: HBASE-18282 > URL: https://issues.apache.org/jira/browse/HBASE-18282 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1 >Reporter: Ashu Pachauri >Assignee: Ben Lau >Priority: Critical > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > Attachments: HBASE-18282-branch-1.patch, HBASE-18282-branch-2.patch
[jira] [Updated] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau updated HBASE-18282:
----------------------------
Attachment: HBASE-18282-branch-2-v2.patch
            HBASE-18282-branch-1-v2.patch
[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365010#comment-16365010 ]

Ben Lau commented on HBASE-18282:
---------------------------------

Hi [~yuzhih...@gmail.com], as explained earlier, I believe this bug does not exist in master, because the relevant code was rewritten to use Java streams (see the base method that was missing a throw). It should only be present on 1.X and 2.X. Let me know if I missed something.
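The "missing throw" at the heart of this issue can be sketched in isolation. The example below is a simplified illustration, not the HBase implementation or the attached patch. `WalCleanerSketch`, `QueuedWalSource`, and `LookupException` are hypothetical stand-ins (the last one for ZooKeeper's `KeeperException`), and the lookup is reduced to a list of WAL names that are still queued. The point it shows is that when the lookup propagates its failure instead of returning null, the cleaner can tell "lookup failed" apart from "nothing is queued" and decline to delete anything.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not HBase code: fail safe when the queue lookup fails. */
final class WalCleanerSketch {
    private WalCleanerSketch() {}

    /** Stand-in for a ZooKeeper failure (hypothetical; not KeeperException). */
    static final class LookupException extends Exception {
        LookupException(String message) {
            super(message);
        }
    }

    /** Hypothetical source of WAL names that still await replication. */
    interface QueuedWalSource {
        // Throws on failure instead of returning null.
        List<String> listQueuedWals() throws LookupException;
    }

    /** Returns the candidates that are safe to delete; empty if the lookup failed. */
    static Set<String> deletableWals(QueuedWalSource source, Set<String> candidates) {
        List<String> queued;
        try {
            queued = source.listQueuedWals();
        } catch (LookupException e) {
            // A failed lookup is not the same as "no queues": delete nothing.
            return Collections.emptySet();
        }
        Set<String> deletable = new HashSet<>(candidates);
        deletable.removeAll(queued);
        return deletable;
    }
}
```

With a null return, the failure case and the empty case collapse into one branch, which is how the cleaner described in the issue ended up deleting WALs that had not yet been replicated.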