[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976020#comment-14976020 ] Ben Lau commented on HBASE-14283: - Reran the failures that looked relevant on the various branches, seems the tests are just unstable. When I get the time I'll try to restart the discussion about updating the HFile serialization for more efficient reverse scans in HBASE-14576. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975790#comment-14975790 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-1.2 #309 (See [https://builds.apache.org/job/HBase-1.2/309/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 0fd37a5bffacef56cca593dc22fb9d86c6dd081f) * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975440#comment-14975440 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-1.1 #724 (See [https://builds.apache.org/job/HBase-1.1/724/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 0db04a1705e5e8cc04cc9c010ddfc5612f60cfec) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975111#comment-14975111 ] Andrew Purtell commented on HBASE-14283: [~benlau] I don't see more outstanding feedback and there's already a +1 here from Anoop. Let me check on some things locally and then commit if it looks good. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975101#comment-14975101 ] Ben Lau commented on HBASE-14283: - Thanks Andrew. Can we merge these patches or is there still something that needs to be done or reviewed? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975059#comment-14975059 ] Andrew Purtell commented on HBASE-14283: Ok, agreeable. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975489#comment-14975489 ] Ben Lau commented on HBASE-14283: - That's odd, something went wrong with the patch for 1.1 branch. That compiled fine for me but perhaps I overlooked something, or the patch became stale. I will take a look later tonight. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975523#comment-14975523 ] Andrew Purtell commented on HBASE-14283: I'll back mine out because the build is broken again (two CellUtil#matchingTimestamps) > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975584#comment-14975584 ] Ben Lau commented on HBASE-14283: - Ok thanks just wanted to confirm the other branches were fine as far as we know, eg we didnt swap 1.1 patch with 0.98 patch and need to fix/update 0.98 branch. Thanks. It looks like there are some test failures but I think they are not related to the patch. I will rerun the tests that fail tonight locally after they appear in this ticket. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975645#comment-14975645 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-0.98 #1168 (See [https://builds.apache.org/job/HBase-0.98/1168/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 9f6d34bef3c1e066606530a58f6e9137d2515df3) * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975709#comment-14975709 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1122 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1122/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 9f6d34bef3c1e066606530a58f6e9137d2515df3) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975668#comment-14975668 ] Hudson commented on HBASE-14283: SUCCESS: Integrated in HBase-1.3-IT #273 (See [https://builds.apache.org/job/HBase-1.3-IT/273/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 288a27405206db38923fc730648d8f49367abf34) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975751#comment-14975751 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-1.3 #310 (See [https://builds.apache.org/job/HBase-1.3/310/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 288a27405206db38923fc730648d8f49367abf34) * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975520#comment-14975520 ] Enis Soztutar commented on HBASE-14283: --- Sorry, I was trying to commit HBASE-14682, and pushed the addendum in the process without checking here. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1497#comment-1497 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-1.0 #1101 (See [https://builds.apache.org/job/HBase-1.0/1101/]) HBASE-14283 Reverse scan doesnât work with HFile inline index/bloom (apurtell: rev 48a532875e31c15515fa6cb8320993a4efb3cae8) * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java * hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparator.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975649#comment-14975649 ] Hudson commented on HBASE-14283: SUCCESS: Integrated in HBase-1.2-IT #244 (See [https://builds.apache.org/job/HBase-1.2-IT/244/]) HBASE-14283 Reverse scan doesn���t work with HFile inline index/bloom (apurtell: rev 0fd37a5bffacef56cca593dc22fb9d86c6dd081f) * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975738#comment-14975738 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-1.1 #725 (See [https://builds.apache.org/job/HBase-1.1/725/]) HBASE-14283 Reverse scan doesn���t work with HFile inline index/bloom (enis: rev e80b447354f211fc21c2bbc42a0d9bc000835e71) * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java Amend HBASE-14283 Reverse scan doesn���t work with HFile inline (apurtell: rev 9d6e96ee11265cf2dc950619042ef22cf4d9fa00) * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java Revert "Amend HBASE-14283 Reverse scan doesn���t work with HFile inline (apurtell: rev d5a1b276270a1d41f21badd5b85d9502f8f9f415) * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975748#comment-14975748 ] Anoop Sam John commented on HBASE-14283: Thanks for the perseverance [~benlau].. Hope u will now work on the other Jira to handle it in better way wrt perf impact. Thanks... > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975516#comment-14975516 ] Andrew Purtell commented on HBASE-14283: Not a problem. This compiled for me too so I thought but there's a small missing static helper function in CellUtil. I'll commit it as an addendum now. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975570#comment-14975570 ] Ben Lau commented on HBASE-14283: - [~andrew.purt...@gmail.com] Was the wrong patch applied to 1.1 or am I misunderstanding something? So I looked at https://github.com/apache/hbase/commits/branch-1.1 specifically https://github.com/apache/hbase/commit/0db04a1705e5e8cc04cc9c010ddfc5612f60cfec and it is missing the CellUtil method that I had in my HBASE-14283-branch-1.1.patch attached. It's fine if this is fixed as an addendum but there's nothing else amiss right (i.e. the other branches have the right patches applied?) > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975579#comment-14975579 ] Hudson commented on HBASE-14283: FAILURE: Integrated in HBase-TRUNK #6963 (See [https://builds.apache.org/job/HBase-TRUNK/6963/]) HBASE-14283 Reverse scan doesn���t work with HFile inline index/bloom (apurtell: rev 5c56e239c3af22e1232681cceaed7bd96480ed92) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975578#comment-14975578 ] Andrew Purtell commented on HBASE-14283: No worries Ben. Not sure how I can help. The other branches got the right patches. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hbase-14283_add.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974587#comment-14974587 ] Ben Lau commented on HBASE-14283: - Still there [~andrew.purt...@gmail.com]? Anyone else have comments/questions? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965794#comment-14965794 ] Ben Lau commented on HBASE-14283: - [~andrew.purt...@gmail.com] Is the above explanation agreeable? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959934#comment-14959934 ] Andrew Purtell commented on HBASE-14283: bq. Although this is too bad, it doesn’t really matter I think in practice, because the block of the last key will get cached in the BlockCache the first time we need to know the last key for that halfstore. So repeated calls later in the region server will not incur any overhead. Let me know if this addresses your concerns or not. Well... There are scanning workloads where the data being scanned won't fit into the blockcache and so the user would be calling Scan#setCacheBlocks(boolean) with 'false'. If we are only doing twice as many reads as before per block only for HalfStoreFileReader, then I think this is ok, because HalfStoreFileReader is only used between the time when a split takes place and when compaction on the daughters is complete, references are removed, and the parent is deleted. Please add comments indicating where we are taking on suboptimal behavior. If we will always issue two reads for a block in all cases, then that is really unfortunate and I'd ask that be revisited. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959998#comment-14959998 ] Ben Lau commented on HBASE-14283: - [~andrew.purt...@gmail.com] Let me know if I'm missing something, but I think there is more than 1 scan in play here. I think you're talking about an external hbase client scan. I'm talking about an internal hbase scan opened up by the regionserver and which we know for a fact caches the block. See the implementation of HalfStoreFileReader.getLastKey(), it is creating an internal scanner that does cache. Furthermore, the results of the method aren't cached in the Reader class (eg as a variable) and since the method is called repeatedly in the codebase it seems likely that the author's expectation was that the block cache would work correctly and make an internal cache for the file reader redundant. So not only does this scenario happen only for newly split regions but it only happens for the first time. I can add more comments to the patches if it is really necessary but there is already a comment in the code indicating that this fix is not performant and is meant to be updated by a later ticket whose jira # is listed. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956541#comment-14956541 ] Ben Lau commented on HBASE-14283: - [~lhofhansl] and [~anoop.hbase], I probably should’ve called it out explicitly but yes technically, it would affect forward scans (among other things), but I don’t think in any noticeable way. The patched code is called by HalfStoreFileReader.getLastKey(). This getLastKey() method is needed to figure out the split point of a region which is eg used in splitting a region. Extra IO op there but given that region splits don’t happen frequently nor at a very high rate when they do I think it is fine. The getLastKey() is also used by StoreFile.Reader for passesKeyRangeFilter() check to determine whether a store is applicable to a scan, both forward/reverse. It affects the initial scan creation as well as later next() RPC calls I think. Although this is too bad, it doesn’t really matter I think in practice, because the block of the last key will get cached in the BlockCache the first time we need to know the last key for that halfstore. So repeated calls later in the region server will not incur any overhead. Let me know if this addresses your concerns or not. Incidentally I think that caching is one of the primary reasons why there seems to be a decent # of people using reverse scan but almost no one reporting this bug— because caching often hides it, either completely or nondeterministically (requests succeeding on later retries). We had some problems initially reproducing this bug reliably because if we ran forward scans in a table concurrently with the reverse scans, it would cause the blocks to become cached and so certain key ranges that previously would’ve caused reverse scan to fail suddenly started working just fine. Re: Attaching the same patch, let me know if I'm doing this wrong but it sounds like all I should have to do is just upload the same patch file for master but with a different name and the QA tests will pick it up. I will do that. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956915#comment-14956915 ] Hadoop QA commented on HBASE-14283: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12766505/HBASE-14283-reupload-master.patch against master branch at commit 94bfe909aff9fd74cb1a5d0c3f9209a19704c6cf. ATTACHMENT ID: 12766505 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16003//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16003//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16003//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16003//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16003//console This message is automatically generated. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, > HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956263#comment-14956263 ] Anoop Sam John commented on HBASE-14283: No it is for backward scan only. [~benlau] confirm once if am wrong. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956254#comment-14956254 ] Lars Hofhansl commented on HBASE-14283: --- Am I understanding correctly that we already incur two reads now, even when we're scanning forward? If so, that seems unfortunate. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956228#comment-14956228 ] Ben Lau commented on HBASE-14283: - So anything I should change in the patches? How many +1's are needed? Does someone else need to +1? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956240#comment-14956240 ] Anoop Sam John commented on HBASE-14283: Attach the same patch once more for a clean QA run.. Then we can commit it.. Ted can you pls commit it after QA? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950185#comment-14950185 ] Anoop Sam John commented on HBASE-14283: +1 > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948230#comment-14948230 ] Ted Yu commented on HBASE-14283: lgtm QA report picked up hadoop tests. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948217#comment-14948217 ] Hadoop QA commented on HBASE-14283: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12765517/HBASE-14283-master.patch against master branch at commit 7e30436e3fa84525b85b05b9e23cb01b2ada7c12. ATTACHMENT ID: 12765517 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) at
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948049#comment-14948049 ] Ben Lau commented on HBASE-14283: - Hmmm, can't seem to edit Jira comments. Anyways, I attached short term patches to the ticket per discussion for all versions of HBase from 0.98 to master. The patches are mostly the same other than 0.98. There were a couple of utility methods that were missing/private in earlier versions of HBase 1.X that I backported or changed to public to mirror the master version of the patch. (Let me know if that isn't kosher for some reason.) I probably won't have time to update the patches this week but I'll look at feedback and make appropriate changes when I get the chance next week. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, > HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, > HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, > HBASE-14283.patch, hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947873#comment-14947873 ] Ben Lau commented on HBASE-14283: - Alright I'll work on that. I created HBASE-14576 for the longer term fix, so this ticket will just be to implement the short term fix we described. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946672#comment-14946672 ] Anoop Sam John commented on HBASE-14283: I would say commit for all applicable branches including Trunk. The better fix wherever we need (trunk , branch-1) pls add a TODO in the patch with Jira# which will fix that. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946197#comment-14946197 ] Ben Lau commented on HBASE-14283: - Hey guys, sorry, I should be able to get back to this soon. Finishing up an unrelated project right now. I didn't know that minor versions in HFiles were also non-backwards compatible. That's one less reason then to make this a major version bump. If anyone has a strong preference for this fix to go into a V3.X I can change the patch to use minor version (eg for header size calculation) when I have time to do it. If not I'll leave it as V4 since it's a little simpler in the code as a major version bump. My original intention btw if it wasn't clear was that this wouldn't be the only change in a V4, just the first change that would go into a V4, whose format/contents is not yet meant to be final even when this patch is committed, i.e. V4 would be essentially a WIP with more changes suggested and implemented in other tickets and eventually released in HBase 2.0. [~anoop.hbase] I'm down for committing a short-term read-the-header-always fix for now and then discussing the longer term solution second. Which branches do you want the patch for? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943239#comment-14943239 ] Anoop Sam John commented on HBASE-14283: We can fix this bug now (in all applicable versions) by reading the header and then data. This 2 reads have perf impact but better no bug. Can we get that in first? For the 2.0 version (branch-1 also?) we can decide on how to add the new data to header (with major version bump or minor bump).. Fixing this bug as such we can give priority and get it done soon? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909362#comment-14909362 ] Andrew Purtell commented on HBASE-14283: When we were designing tags we accepted some limitations of HFile that later were problematic, specifically the issue of file and Cell serialized form bloat produced by adding the tag length, even if universally 0 for the file. We introduced a whole file optimization for this issue but clearly we'd have more opportunities to employ it if we could vary encoding strategy on a block-by-block basis. I thought about introducing an extensible pbufed block header. It didn't make sense for the tag serialization issue - we would trade one type of bloat for another, those additional header bytes will end up in the block cache - but if there are multiple use cases for it lined up, a new extensible pbufed 'block header' could be worthwhile. Would make future block level changes less likely to be incompatible changes too. Does the introduction of something like that require a major or minor version bump? I think so. I'd like to see us be more like semver with HFile versioning, and if we're on the same page about that, this is a major version bump because earlier versioned readers won't be able to handle the change. That is assuming > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909091#comment-14909091 ] Lars Hofhansl commented on HBASE-14283: --- bq. I know that Lars Hofhansl, during hbasecon, was suggesting format changes to group by qualifiers and similar. I've suggested many things over the years :) I think this was about storing keys and values separately in HFile is indexing each with the start position. So if now scan with a filter, we can slog through the filters pretty quickly and only reassemble those Cells that match. Or we can do fast aggregates over the value... Both need significant other plumbing as well. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909071#comment-14909071 ] Matteo Bertozzi commented on HBASE-14283: - I was making the point for a 2/3.x just because, people was already scared by the name v4 for the format. even if there are no real changes to the format itself aside few things. also we had already an "incompatibile" change in a minor version of hfile (the reason the minor version was introduced) and it was more or less the same thing as this one. In that case was adding a field to the header for checksum HBASE-5074. in any case i'm pretty sure we can get the changes in a 1.x too. off by default and people can decide if they care about that or not. more or less the same thing happened for the hfile v2.1 (checksum), v2.2 (protobufs) and v3 (tags). but that's not my call, the next RM for 1.3 will make the call. but you have a +1 for me to change the hfile format in 2.0 > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908964#comment-14908964 ] Andrew Purtell commented on HBASE-14283: We could do a v4 for 2.0. Should run this idea by [~mbertozzi] > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908994#comment-14908994 ] Matteo Bertozzi commented on HBASE-14283: - v4 probably is a bad name.. this is more a v2/3.x since (the original idea) was just a fix in the format to solve a bug. not a complete format redesign. everything should be the same aside adding one field on the level block. Ben was also suggesting a double rolling upgrade to migrate to that, more or less like we did for 3. I'm ok to have it only in 2.0, but to me this can also be done in a 1.3. just switch the name from v4 to 3.x :) if we want to extend the scope of redesigning the file-format, that will be another topic. I know that [~lhofhansl], during hbasecon, was suggesting format changes to group by qualifiers and similar. but that will not be just related to fixing this bug and it will probably be in 2.0 only I guess. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909063#comment-14909063 ] Ben Lau commented on HBASE-14283: - As Matteo mentions, we discussed this issue briefly with him and Stack about a week ago or so. [~mbertozzi] I thought we had asked if updating the major version would be fine and the answer was in the affirmative but I might've misunderstood or misunderstood the degree of affirmation. (IIRC the reason was something along the lines of cluster operators updating HFile.FORMAT_VERSION_KEY in their config being a desirable property of the 2nd rolling upgrade or something.) In any case though the reason we initially were going to do a 3.x but moved to a 4.0 was because (1) major versions (but not in HFiles?) generally denote level of backward compatibility and the new HFiles produced in this patch cannot be read by an HFileV3.X reader (2) the patch requires enough changes to assumptions in the serialization code (eg regarding header size or block cache) that it doesn't seem appropriate as a minor version change and (3) if we are following the rules we've set for ourselves, the changes in HConstants alone (annotated as Stable) mean this patch should be going into HBase 2.0 (admittedly rules can always be bent). Updating only the minor version because the format change currently fixes only 1 bug (as opposed to 10 bugs or adding a new feature) seems to be the wrong way to think of versioning, IMO. If our concern is that we would like a fix for this bug in 1.3 and not wait until 2.0, we could also commit a shorter term fix for 1.3 that just always reads the block header or does an optimistic read and falls back on reading the block header if the read fails from block size expectations (configurable, optimistic off by default). In combination with an expected size correction for index blocks (perhaps not for bloom filter blocks since that fix is a messy addition and also violates some API abstraction layers in StoreFile) it might be fine in most scenarios, especially if the cluster operator is allowed to change the inline block chunking config for the cluster that needs to do reverse scans. Within Yahoo internally we will probably go with a bandaid fix like this for now so that users can use reverse scans and still get 'ok' even if not max performance. (Also sub-100% performance is better than getting exceptions about block sizes ;) ) If people would be okay with dividing it up this way-- short term fix (no HFile changes) for 1.3 and a longer term fix (HFileV4) for HBase 2.0 I could create a separate ticket with the newest patch as a starting point for HFileV4 and submit another patch for this ticket that implements a configurable 'choose your tradeoff' fix as described in the previous paragraph. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906709#comment-14906709 ] Nick Dimiduk commented on HBASE-14283: -- HFileV4 would be a pretty drastic change. I suggest you raise a thread on dev to discuss what other things we might want to add/change with a v4. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906704#comment-14906704 ] Ben Lau commented on HBASE-14283: - Hi guys, I have posted a new patch on review board. See https://reviews.apache.org/r/38720/. The patch adds support for HFileV4 and uses it to fix/optimize reverse scan. The patch is designed to be rolling-restartable in 2 phases, as discussed in an above comment on Sep 14. As currently posted, I think the patch has to go into HBase 2.0 since it changes HConstants which is marked @Stable. It turns out that assumptions about the header size and contents are hardcoded in several places, primarily HConstants.java. Let me know what you guys think of the patch. More eyes would be better because the changes are somewhat farther reaching than they sounded initially and the HFile format has a long history that I’m not as familiar with as most people around here. I also need to reach out to Facebook later for feedback since this change seems like it will affect their external Memcached block cache. I have marked TODO in the patch in several places where I will need to talk with Facebook. Also, since we are adding a new HFileV4, now would be a good time to fix anything else that is broken or add additional metadata for future optimizations that we missed out on in HFileV3. Someone could add more metadata after the patch stands on its own as a complete fix for reverse scans. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906728#comment-14906728 ] Ben Lau commented on HBASE-14283: - Hi Nick, that sounds like a good idea, I will do that. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746484#comment-14746484 ] Ben Lau commented on HBASE-14283: - After talking to some committers in HBase, it seems that unless there is a very strong case / no viable alternative, all new patches to HBase should not require a full cluster restart. Hence, we will be going with the 2-rolling-restart approach as described above. It requires the cluster operator to do 2 rolling restarts and set a new config but that should not be too burdensome for a major upgrade. This rolling-restart-compatible approach is a bit more messy/complicated code-wise so let us look a bit into the best way to do this. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744844#comment-14744844 ] Ben Lau commented on HBASE-14283: - Hey guys, I started looking into updating the HFile serialization to support reverse scans per previous comments. One thing that immediately struck me as being a possible problem is that the header sizes appear to be hardcoded into HConstants.java (HConstants.HFILEBLOCK_HEADER_SIZE), rather than being read from the HFile block header or HFile metadata itself. This seems to imply that if I add more fields to the header and then do a rolling restart to update all region servers to have my code, any old region server that hasn't updated yet and is processing the new HFiles will not realize the header is bigger now and that there is stuff they need to skip / ignore. This might necessitate a 2-step restart process with 2 rolling restarts. Restart 1 to update all RS to have the appropriate new reading code. Restart 2 will enable writes by setting an HBase config option (false by default) to start writing the new HFiles. Am I missing something and this 2-step rolling restart is not necessary for some reason? It seems unlikely people would find this process palatable but is there a better alternative? Alternatively I can turn this into a non-backwards compatible major version update instead of a minor version update and require a full cluster restart but that is kind of harsh in its own way. Opinions/thoughts? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736662#comment-14736662 ] ramkrishna.s.vasudevan commented on HBASE-14283: Just to clarify, the prev block data size is what is missing here. So that is going to be added per block and this information is added to the hfile's metadata? So there is going to be a change in the HFileblock's SerDe format? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737171#comment-14737171 ] Ben Lau commented on HBASE-14283: - Yes we would be changing the serialization for HFileBlock header. It would have a new field for the previous data block size, for block of the same type (same semantics as prevBlockOffset now). Any objections? > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737207#comment-14737207 ] ramkrishna.s.vasudevan commented on HBASE-14283: Am fine. The HFile's metadata has to be used while reading the HFileblock. Can look at the patch once posted. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736047#comment-14736047 ] Anoop Sam John commented on HBASE-14283: Going through the details of this issue and the patch. We have the optimization of reading next block's header along with every block to avoid 2 reads (1st header and get data size and then do second read). This works well for forward read. In case of backward read, where we have to read the prev block, we tried to solve the case by calculating the prev blocks size from offset of 2 blocks (considering no other blocks in btw these 2 data blocks).. As per the bug, this assumption is not true. Good find. If we have the prev block data size also along with prev block offset, we were good. But we dont have that in current HFiles. When we dont know the data size of a block already, we are passing -1 so that it will read block by 2 reads. Seeing the patch it is very complex. And there are additional overhead also. Also it is not complete fix as it can not handle more than 2 level index block. So IMHO it will be better to avoid this kind of complex fix. We can do the fix in 2 steps. 1. Bump the minor version of the HFile and add the prev block size also along with offset. When this info is there in HFile we can safely use that and do read of prev block in one read. 2. For reading old files where this meta data is NOT available, just pass -1 and let it read in 2 steps. It is ok.. Any way once the patch is applied and over the run the older files will get compacted to new file and this will have the additional meta info. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736093#comment-14736093 ] Ben Lau commented on HBASE-14283: - Yep, we talked with Anoop and agree that the patch adds a lot of complexity for a fix that doesn't fix the issue 100%. The portion of the patch that is required to fix the bug for bloom filters is especially long. We thought to aim for a longer term fix later, but based on our discussion with Anoop it sounds like a backwards compatible, complete fix that adds the necessary metadata to HFile should not be too complicated/much work (eg does not involve creating new HFileReader implementation or other infrastructure). We will submit a new patch later with a final fix. We will keep the unit tests from the 1st patch since they are still applicable. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724040#comment-14724040 ] Ben Lau commented on HBASE-14283: - Correct me if I'm wrong but I think I can't do a generalBloomFilter == null check because there are 3 states not 2: (1) Tried to load filter and it exists (2) Tried to load filter and it does not exist (3) Have not tried to load filter yet. If we rely on the generalBloomFilter == null check we can't distinguish between (2) and (3) which means we would end up trying to reload the filter unnecessarily. {quote} Can information from FileInfo be included in the exception message to facilitate debugging ? {quote} What information from FileInfo should be provided? The point of the message (maybe it needs to be revised for clarity) is that we found an unexpected BloomFilter in the HFile-- unexpected because the HFile FileInfo metadata claims there is no bloom filter (type = NONE). I will clarify the msg a bit. I'll fix the other issues, thanks. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724056#comment-14724056 ] Ted Yu commented on HBASE-14283: Can a singleton of type BloomFilter be created so that we can use it to denote case 2 ? Meaning generalBloomFilter is initially null. When we find no bloom filter exists, assign generalBloomFilter the singleton. Thanks > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724069#comment-14724069 ] Ted Yu commented on HBASE-14283: Putting patch on reviewboard would make review easier. There're some typo's in current patch. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724104#comment-14724104 ] Ben Lau commented on HBASE-14283: - The other parts of the code consider BloomFilter to be a nullable type, not just here. I don't know if it makes sense to change that in this patch. It is a bit overkill to use a null object here and seems to increase complexity more than it eliminates currently (new class to eliminate a load flag), since unlike in some common use cases of having a null object, we can't avoid checking for the null object here. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724107#comment-14724107 ] Ben Lau commented on HBASE-14283: - Alright I'll put it on reviewboard, thanks. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724112#comment-14724112 ] Ted Yu commented on HBASE-14283: Alright, you can keep the checking as it is now. > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724357#comment-14724357 ] Ben Lau commented on HBASE-14283: - Reviewboard link with updated version of patch (v3): https://reviews.apache.org/r/37971/ > Reverse scan doesn’t work with HFile inline index/bloom blocks > -- > > Key: HBASE-14283 > URL: https://issues.apache.org/jira/browse/HBASE-14283 > Project: HBase > Issue Type: Bug >Reporter: Ben Lau >Assignee: Ben Lau > Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, > hfile-seek-before.patch > > > Reverse scans do not work if an HFile contains inline bloom blocks or leaf > level index blocks. The reason is because the seekBefore() call calculates > the previous data block’s size by assuming data blocks are contiguous which > is not the case in HFile V2 and beyond. > Attached is a first cut patch (targeting > bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: > (1) a unit test which exposes the bug and demonstrates failures for both > inline bloom blocks and inline index blocks > (2) a proposed fix for inline index blocks that does not require a new HFile > version change, but is only performant for 1 and 2-level indexes and not 3+. > 3+ requires an HFile format update for optimal performance. > This patch does not fix the bloom filter blocks bug. But the fix should be > similar to the case of inline index blocks. The reason I haven’t made the > change yet is I want to confirm that you guys would be fine with me revising > the HFile.Reader interface. > Specifically, these 2 functions (getGeneralBloomFilterMetadata and > getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the > HFileReader class doesn’t have a reference to the bloom filters (and hence > their indices) and only constructs the IO streams and hence has no way to > know where the bloom blocks are in the HFile. It seems that the HFile.Reader > bloom method comments state that they “know nothing about how that metadata > is structured” but I do not know if that is a requirement of the abstraction > (why?) or just an incidental current property. > We would like to do 3 things with community approval: > (1) Update the HFile.Reader interface and implementation to contain and > return BloomFilters directly rather than unstructured IO streams > (2) Merge the fixes for index blocks and bloom blocks into open source > (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ > field in the block header in the next HFile version, so that seekBefore() > calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721255#comment-14721255 ] Ted Yu commented on HBASE-14283: {code} +throw new IllegalArgumentException(Data block offset was provided that is actually ++ an index or bloom block offset: + dataBlockOffset1); {code} I understand the meaning of above message. But strictly speaking, offset is just a numeric value. Itself wouldn't indicate whether the block is a data block or index / bloom block. Providing number of index blocks found in the exception message would be helpful: {code} + if (indexBlocksFound 1) { +throw new IllegalStateException(Found more than 1 block of this type between 2 ++ consecutive data blocks: + type); {code} {code} + public BloomFilter getGeneralBloomFilter(BloomType bloomFilterType) throws IOException { +if (generalBloomFilterLoaded) { + return generalBloomFilter; {code} Can generalBloomFilterLoaded be replaced with checking generalBloomFilter not being null ? Similar comment applies to deleteBloomFilterLoaded {code} +if (bloomFilterType == BloomType.NONE) { + throw new IOException(Valid bloom filter type not found in FileInfo); {code} Can information from FileInfo be included in the exception message to facilitate debugging ? Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720960#comment-14720960 ] Hadoop QA commented on HBASE-14283: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12753103/HBASE-14283-v2.patch against master branch at commit 0d06d8ddd0aa9aff7476fb6a7acd6af1d24ba3fc. ATTACHMENT ID: 12753103 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1849 checkstyle errors (more than the master's current 1846 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//console This message is automatically generated. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720844#comment-14720844 ] Ben Lau commented on HBASE-14283: - Here's a V2 of the patch that handles bloom filter blocks. It requires some interface changes that blur the line a bit between the StoreFile reader and the HFile reader which is not ideal but there isn't really any other way to fix this currently in a performant way for bloom filters. Let me know what you guys think. I have attached the patch to the ticket for review/feedback. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717095#comment-14717095 ] Ben Lau commented on HBASE-14283: - Thanks ramkrishna. I will take a crack at fixing the calculation in the presence of bloom filters later and see how an interface update looks. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716986#comment-14716986 ] ramkrishna.s.vasudevan commented on HBASE-14283: bq. Is it reasonable to change the HFile.Reader interface so that the HFile reader (instead of the higher level StoreFile reader) I think you can try this if you think that will help in the bloom area. Also it is an private interface so it is fine to change. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715147#comment-14715147 ] Ben Lau commented on HBASE-14283: - Hi Anoop. The problem isn't that we read a previous block and see that the block is not the expected type. prevBlockOffset guarantees that we can seek to the previous block of the same type as the current one. See the comments on HFileBlock.getPrevBlockOffset(). We are always seeking to the previous data block, we are simply not calculating how much to read correctly once we have seeked to that previous data block because our prev data block size calculation can include other blocks because of the layout of scannable section in HFileV2+. We need a way of knowing apriori what the size of the previous data block is. The method you describe is used in HFileReaderImpl.readNextDataBlock(). Note that the reason this method works is because this method can use the method curBlock.getNextBlockOnDiskSizeWithHeader(). We need something similar to that when seeking backwards in order to achieve optimal performance. Let me know if I misunderstood what you meant. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715179#comment-14715179 ] ramkrishna.s.vasudevan commented on HBASE-14283: [~benlau] Let me take a look at this tomorrow morning my time. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713600#comment-14713600 ] Anoop Sam John commented on HBASE-14283: We use below method to get the previous block public HFileBlock readBlock(long dataBlockOffset, long onDiskBlockSize, final boolean cacheBlock, boolean pread, final boolean isCompaction, boolean updateCacheMetrics, BlockType expectedBlockType, DataBlockEncoding expectedDataBlockEncoding) So there no BlockType check and looping? May be it will read a block and see that block is not the expected one and go to next block and check for type. In seek before case instead of going fwd we should be going backward in case the expected block type is matching with the cur block type. That way of solution will work? Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710911#comment-14710911 ] Hadoop QA commented on HBASE-14283: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752147/HBASE-14283.patch against master branch at commit d0873f5a8cc060adbc5a1ae0ed52b84a8942a868. ATTACHMENT ID: 12752147 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(testReverseScanRandomTable+encoding.name())); +return String.format(Failed to match on iteration %s between %s and %s, iteration, Bytes.toStringBinary(a), Bytes.toStringBinary(b)); ++ Bytes.toStringBinary(keys[i]) + ), true, scanner.seekBefore(new KeyValue.KeyOnlyKeyValue(keys[i]))); ++ Bytes.toStringBinary(keys[i]) + ), false, scanner.seekBefore(new KeyValue.KeyOnlyKeyValue(keys[i]))); {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15241//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15241//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15241//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15241//console This message is automatically generated. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711760#comment-14711760 ] Ben Lau commented on HBASE-14283: - Thanks [~ramkrishna.s.vasude...@gmail.com] for showing me the conventions for the patch process. The tests look like they passed, with 1 minor style comment. I can fix that but want to address the bloom filter blocks issue first. Who would be best for me to ask about it, how about you? Is it reasonable to change the HFile.Reader interface so that the HFile reader (instead of the higher level StoreFile reader) is in charge of deserializing and holding the bloom filter data structures? Can't see why not but maybe I'm missing something. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710583#comment-14710583 ] Ben Lau commented on HBASE-14283: - Anyone have any comments on this bug and the fix? Perhaps [~zjushch] (from HBASE-4811 which implemented reverse scan) or [~mikhail] (from HBASE-3857 which implemented HFileV2) or [~liyintang] (from HBASE-4532 which added the bloom filter method(s) I have a question about)? Or maybe one of them can tell me the person(s) who would be best suited to examine the bug and proposed fixes? Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707096#comment-14707096 ] Ben Lau commented on HBASE-14283: - In the patch I also added an extra unit test to TestFromClientside, that also tests reverse scan, but is unrelated to this bug, just as an additional test to strengthen the test suite. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707088#comment-14707088 ] Ben Lau commented on HBASE-14283: - We suspect the person in HBASE-13830 also ran into this bug, based on his similar exception, but can’t know for sure without more information from him. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)