[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-27 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976020#comment-14976020
 ] 

Ben Lau commented on HBASE-14283:
-

Reran the failures that looked relevant on the various branches, seems the 
tests are just unstable.  When I get the time I'll try to restart the 
discussion about updating the HFile serialization for more efficient reverse 
scans in HBASE-14576.  

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975790#comment-14975790
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-1.2 #309 (See 
[https://builds.apache.org/job/HBase-1.2/309/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 0fd37a5bffacef56cca593dc22fb9d86c6dd081f)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975440#comment-14975440
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-1.1 #724 (See 
[https://builds.apache.org/job/HBase-1.1/724/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 0db04a1705e5e8cc04cc9c010ddfc5612f60cfec)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975111#comment-14975111
 ] 

Andrew Purtell commented on HBASE-14283:


[~benlau] I don't see more outstanding feedback and there's already a +1 here 
from Anoop. Let me check on some things locally and then commit if it looks 
good.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975101#comment-14975101
 ] 

Ben Lau commented on HBASE-14283:
-

Thanks Andrew.  Can we merge these patches or is there still something that 
needs to be done or reviewed?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975059#comment-14975059
 ] 

Andrew Purtell commented on HBASE-14283:


Ok, agreeable.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975489#comment-14975489
 ] 

Ben Lau commented on HBASE-14283:
-

That's odd, something went wrong with the patch for 1.1 branch.  That compiled 
fine for me but perhaps I overlooked something, or the patch became stale.  I 
will take a look later tonight.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975523#comment-14975523
 ] 

Andrew Purtell commented on HBASE-14283:


I'll back mine out because the build is broken again (two 
CellUtil#matchingTimestamps) 


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975584#comment-14975584
 ] 

Ben Lau commented on HBASE-14283:
-

Ok thanks just wanted to confirm the other branches were fine as far as we 
know, eg we didnt swap 1.1 patch with 0.98 patch and need to fix/update 0.98 
branch.  Thanks.  It looks like there are some test failures but I think they 
are not related to the patch.  I will rerun the tests that fail tonight locally 
after they appear in this ticket.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975645#comment-14975645
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-0.98 #1168 (See 
[https://builds.apache.org/job/HBase-0.98/1168/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 9f6d34bef3c1e066606530a58f6e9137d2515df3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975709#comment-14975709
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1122 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1122/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 9f6d34bef3c1e066606530a58f6e9137d2515df3)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975668#comment-14975668
 ] 

Hudson commented on HBASE-14283:


SUCCESS: Integrated in HBase-1.3-IT #273 (See 
[https://builds.apache.org/job/HBase-1.3-IT/273/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 288a27405206db38923fc730648d8f49367abf34)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975751#comment-14975751
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-1.3 #310 (See 
[https://builds.apache.org/job/HBase-1.3/310/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 288a27405206db38923fc730648d8f49367abf34)
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975520#comment-14975520
 ] 

Enis Soztutar commented on HBASE-14283:
---

Sorry, I was trying to commit HBASE-14682, and pushed the addendum in the 
process without checking here. 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1497#comment-1497
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-1.0 #1101 (See 
[https://builds.apache.org/job/HBase-1.0/1101/])
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom 
(apurtell: rev 48a532875e31c15515fa6cb8320993a4efb3cae8)
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparator.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975649#comment-14975649
 ] 

Hudson commented on HBASE-14283:


SUCCESS: Integrated in HBase-1.2-IT #244 (See 
[https://builds.apache.org/job/HBase-1.2-IT/244/])
HBASE-14283 Reverse scan doesn���t work with HFile inline index/bloom 
(apurtell: rev 0fd37a5bffacef56cca593dc22fb9d86c6dd081f)
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975738#comment-14975738
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-1.1 #725 (See 
[https://builds.apache.org/job/HBase-1.1/725/])
HBASE-14283 Reverse scan doesn���t work with HFile inline index/bloom (enis: 
rev e80b447354f211fc21c2bbc42a0d9bc000835e71)
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
Amend HBASE-14283 Reverse scan doesn���t work with HFile inline (apurtell: rev 
9d6e96ee11265cf2dc950619042ef22cf4d9fa00)
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
Revert "Amend HBASE-14283 Reverse scan doesn���t work with HFile inline 
(apurtell: rev d5a1b276270a1d41f21badd5b85d9502f8f9f415)
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975748#comment-14975748
 ] 

Anoop Sam John commented on HBASE-14283:


Thanks for the perseverance [~benlau].. Hope u will now work on the other Jira 
to handle it in better way wrt perf impact.  Thanks...

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975516#comment-14975516
 ] 

Andrew Purtell commented on HBASE-14283:


Not a problem. This compiled for me too so I thought but there's a small 
missing static helper function in CellUtil. I'll commit it as an addendum now.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975570#comment-14975570
 ] 

Ben Lau commented on HBASE-14283:
-

[~andrew.purt...@gmail.com] Was the wrong patch applied to 1.1 or am I 
misunderstanding something?  So I looked at 
https://github.com/apache/hbase/commits/branch-1.1 specifically 
https://github.com/apache/hbase/commit/0db04a1705e5e8cc04cc9c010ddfc5612f60cfec 
and it is missing the CellUtil method that I had in my 
HBASE-14283-branch-1.1.patch attached.  It's fine if this is fixed as an 
addendum but there's nothing else amiss right (i.e. the other branches have the 
right patches applied?)

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975579#comment-14975579
 ] 

Hudson commented on HBASE-14283:


FAILURE: Integrated in HBase-TRUNK #6963 (See 
[https://builds.apache.org/job/HBase-TRUNK/6963/])
HBASE-14283 Reverse scan doesn���t work with HFile inline index/bloom 
(apurtell: rev 5c56e239c3af22e1232681cceaed7bd96480ed92)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekBeforeWithInlineBlocks.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975578#comment-14975578
 ] 

Andrew Purtell commented on HBASE-14283:


No worries Ben. Not sure how I can help. The other branches got the right 
patches. 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974587#comment-14974587
 ] 

Ben Lau commented on HBASE-14283:
-

Still there [~andrew.purt...@gmail.com]?  Anyone else have comments/questions?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-20 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965794#comment-14965794
 ] 

Ben Lau commented on HBASE-14283:
-

[~andrew.purt...@gmail.com] Is the above explanation agreeable?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-15 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959934#comment-14959934
 ] 

Andrew Purtell commented on HBASE-14283:


bq.  Although this is too bad, it doesn’t really matter I think in practice, 
because the block of the last key will get cached in the BlockCache the first 
time we need to know the last key for that halfstore.  So repeated calls later 
in the region server will not incur any overhead.  Let me know if this 
addresses your concerns or not.

Well... There are scanning workloads where the data being scanned won't fit 
into the blockcache and so the user would be calling 
Scan#setCacheBlocks(boolean) with 'false'.

If we are only doing twice as many reads as before per block only for 
HalfStoreFileReader, then I think this is ok, because HalfStoreFileReader is 
only used between the time when a split takes place and when compaction on the 
daughters is complete, references are removed, and the parent is deleted. 
Please add comments indicating where we are taking on suboptimal behavior. 

If we will always issue two reads for a block in all cases, then that is really 
unfortunate and I'd ask that be revisited.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-15 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959998#comment-14959998
 ] 

Ben Lau commented on HBASE-14283:
-

[~andrew.purt...@gmail.com] Let me know if I'm missing something, but I think 
there is more than 1 scan in play here.  I think you're talking about an 
external hbase client scan.  I'm talking about an internal hbase scan opened up 
by the regionserver and which we know for a fact caches the block.  See the 
implementation of HalfStoreFileReader.getLastKey(), it is creating an internal 
scanner that does cache.  Furthermore, the results of the method aren't cached 
in the Reader class (eg as a variable) and since the method is called 
repeatedly in the codebase it seems likely that the author's expectation was 
that the block cache would work correctly and make an internal cache for the 
file reader redundant.  So not only does this scenario happen only for newly 
split regions but it only happens for the first time.  I can add more comments 
to the patches if it is really necessary but there is already a comment in the 
code indicating that this fix is not performant and is meant to be updated by a 
later ticket whose jira # is listed.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-14 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956541#comment-14956541
 ] 

Ben Lau commented on HBASE-14283:
-

[~lhofhansl] and [~anoop.hbase], I probably should’ve called it out explicitly 
but yes technically, it would affect forward scans (among other things), but I 
don’t think in any noticeable way.  The patched code is called by 
HalfStoreFileReader.getLastKey().  

This getLastKey() method is needed to figure out the split point of a region 
which is eg used in splitting a region.  Extra IO op there but given that 
region splits don’t happen frequently nor at a very high rate when they do I 
think it is fine.  The getLastKey() is also used by StoreFile.Reader for 
passesKeyRangeFilter() check to determine whether a store is applicable to a 
scan, both forward/reverse.  It affects the initial scan creation as well as 
later next() RPC calls I think.  Although this is too bad, it doesn’t really 
matter I think in practice, because the block of the last key will get cached 
in the BlockCache the first time we need to know the last key for that 
halfstore.  So repeated calls later in the region server will not incur any 
overhead.  Let me know if this addresses your concerns or not.

Incidentally I think that caching is one of the primary reasons why there seems 
to be a decent # of people using reverse scan but almost no one reporting this 
bug— because caching often hides it, either completely or nondeterministically 
(requests succeeding on later retries).  We had some problems initially 
reproducing this bug reliably because if we ran forward scans in a table 
concurrently with the reverse scans, it would cause the blocks to become cached 
and so certain key ranges that previously would’ve caused reverse scan to fail 
suddenly started working just fine.

Re: Attaching the same patch, let me know if I'm doing this wrong but it sounds 
like all I should have to do is just upload the same patch file for master but 
with a different name and the QA tests will pick it up.  I will do that.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956915#comment-14956915
 ] 

Hadoop QA commented on HBASE-14283:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12766505/HBASE-14283-reupload-master.patch
  against master branch at commit 94bfe909aff9fd74cb1a5d0c3f9209a19704c6cf.
  ATTACHMENT ID: 12766505

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16003//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16003//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16003//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16003//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16003//console

This message is automatically generated.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for 

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956263#comment-14956263
 ] 

Anoop Sam John commented on HBASE-14283:


No it is for backward scan only.  [~benlau]  confirm once if am wrong.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956254#comment-14956254
 ] 

Lars Hofhansl commented on HBASE-14283:
---

Am I understanding correctly that we already incur two reads now, even when 
we're scanning forward? If so, that seems unfortunate.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956228#comment-14956228
 ] 

Ben Lau commented on HBASE-14283:
-

So anything I should change in the patches?  How many +1's are needed?  Does 
someone else need to +1?  

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956240#comment-14956240
 ] 

Anoop Sam John commented on HBASE-14283:


Attach the same patch once more for a clean QA run.. Then we can commit it..  
Ted can you pls commit it after QA? 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-09 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950185#comment-14950185
 ] 

Anoop Sam John commented on HBASE-14283:


+1

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948230#comment-14948230
 ] 

Ted Yu commented on HBASE-14283:


lgtm

QA report picked up hadoop tests.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948217#comment-14948217
 ] 

Hadoop QA commented on HBASE-14283:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12765517/HBASE-14283-master.patch
  against master branch at commit 7e30436e3fa84525b85b05b9e23cb01b2ada7c12.
  ATTACHMENT ID: 12765517

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:164)
at 
org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
at 
org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
at 

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948049#comment-14948049
 ] 

Ben Lau commented on HBASE-14283:
-

Hmmm, can't seem to edit Jira comments.  Anyways, I attached short term patches 
to the ticket per discussion for all versions of HBase from 0.98 to master.  
The patches are mostly the same other than 0.98.  There were a couple of 
utility methods that were missing/private in earlier versions of HBase 1.X that 
I backported or changed to public to mirror the master version of the patch.  
(Let me know if that isn't kosher for some reason.)  I probably won't have time 
to update the patches this week but I'll look at feedback and make appropriate 
changes when I get the chance next week.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947873#comment-14947873
 ] 

Ben Lau commented on HBASE-14283:
-

Alright I'll work on that.  I created HBASE-14576 for the longer term fix, so 
this ticket will just be to implement the short term fix we described.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946672#comment-14946672
 ] 

Anoop Sam John commented on HBASE-14283:


I would say commit for all applicable branches including Trunk.  The better fix 
wherever we need (trunk , branch-1) pls add a TODO in the patch with Jira# 
which will fix that.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-06 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946197#comment-14946197
 ] 

Ben Lau commented on HBASE-14283:
-

Hey guys, sorry, I should be able to get back to this soon.  Finishing up an 
unrelated project right now.  I didn't know that minor versions in HFiles were 
also non-backwards compatible.  That's one less reason then to make this a 
major version bump.  If anyone has a strong preference for this fix to go into 
a V3.X I can change the patch to use minor version (eg for header size 
calculation) when I have time to do it.  If not I'll leave it as V4 since it's 
a little simpler in the code as a major version bump.  My original intention 
btw if it wasn't clear was that this wouldn't be the only change in a V4, just 
the first change that would go into a V4, whose format/contents is not yet 
meant to be final even when this patch is committed, i.e. V4 would be 
essentially a WIP with more changes suggested and implemented in other tickets 
and eventually released in HBase 2.0.

[~anoop.hbase] I'm down for committing a short-term read-the-header-always fix 
for now and then discussing the longer term solution second.  Which branches do 
you want the patch for?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-05 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943239#comment-14943239
 ] 

Anoop Sam John commented on HBASE-14283:


We can fix this bug now (in all applicable versions) by reading the header and 
then data. This 2 reads have perf impact but better no bug. Can we get that in 
first?  For the 2.0 version (branch-1 also?)  we can decide on how to add the 
new data to header (with major version bump or minor bump)..  Fixing this bug 
as such we can give priority and get it done soon?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909362#comment-14909362
 ] 

Andrew Purtell commented on HBASE-14283:


When we were designing tags we accepted some limitations of HFile that later 
were problematic, specifically the issue of file and Cell serialized form bloat 
produced by adding the tag length, even if universally 0 for the file. We 
introduced a whole file optimization for this issue but clearly we'd have more 
opportunities to employ it if we could vary encoding strategy on a 
block-by-block basis. I thought about introducing an extensible pbufed block 
header. It didn't make sense for the tag serialization issue - we would trade 
one type of bloat for another, those additional header bytes will end up in the 
block cache - but if there are multiple use cases for it lined up, a new 
extensible pbufed 'block header' could be worthwhile. Would make future block 
level changes less likely to be incompatible changes too. 

Does the introduction of something like that require a major or minor version 
bump? I think so. I'd like to see us be more like semver with HFile versioning, 
and if we're on the same page about that, this is a major version bump because 
earlier versioned readers won't be able to handle the change. That is assuming 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909091#comment-14909091
 ] 

Lars Hofhansl commented on HBASE-14283:
---

bq. I know that Lars Hofhansl, during hbasecon, was suggesting format changes 
to group by qualifiers and similar.

I've suggested many things over the years :)
I think this was about storing keys and values separately in HFile is indexing 
each with the start position. So if now scan with a filter, we can slog through 
the filters pretty quickly and only reassemble those Cells that match. Or we 
can do fast aggregates over the value... Both need significant other plumbing 
as well.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-25 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909071#comment-14909071
 ] 

Matteo Bertozzi commented on HBASE-14283:
-

I was making the point for a 2/3.x just because, people was already scared by 
the name v4 for the format. even if there are no real changes to the format 
itself aside few things. also we had already an "incompatibile" change in a 
minor version of hfile (the reason the minor version was introduced) and it was 
more or less the same thing as this one. In that case was adding a field to the 
header for checksum HBASE-5074.

in any case i'm pretty sure we can get the changes in a 1.x too. off by default 
and people can decide if they care about that  or not. more or less the same 
thing happened for the hfile v2.1 (checksum), v2.2 (protobufs) and v3 (tags). 
but that's not my call, the next RM for 1.3 will make the call. but you have a 
+1 for me to change the hfile format in 2.0


> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908964#comment-14908964
 ] 

Andrew Purtell commented on HBASE-14283:


We could do a v4 for 2.0. Should run this idea by [~mbertozzi] 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-25 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908994#comment-14908994
 ] 

Matteo Bertozzi commented on HBASE-14283:
-

v4 probably is a bad name.. 
this is more a v2/3.x since (the original idea) was just a fix in the format to 
solve a bug.
not a complete format redesign. everything should be the same aside adding one 
field on the level block. 
Ben was also suggesting a double rolling upgrade to migrate to that, more or 
less like we did for 3.
I'm ok to have it only in 2.0, but to me this can also be done in a 1.3. just 
switch the name from v4 to 3.x :)

if we want to extend the scope of redesigning the file-format, that will be 
another topic.
I know that [~lhofhansl], during hbasecon, was suggesting format changes to 
group by qualifiers and similar.
but that will not be just related to fixing this bug and it will probably be in 
2.0 only I guess.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-25 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909063#comment-14909063
 ] 

Ben Lau commented on HBASE-14283:
-

As Matteo mentions, we discussed this issue briefly with him and Stack about a 
week ago or so.  [~mbertozzi] I thought we had asked if updating the major 
version would be fine and the answer was in the affirmative but I might've 
misunderstood or misunderstood the degree of affirmation.  (IIRC the reason was 
something along the lines of cluster operators updating 
HFile.FORMAT_VERSION_KEY in their config being a desirable property of the 2nd 
rolling upgrade or something.)

In any case though the reason we initially were going to do a 3.x but moved to 
a 4.0 was because (1) major versions (but not in HFiles?) generally denote 
level of backward compatibility and the new HFiles produced in this patch 
cannot be read by an HFileV3.X reader (2) the patch requires enough changes to 
assumptions in the serialization code (eg regarding header size or block cache) 
that it doesn't seem appropriate as a minor version change and (3) if we are 
following the rules we've set for ourselves, the changes in HConstants alone 
(annotated as Stable) mean this patch should be going into HBase 2.0 
(admittedly rules can always be bent).

Updating only the minor version because the format change currently fixes only 
1 bug (as opposed to 10 bugs or adding a new feature) seems to be the wrong way 
to think of versioning, IMO.  If our concern is that we would like a fix for 
this bug in 1.3 and not wait until 2.0, we could also commit a shorter term fix 
for 1.3 that just always reads the block header or does an optimistic read and 
falls back on reading the block header if the read fails from block size 
expectations (configurable, optimistic off by default).  In combination with an 
expected size correction for index blocks (perhaps not for bloom filter blocks 
since that fix is a messy addition and also violates some API abstraction 
layers in StoreFile) it might be fine in most scenarios, especially if the 
cluster operator is allowed to change the inline block chunking config for the 
cluster that needs to do reverse scans.  Within Yahoo internally we will 
probably go with a bandaid fix like this for now so that users can use reverse 
scans and still get 'ok' even if not max performance.  (Also sub-100% 
performance is better than getting exceptions about block sizes ;) )

If people would be okay with dividing it up this way-- short term fix (no HFile 
changes) for 1.3 and a longer term fix (HFileV4) for HBase 2.0 I could create a 
separate ticket with the newest patch as a starting point for HFileV4 and 
submit another patch for this ticket that implements a configurable 'choose 
your tradeoff' fix as described in the previous paragraph.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with 

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-24 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906709#comment-14906709
 ] 

Nick Dimiduk commented on HBASE-14283:
--

HFileV4 would be a pretty drastic change. I suggest you raise a thread on dev 
to discuss what other things we might want to add/change with a v4.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-24 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906704#comment-14906704
 ] 

Ben Lau commented on HBASE-14283:
-

Hi guys, I have posted a new patch on review board.  See 
https://reviews.apache.org/r/38720/.  The patch adds support for HFileV4 and 
uses it to fix/optimize reverse scan.  The patch is designed to be 
rolling-restartable in 2 phases, as discussed in an above comment on Sep 14.

As currently posted, I think the patch has to go into HBase 2.0 since it 
changes HConstants which is marked @Stable.  It turns out that assumptions 
about the header size and contents are hardcoded in several places, primarily 
HConstants.java.

Let me know what you guys think of the patch.  More eyes would be better 
because the changes are somewhat farther reaching than they sounded initially 
and the HFile format has a long history that I’m not as familiar with as most 
people around here.

I also need to reach out to Facebook later for feedback since this change seems 
like it will affect their external Memcached block cache.  I have marked TODO 
in the patch in several places where I will need to talk with Facebook.

Also, since we are adding a new HFileV4, now would be a good time to fix 
anything else that is broken or add additional metadata for future 
optimizations that we missed out on in HFileV3.  Someone could add more 
metadata after the patch stands on its own as a complete fix for reverse scans. 
 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-24 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906728#comment-14906728
 ] 

Ben Lau commented on HBASE-14283:
-

Hi Nick, that sounds like a good idea, I will do that.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-15 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746484#comment-14746484
 ] 

Ben Lau commented on HBASE-14283:
-

After talking to some committers in HBase, it seems that unless there is a very 
strong case / no viable alternative, all new patches to HBase should not 
require a full cluster restart.  Hence, we will be going with the 
2-rolling-restart approach as described above.  It requires the cluster 
operator to do 2 rolling restarts and set a new config but that should not be 
too burdensome for a major upgrade.  This rolling-restart-compatible approach 
is a bit more messy/complicated code-wise so let us look a bit into the best 
way to do this.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-14 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744844#comment-14744844
 ] 

Ben Lau commented on HBASE-14283:
-

Hey guys, I started looking into updating the HFile serialization to support 
reverse scans per previous comments.  One thing that immediately struck me as 
being a possible problem is that the header sizes appear to be hardcoded into 
HConstants.java (HConstants.HFILEBLOCK_HEADER_SIZE), rather than being read 
from the HFile block header or HFile metadata itself.  

This seems to imply that if I add more fields to the header and then do a 
rolling restart to update all region servers to have my code, any old region 
server that hasn't updated yet and is processing the new HFiles will not 
realize the header is bigger now and that there is stuff they need to skip / 
ignore.  This might necessitate a 2-step restart process with 2 rolling 
restarts.  

Restart 1 to update all RS to have the appropriate new reading code.  Restart 2 
will enable writes by setting an HBase config option (false by default) to 
start writing the new HFiles.  Am I missing something and this 2-step rolling 
restart is not necessary for some reason?  It seems unlikely people would find 
this process palatable but is there a better alternative?  

Alternatively I can turn this into a non-backwards compatible major version 
update instead of a minor version update and require a full cluster restart but 
that is kind of harsh in its own way.  Opinions/thoughts?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736662#comment-14736662
 ] 

ramkrishna.s.vasudevan commented on HBASE-14283:


Just to clarify, the prev block data size is what is missing here.  So that is 
going to be added per block and this information is added to the hfile's 
metadata?  So there is going to be a change in the HFileblock's SerDe format?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-09 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737171#comment-14737171
 ] 

Ben Lau commented on HBASE-14283:
-

Yes we would be changing the serialization for HFileBlock header.  It would 
have a new field for the previous data block size, for block of the same type 
(same semantics as prevBlockOffset now).  Any objections?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737207#comment-14737207
 ] 

ramkrishna.s.vasudevan commented on HBASE-14283:


Am fine. The HFile's metadata has to be used while reading the HFileblock. Can 
look at the patch once posted.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736047#comment-14736047
 ] 

Anoop Sam John commented on HBASE-14283:


Going through the details of this issue and the patch. 
We have the optimization of reading next block's header along with every block 
to avoid 2 reads (1st header and get data size and then do second read). This 
works well for forward read.  In case of backward read, where we have to read 
the prev block, we tried to solve the case by calculating the prev blocks size 
from offset of 2 blocks (considering no other blocks in btw these 2 data 
blocks)..  As per the bug, this assumption is not true. Good find.  If we have 
the prev block data size also along with prev block offset, we were good.  But 
we dont have that in current HFiles.  When we dont know the data size of a 
block already, we are passing -1 so that it will read block by 2 reads.
Seeing the patch it is very complex. And there are additional overhead also.  
Also it is not complete fix as it can not handle more than 2 level index block. 
 So IMHO it will be better to avoid this kind of complex fix.  
We can do the fix in 2 steps.
1. Bump the minor version of the HFile and add the prev block size also along 
with offset.  When this info is there in HFile we can safely use that and do 
read of prev block in one read.
2. For reading old files where this meta data is NOT available, just pass -1 
and let it read in 2 steps. It is ok..  Any way once the patch is applied and 
over the run the older files will get compacted to new file and this will have 
the additional meta info.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-08 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736093#comment-14736093
 ] 

Ben Lau commented on HBASE-14283:
-

Yep, we talked with Anoop and agree that the patch adds a lot of complexity for 
a fix that doesn't fix the issue 100%.  The portion of the patch that is 
required to fix the bug for bloom filters is especially long.  We thought to 
aim for a longer term fix later, but based on our discussion with Anoop it 
sounds like a backwards compatible, complete fix that adds the necessary 
metadata to HFile should not be too complicated/much work (eg does not involve 
creating new HFileReader implementation or other infrastructure).  We will 
submit a new patch later with a final fix.  We will keep the unit tests from 
the 1st patch since they are still applicable.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724040#comment-14724040
 ] 

Ben Lau commented on HBASE-14283:
-

Correct me if I'm wrong but I think I can't do a generalBloomFilter == null 
check because there are 3 states not 2: (1) Tried to load filter and it exists 
(2) Tried to load filter and it does not exist (3) Have not tried to load 
filter yet.  If we rely on the generalBloomFilter == null check we can't 
distinguish between (2) and (3) which means we would end up trying to reload 
the filter unnecessarily.

{quote}
Can information from FileInfo be included in the exception message to 
facilitate debugging ?
{quote}

What information from FileInfo should be provided?  The point of the message 
(maybe it needs to be revised for clarity) is that we found an unexpected 
BloomFilter in the HFile-- unexpected because the HFile FileInfo metadata 
claims there is no bloom filter (type = NONE).  I will clarify the msg a bit.  

I'll fix the other issues, thanks.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724056#comment-14724056
 ] 

Ted Yu commented on HBASE-14283:


Can a singleton of type BloomFilter be created so that we can use it to denote 
case 2 ?
Meaning generalBloomFilter is initially null. When we find no bloom filter 
exists, assign generalBloomFilter the singleton.

Thanks

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724069#comment-14724069
 ] 

Ted Yu commented on HBASE-14283:


Putting patch on reviewboard would make review easier.

There're some typo's in current patch.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724104#comment-14724104
 ] 

Ben Lau commented on HBASE-14283:
-

The other parts of the code consider BloomFilter to be a nullable type, not 
just here.  I don't know if it makes sense to change that in this patch.  It is 
a bit overkill to use a null object here and seems to increase complexity more 
than it eliminates currently (new class to eliminate a load flag), since unlike 
in some common use cases of having a null object, we can't avoid checking for 
the null object here.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724107#comment-14724107
 ] 

Ben Lau commented on HBASE-14283:
-

Alright I'll put it on reviewboard, thanks.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724112#comment-14724112
 ] 

Ted Yu commented on HBASE-14283:


Alright, you can keep the checking as it is now.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724357#comment-14724357
 ] 

Ben Lau commented on HBASE-14283:
-

Reviewboard link with updated version of patch (v3): 
https://reviews.apache.org/r/37971/

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721255#comment-14721255
 ] 

Ted Yu commented on HBASE-14283:


{code}
+throw new IllegalArgumentException(Data block offset was provided 
that is actually 
++ an index or bloom block offset:  + dataBlockOffset1);
{code}
I understand the meaning of above message. But strictly speaking, offset is 
just a numeric value. Itself wouldn't indicate whether the block is a data 
block or index / bloom block.

Providing number of index blocks found in the exception message would be 
helpful:
{code}
+  if (indexBlocksFound  1) {
+throw new IllegalStateException(Found more than 1 block of this type 
between 2 
++ consecutive data blocks:  + type);
{code}
{code}
+  public BloomFilter getGeneralBloomFilter(BloomType bloomFilterType) throws 
IOException {
+if (generalBloomFilterLoaded) {
+  return generalBloomFilter;
{code}
Can generalBloomFilterLoaded be replaced with checking generalBloomFilter not 
being null ? Similar comment applies to deleteBloomFilterLoaded
{code}
+if (bloomFilterType == BloomType.NONE) {
+  throw new IOException(Valid bloom filter type not found in FileInfo);
{code}
Can information from FileInfo be included in the exception message to 
facilitate debugging ?




 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
 hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720960#comment-14720960
 ] 

Hadoop QA commented on HBASE-14283:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753103/HBASE-14283-v2.patch
  against master branch at commit 0d06d8ddd0aa9aff7476fb6a7acd6af1d24ba3fc.
  ATTACHMENT ID: 12753103

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1849 checkstyle errors (more than the master's current 1846 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15322//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15322//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15322//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15322//console

This message is automatically generated.

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
 hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-28 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720844#comment-14720844
 ] 

Ben Lau commented on HBASE-14283:
-

Here's a V2 of the patch that handles bloom filter blocks.  It requires some 
interface changes that blur the line a bit between the StoreFile reader and the 
HFile reader which is not ideal but there isn't really any other way to fix 
this currently in a performant way for bloom filters.  Let me know what you 
guys think.  I have attached the patch to the ticket for review/feedback.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-27 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717095#comment-14717095
 ] 

Ben Lau commented on HBASE-14283:
-

Thanks ramkrishna.  I will take a crack at fixing the calculation in the 
presence of bloom filters later and see how an interface update looks.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-27 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716986#comment-14716986
 ] 

ramkrishna.s.vasudevan commented on HBASE-14283:


bq. Is it reasonable to change the HFile.Reader interface so that the HFile 
reader (instead of the higher level StoreFile reader)
I think you can try this if you think that will help in the bloom area.  Also 
it is an private interface so it is fine to change. 

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-26 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715147#comment-14715147
 ] 

Ben Lau commented on HBASE-14283:
-

Hi Anoop.  The problem isn't that we read a previous block and see that the 
block is not the expected type.  prevBlockOffset guarantees that we can seek to 
the previous block of the same type as the current one.  See the comments on 
HFileBlock.getPrevBlockOffset().  We are always seeking to the previous data 
block, we are simply not calculating how much to read correctly once we have 
seeked to that previous data block because our prev data block size calculation 
can include other blocks because of the layout of scannable section in 
HFileV2+.  We need a way of knowing apriori what the size of the previous data 
block is.  The method you describe is used in 
HFileReaderImpl.readNextDataBlock().  Note that the reason this method works is 
because this method can use the method 
curBlock.getNextBlockOnDiskSizeWithHeader().  We need something similar to that 
when seeking backwards in order to achieve optimal performance.  Let me know if 
I misunderstood what you meant.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-26 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715179#comment-14715179
 ] 

ramkrishna.s.vasudevan commented on HBASE-14283:


[~benlau]
Let me take a look at this tomorrow morning my time. 

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713600#comment-14713600
 ] 

Anoop Sam John commented on HBASE-14283:


We use below method to get the previous block
public HFileBlock readBlock(long dataBlockOffset, long onDiskBlockSize,
  final boolean cacheBlock, boolean pread, final boolean isCompaction,
  boolean updateCacheMetrics, BlockType expectedBlockType,
  DataBlockEncoding expectedDataBlockEncoding)

So there no BlockType  check and looping?  May be it will read a block and see 
that block is not the expected one and go to next block and check for type. In 
seek before case instead of going fwd we should be going backward in case the 
expected block type is matching with the cur block type. That way of solution 
will work?

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710911#comment-14710911
 ] 

Hadoop QA commented on HBASE-14283:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12752147/HBASE-14283.patch
  against master branch at commit d0873f5a8cc060adbc5a1ae0ed52b84a8942a868.
  ATTACHMENT ID: 12752147

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  HTableDescriptor htd = new 
HTableDescriptor(TableName.valueOf(testReverseScanRandomTable+encoding.name()));
+return String.format(Failed to match on iteration %s between %s and %s, 
iteration, Bytes.toStringBinary(a), Bytes.toStringBinary(b));
++ Bytes.toStringBinary(keys[i]) + ), true, scanner.seekBefore(new 
KeyValue.KeyOnlyKeyValue(keys[i])));
++ Bytes.toStringBinary(keys[i]) + ), false, scanner.seekBefore(new 
KeyValue.KeyOnlyKeyValue(keys[i])));

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15241//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15241//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15241//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15241//console

This message is automatically generated.

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index 

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-25 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711760#comment-14711760
 ] 

Ben Lau commented on HBASE-14283:
-

Thanks [~ramkrishna.s.vasude...@gmail.com] for showing me the conventions for 
the patch process.  The tests look like they passed, with 1 minor style 
comment.  I can fix that but want to address the bloom filter blocks issue 
first.  Who would be best for me to ask about it, how about you?  Is it 
reasonable to change the HFile.Reader interface so that the HFile reader 
(instead of the higher level StoreFile reader) is in charge of deserializing 
and holding the bloom filter data structures?  Can't see why not but maybe I'm 
missing something.

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-24 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710583#comment-14710583
 ] 

Ben Lau commented on HBASE-14283:
-

Anyone have any comments on this bug and the fix?  Perhaps [~zjushch] (from 
HBASE-4811 which implemented reverse scan) or [~mikhail] (from HBASE-3857 which 
implemented HFileV2) or [~liyintang] (from HBASE-4532 which added the bloom 
filter method(s) I have a question about)?  Or maybe one of them can tell me 
the person(s) who would be best suited to examine the bug and proposed fixes?

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707096#comment-14707096
 ] 

Ben Lau commented on HBASE-14283:
-

In the patch I also added an extra unit test to TestFromClientside, that also 
tests reverse scan, but is unrelated to this bug, just as an additional test to 
strengthen the test suite.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707088#comment-14707088
 ] 

Ben Lau commented on HBASE-14283:
-

We suspect the person in HBASE-13830 also ran into this bug, based on his 
similar exception, but can’t know for sure without more information from him.

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)