[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14283:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 0.98.16
   1.1.3
   1.0.3
   1.3.0
   1.2.0
   2.0.0
   Status: Resolved  (was: Patch Available)

Pushed to 0.98 and up.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or 
> leaf-level index blocks.  This is because the seekBefore() call calculates 
> the previous data block’s size by assuming that data blocks are contiguous, 
> which is not the case in HFile V2 and beyond.
> Attached is a first-cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version, but is only performant for 1- and 2-level indexes, not for 3+ levels.  
> Indexes with 3+ levels require an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug, but the fix should be 
> similar to the one for inline index blocks.  I haven’t made that change yet 
> because I want to confirm that you guys would be fine with me revising the 
> HFile.Reader interface.
> Specifically, these two functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices); it only constructs the IO streams, so it has no way to 
> know where the bloom blocks are in the HFile.  The HFile.Reader bloom method 
> comments state that they “know nothing about how that metadata is 
> structured”, but I do not know whether that is a requirement of the 
> abstraction (why?) or just an incidental current property.
> We would like to do three things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can be not only correct but also performant in all cases.
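A small, self-contained Java model may make the seekBefore() failure concrete. It assumes, as the HFile v2 block header does, that each block records the offset of the previous block of the same type (modeled here as `prevBlockOffset`), and it adds the proposed `prevBlockSize` field. The `Block` record, the `layout()` helper, and the byte sizes are hypothetical illustrations, not HBase's `HFileBlock` API. With a bloom chunk and a leaf index block written between two data blocks, subtracting offsets overstates the previous data block's size, while a recorded size does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeekBeforeSketch {
    enum BlockType { DATA, LEAF_INDEX, BLOOM_CHUNK }

    // Hypothetical stand-in for a block header: where the block starts, its on-disk
    // size, and the offset/size of the previous block of the SAME type (-1 if none).
    // prevBlockSize models the field proposed in this issue, not an existing field.
    record Block(BlockType type, long offset, int onDiskSize,
                 long prevBlockOffset, int prevBlockSize) {}

    // Writes blocks back to back, remembering the last block of each type so that
    // every header can point at its same-type predecessor.
    static List<Block> layout(BlockType[] types, int[] sizes) {
        long[] lastOffset = new long[BlockType.values().length];
        int[] lastSize = new int[BlockType.values().length];
        Arrays.fill(lastOffset, -1L);
        Arrays.fill(lastSize, -1);
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < types.length; i++) {
            int t = types[i].ordinal();
            blocks.add(new Block(types[i], offset, sizes[i], lastOffset[t], lastSize[t]));
            lastOffset[t] = offset;
            lastSize[t] = sizes[i];
            offset += sizes[i];
        }
        return blocks;
    }

    // Contiguity assumption: the previous data block is taken to end exactly where
    // the current block starts, so any inline blocks in between are counted as data.
    static long naivePrevSize(Block current) {
        return current.offset() - current.prevBlockOffset();
    }

    // With a recorded previous-block size, no assumption about the layout is needed.
    static long recordedPrevSize(Block current) {
        return current.prevBlockSize();
    }

    public static void main(String[] args) {
        BlockType[] types = {BlockType.DATA, BlockType.BLOOM_CHUNK,
                             BlockType.LEAF_INDEX, BlockType.DATA};
        int[] sizes = {100, 30, 20, 100};
        Block second = layout(types, sizes).get(3);
        System.out.println("naive=" + naivePrevSize(second)
                + " recorded=" + recordedPrevSize(second));
    }
}
```

Running `main` prints `naive=150 recorded=100`: the offset arithmetic would read 50 bytes of bloom and index data as if they belonged to the previous data block. When data blocks really are contiguous, both methods agree.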
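A fix that needs no format change has to obtain the previous data block's size from somewhere other than offset arithmetic. One plausible source, sketched here under a simplifying assumption, is the data-block index: if each index entry records a block's offset and on-disk size, seekBefore() can locate the current block's entry and read the size stored in the entry before it. The flat arrays and `previousDataBlockSize()` are hypothetical, and this is not necessarily how the attached patch works. In a real multi-level index the preceding entry may sit in a different leaf index block and cost an extra read, which this sketch ignores.

```java
import java.util.Arrays;

public class IndexPrevSizeSketch {
    // Returns the on-disk size of the data block preceding the one that starts at
    // currentOffset, using sizes recorded in a hypothetical flattened data-block
    // index sorted by offset. Returns -1 if currentOffset is not a known data block
    // start, or if it is the first data block.
    static int previousDataBlockSize(long[] offsets, int[] onDiskSizes, long currentOffset) {
        int i = Arrays.binarySearch(offsets, currentOffset);
        if (i <= 0) {
            return -1; // negative: not found; zero: no previous data block
        }
        return onDiskSizes[i - 1];
    }

    public static void main(String[] args) {
        // Data blocks of 100 bytes at 0, 150 and 300; the 50-byte gaps after the
        // first two stand in for inline bloom/index blocks.
        long[] offsets = {0L, 150L, 300L};
        int[] sizes = {100, 100, 100};
        System.out.println(previousDataBlockSize(offsets, sizes, 150L)); // prints 100
        System.out.println(150L - offsets[0]);                           // prints 150
    }
}
```

The first line is the size taken from the index; the second is what the contiguity assumption would compute for the same block.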



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Enis Soztutar (JIRA)


Enis Soztutar updated HBASE-14283:
--
Attachment: hbase-14283_add.patch

No worries. Just pushed this small addendum to branch-1.1.  

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch


[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-14 Thread Ben Lau (JIRA)


Ben Lau updated HBASE-14283:

Attachment: HBASE-14283-reupload-master.patch

Attached a new patch for master, identical to the previous patch but with 
'reupload' in the name.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch


[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Ben Lau (JIRA)


Ben Lau updated HBASE-14283:

Attachment: HBASE-14283-master.patch
HBASE-14283-branch-1.patch
HBASE-14283-branch-1.2.patch
HBASE-14283-branch-1.1.patch
HBASE-14283-branch-1.0.patch
HBASE-14283-0.98.patch

Short-term patches for this bug, per the discussion.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch


[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-28 Thread Ben Lau (JIRA)


Ben Lau updated HBASE-14283:

Attachment: HBASE-14283-v2.patch

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch




[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-24 Thread ramkrishna.s.vasudevan (JIRA)


ramkrishna.s.vasudevan updated HBASE-14283:
---
Attachment: HBASE-14283.patch

Attaching the patch for QA.  
[~benlau]
I suggest renaming the patch after the JIRA ID.  I will take a look at the 
patch ASAP.  Thanks for the patch.  Let's see what QA says.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283.patch, hfile-seek-before.patch




[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-24 Thread ramkrishna.s.vasudevan (JIRA)


ramkrishna.s.vasudevan updated HBASE-14283:
---
Status: Patch Available  (was: Open)

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: HBASE-14283.patch, hfile-seek-before.patch




[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)


Ben Lau updated HBASE-14283:

Attachment: hfile-seek-before.patch

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
> Issue Type: Bug
> Reporter: Ben Lau
> Assignee: Ben Lau
> Attachments: hfile-seek-before.patch

