from:"Ben Lau \(JIRA\)"

[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-30 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609514#comment-14609514
 ] 

Ben Lau commented on HBASE-13991:
-

Hi guys, thanks for all the feedback.  I think we agree that it may make sense 
to switchover entirely to the new layout instead of making it optional.  Let me 
get back to you guys, I need to talk with Francis some more about the other 
suggestions.  Thanks guys.

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  
 Latest version of the patch is available on the review board: 
 https://reviews.apache.org/r/36029/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-06 Thread Ben Lau (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615813#comment-14615813
]

Ben Lau commented on HBASE-13991:
-

Hi guys, hope you had a happy 4th of July.

We would like to do something akin to Lars’ last idea. That is, we will have
code to support both the old layout and the new layout, but it will be on a per
HBase cluster basis. You will be able to migrate a cluster entirely to the
hierarchical layout or leave it on the old layout.

This approach has the following pros:
- If HBase users do not need/want the new layout, they will not have to do an
offline upgrade in order to use new HBase code. The alternative is to make an
online upgrade for the hierarchical layout, but this would require some very
messy changes to the codebase and also be tricky to test fully.
- HBase code will not have to ‘detect’ whether tables/paths/regions are
hierarchical or not. The master or region server can simply look at the root
table at startup and use that to determine if the cluster has migrated to the
hierarchical layout. This single source of truth would make code less ugly
since you don’t need to do in-context per-region/path checks in different parts
of the codebase.

What do you guys think about this approach?

Hierarchical Layout for Humongous Tables

Key: HBASE-13991
URL: https://issues.apache.org/jira/browse/HBASE-13991
Project: HBase
Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606793#comment-14606793
 ] 

Ben Lau commented on HBASE-13991:
-

Sure.  Created a reviewboard request here: https://reviews.apache.org/r/36029/

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-13991:

Attachment: HumongousTableDoc.pdf

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-13991:

Attachment: HBASE-13991-master.patch

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-13991:

Description: 
Add support for humongous tables via a hierarchical layout for regions on 
filesystem.  

Credit for most of this code goes to Huaiyu Zhu.  

  was:
Add support for humongous tables via a hierarchical layout for regions on 
filesystem.  




 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)

Ben Lau created HBASE-13991:
---

 Summary: Hierarchical Layout for Humongous Tables
 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau


Add support for humongous tables via a hierarchical layout for regions on 
filesystem.  





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)

Ben Lau created HBASE-14283:
---

 Summary: Reverse scan doesn’t work with HFile inline index/bloom 
blocks
 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch

Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
level index blocks.  The reason is because the seekBefore() call calculates the 
previous data block’s size by assuming data blocks are contiguous which is not 
the case in HFile V2 and beyond.

Attached is a first cut patch (targeting 
bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
(1) a unit test which exposes the bug and demonstrates failures for both inline 
bloom blocks and inline index blocks
(2) a proposed fix for inline index blocks that does not require a new HFile 
version change, but is only performant for 1 and 2-level indexes and not 3+.  
3+ requires an HFile format update for optimal performance.

This patch does not fix the bloom filter blocks bug.  But the fix should be 
similar to the case of inline index blocks.  The reason I haven’t made the 
change yet is I want to confirm that you guys would be fine with me revising 
the HFile.Reader interface.

Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
HFileReader class doesn’t have a reference to the bloom filters (and hence 
their indices) and only constructs the IO streams and hence has no way to know 
where the bloom blocks are in the HFile.  It seems that the HFile.Reader bloom 
method comments state that they “know nothing about how that metadata is 
structured” but I do not know if that is a requirement of the abstraction 
(why?) or just an incidental current property. 

We would like to do 3 things with community approval:
(1) Update the HFile.Reader interface and implementation to contain and return 
BloomFilters directly rather than unstructured IO streams
(2) Merge the fixes for index blocks and bloom blocks into open source
(3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
field in the block header in the next HFile version, so that seekBefore() calls 
can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707096#comment-14707096
 ] 

Ben Lau commented on HBASE-14283:
-

In the patch I also added an extra unit test to TestFromClientside, that also 
tests reverse scan, but is unrelated to this bug, just as an additional test to 
strengthen the test suite.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707088#comment-14707088
 ] 

Ben Lau commented on HBASE-14283:
-

We suspect the person in HBASE-13830 also ran into this bug, based on his 
similar exception, but can’t know for sure without more information from him.

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-21 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14283:

Attachment: hfile-seek-before.patch

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13830) Hbase REVERSED may throw Exception sometimes

2015-08-21 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707111#comment-14707111
 ] 

Ben Lau commented on HBASE-13830:
-

Possibly the same bug as in HBASE-14283, or related.

 Hbase REVERSED may throw Exception sometimes
 

 Key: HBASE-13830
 URL: https://issues.apache.org/jira/browse/HBASE-13830
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.1
Reporter: ryan.jin

 run a scan at hbase shell command.
 {code}
 scan 
 'analytics_access',{ENDROW='9223370603647713262-flume01.hadoop-10.32.117.111-373563509',LIMIT=10,REVERSED=true}
 {code}
 will throw exception
 {code}
 java.io.IOException: java.io.IOException: Could not seekToPreviousRow 
 StoreFileScanner[HFileScanner for reader 
 reader=hdfs://nameservice1/hbase/data/default/analytics_access/a54c47c568c00dd07f9d92cfab1accc7/cf/2e3a107e9fec4930859e992b61fb22f6,
  compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], 
 firstKey=9223370603542781142-flume01.hadoop-10.32.117.111-378180911/cf:key/1433311994702/Put,
  
 lastKey=9223370603715515112-flume01.hadoop-10.32.117.111-370923552/cf:timestamp/1433139261951/Put,
  avgKeyLen=80, avgValueLen=115, entries=43544340, length=1409247455, 
 cur=9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0]
  to key 
 9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:448)
   at 
 org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.seekToPreviousRow(ReversedKeyValueHeap.java:88)
   at 
 org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToPreviousRow(ReversedStoreScanner.java:128)
   at 
 org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToNextRow(ReversedStoreScanner.java:88)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:503)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3866)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3946)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3814)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3805)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3136)
   at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: On-disk size without header provided is 
 47701, but block header contains 10134. Block offset: -1, data starts with: 
 DATABLK*\x00\x00'\x96\x00\x01\x00\x04\x00\x00\x00\x005\x96^\xD2\x01\x00\x00@\x00\x00\x00'
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:451)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock.access$400(HFileBlock.java:87)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1466)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:569)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:413)
   ... 17 more
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method) ~[na:1.6.0_65]
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  ~[na:1.6.0_65]
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  ~[na:1.6.0_65]
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 
 ~[na:1.6.0_65]
   at

[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606521#comment-14606521
 ] 

Ben Lau commented on HBASE-13991:
-

The HBase master doesn't write much to disk compared to region servers/data 
nodes.  Yes, this change is mostly to the filesystem layout.  We may use the 
'humongous' flag for other things in the future but currently it is only used 
to determine the layout.

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-13991:

Description: 
Add support for humongous tables via a hierarchical layout for regions on 
filesystem.  

Credit for most of this code goes to Huaiyu Zhu.  

Latest version of the patch is available on the review board: 
https://reviews.apache.org/r/36029/

  was:
Add support for humongous tables via a hierarchical layout for regions on 
filesystem.  

Credit for most of this code goes to Huaiyu Zhu.  


 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  
 Latest version of the patch is available on the review board: 
 https://reviews.apache.org/r/36029/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606940#comment-14606940
 ] 

Ben Lau commented on HBASE-13991:
-

I don't think we had one planned but if there were enough parties interested in 
a tool we could probably make one.  It would probably not be too hard (unless 
I'm missing something) for a table that can be taken offline for a short period 
of time.  

On a side note there were some new conflicts in master since I created the 
patch so I have fixed them and re-uploaded a new patch in the reviewboard.  
I'll treat the reviewboard as the holder of the current version of the patch 
and leave the attachment in this ticket as the 1st draft submission.

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-28 Thread Ben Lau (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ben Lau updated HBASE-13991:

Description:
Add support for humongous tables via a hierarchical layout for regions on
filesystem.

Credit for most of this code goes to Huaiyu Zhu.

Latest version of the patch is available on the review board:
https://reviews.apache.org/r/36029/

Known limitation of this patch: It does not deal with HFileLinks, which means
that humongous tables, as implemented in this patch, would not support
snapshots or timeline consistent region replicas feature in their current form.
This could be addressed later but we had decided not to implement HFileLink
support for the first version.

was:
Add support for humongous tables via a hierarchical layout for regions on
filesystem.

Credit for most of this code goes to Huaiyu Zhu.

Latest version of the patch is available on the review board:
https://reviews.apache.org/r/36029/

Hierarchical Layout for Humongous Tables

Add support for humongous tables via a hierarchical layout for regions on
filesystem.
Credit for most of this code goes to Huaiyu Zhu.
Latest version of the patch is available on the review board:
https://reviews.apache.org/r/36029/
Known limitation of this patch: It does not deal with HFileLinks, which means
that humongous tables, as implemented in this patch, would not support
snapshots or timeline consistent region replicas feature in their current
form. This could be addressed later but we had decided not to implement
HFileLink support for the first version.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13830) Hbase REVERSED may throw Exception sometimes

2015-08-03 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652873#comment-14652873
 ] 

Ben Lau commented on HBASE-13830:
-

Hey Ryan, do you have more information on this bug.  We are interested in using 
the reverse scan feature at Yahoo and would like to clear up any known bugs 
before internal users take it up for production use.  If you had for example an 
independent program and/or data that could be used to reproduce this issue, we 
would like to see it.  If you cannot reproduce the bug anymore, we'd like to 
know anything else you remember, like the version of HDFS, any custom patches 
you had on your version of HBase, the table schema at the time (eg any 
particular block encodings), etc.

 Hbase REVERSED may throw Exception sometimes
 

 Key: HBASE-13830
 URL: https://issues.apache.org/jira/browse/HBASE-13830
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.1
Reporter: ryan.jin

 run a scan at hbase shell command.
 {code}
 scan 
 'analytics_access',{ENDROW='9223370603647713262-flume01.hadoop-10.32.117.111-373563509',LIMIT=10,REVERSED=true}
 {code}
 will throw exception
 {code}
 java.io.IOException: java.io.IOException: Could not seekToPreviousRow 
 StoreFileScanner[HFileScanner for reader 
 reader=hdfs://nameservice1/hbase/data/default/analytics_access/a54c47c568c00dd07f9d92cfab1accc7/cf/2e3a107e9fec4930859e992b61fb22f6,
  compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], 
 firstKey=9223370603542781142-flume01.hadoop-10.32.117.111-378180911/cf:key/1433311994702/Put,
  
 lastKey=9223370603715515112-flume01.hadoop-10.32.117.111-370923552/cf:timestamp/1433139261951/Put,
  avgKeyLen=80, avgValueLen=115, entries=43544340, length=1409247455, 
 cur=9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0]
  to key 
 9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:448)
   at 
 org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.seekToPreviousRow(ReversedKeyValueHeap.java:88)
   at 
 org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToPreviousRow(ReversedStoreScanner.java:128)
   at 
 org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToNextRow(ReversedStoreScanner.java:88)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:503)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3866)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3946)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3814)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3805)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3136)
   at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: On-disk size without header provided is 
 47701, but block header contains 10134. Block offset: -1, data starts with: 
 DATABLK*\x00\x00'\x96\x00\x01\x00\x04\x00\x00\x00\x005\x96^\xD2\x01\x00\x00@\x00\x00\x00'
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:451)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock.access$400(HFileBlock.java:87)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1466)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:569)
   at

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-27 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976020#comment-14976020
 ] 

Ben Lau commented on HBASE-14283:
-

Reran the failures that looked relevant on the various branches, seems the 
tests are just unstable.  When I get the time I'll try to restart the 
discussion about updating the HFile serialization for more efficient reverse 
scans in HBASE-14576.  

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975101#comment-14975101
 ] 

Ben Lau commented on HBASE-14283:
-

Thanks Andrew.  Can we merge these patches or is there still something that 
needs to be done or reviewed?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975489#comment-14975489
 ] 

Ben Lau commented on HBASE-14283:
-

That's odd, something went wrong with the patch for 1.1 branch.  That compiled 
fine for me but perhaps I overlooked something, or the patch became stale.  I 
will take a look later tonight.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975584#comment-14975584
 ] 

Ben Lau commented on HBASE-14283:
-

Ok thanks just wanted to confirm the other branches were fine as far as we 
know, eg we didnt swap 1.1 patch with 0.98 patch and need to fix/update 0.98 
branch.  Thanks.  It looks like there are some test failures but I think they 
are not related to the patch.  I will rerun the tests that fail tonight locally 
after they appear in this ticket.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975570#comment-14975570
 ] 

Ben Lau commented on HBASE-14283:
-

[~andrew.purt...@gmail.com] Was the wrong patch applied to 1.1 or am I 
misunderstanding something?  So I looked at 
https://github.com/apache/hbase/commits/branch-1.1 specifically 
https://github.com/apache/hbase/commit/0db04a1705e5e8cc04cc9c010ddfc5612f60cfec 
and it is missing the CellUtil method that I had in my 
HBASE-14283-branch-1.1.patch attached.  It's fine if this is fixed as an 
addendum but there's nothing else amiss right (i.e. the other branches have the 
right patches applied?)

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-26 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974587#comment-14974587
 ] 

Ben Lau commented on HBASE-14283:
-

Still there [~andrew.purt...@gmail.com]?  Anyone else have comments/questions?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956228#comment-14956228
 ] 

Ben Lau commented on HBASE-14283:
-

So anything I should change in the patches?  How many +1's are needed?  Does 
someone else need to +1?  

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-14 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956541#comment-14956541
 ] 

Ben Lau commented on HBASE-14283:
-

[~lhofhansl] and [~anoop.hbase], I probably should’ve called it out explicitly 
but yes technically, it would affect forward scans (among other things), but I 
don’t think in any noticeable way.  The patched code is called by 
HalfStoreFileReader.getLastKey().  

This getLastKey() method is needed to figure out the split point of a region 
which is eg used in splitting a region.  Extra IO op there but given that 
region splits don’t happen frequently nor at a very high rate when they do I 
think it is fine.  The getLastKey() is also used by StoreFile.Reader for 
passesKeyRangeFilter() check to determine whether a store is applicable to a 
scan, both forward/reverse.  It affects the initial scan creation as well as 
later next() RPC calls I think.  Although this is too bad, it doesn’t really 
matter I think in practice, because the block of the last key will get cached 
in the BlockCache the first time we need to know the last key for that 
halfstore.  So repeated calls later in the region server will not incur any 
overhead.  Let me know if this addresses your concerns or not.

Incidentally I think that caching is one of the primary reasons why there seems 
to be a decent # of people using reverse scan but almost no one reporting this 
bug— because caching often hides it, either completely or nondeterministically 
(requests succeeding on later retries).  We had some problems initially 
reproducing this bug reliably because if we ran forward scans in a table 
concurrently with the reverse scans, it would cause the blocks to become cached 
and so certain key ranges that previously would’ve caused reverse scan to fail 
suddenly started working just fine.

Re: Attaching the same patch, let me know if I'm doing this wrong but it sounds 
like all I should have to do is just upload the same patch file for master but 
with a different name and the QA tests will pick it up.  I will do that.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-14 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14283:

Attachment: HBASE-14283-reupload-master.patch

Attached a new patch for master, same as the previous patch but with 'reupload' 
in the name..

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948049#comment-14948049
 ] 

Ben Lau commented on HBASE-14283:
-

Hmmm, can't seem to edit Jira comments.  Anyways, I attached short term patches 
to the ticket per discussion for all versions of HBase from 0.98 to master.  
The patches are mostly the same other than 0.98.  There were a couple of 
utility methods that were missing/private in earlier versions of HBase 1.X that 
I backported or changed to public to mirror the master version of the patch.  
(Let me know if that isn't kosher for some reason.)  I probably won't have time 
to update the patches this week but I'll look at feedback and make appropriate 
changes when I get the chance next week.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947873#comment-14947873
 ] 

Ben Lau commented on HBASE-14283:
-

Alright I'll work on that.  I created HBASE-14576 for the longer term fix, so 
this ticket will just be to implement the short term fix we described.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HBASE-14576) New HFile version for optimized reverse scans

2015-10-07 Thread Ben Lau (JIRA)

Ben Lau created HBASE-14576:
---

 Summary: New HFile version for optimized reverse scans
 Key: HBASE-14576
 URL: https://issues.apache.org/jira/browse/HBASE-14576
 Project: HBase
  Issue Type: Improvement
Reporter: Ben Lau
Assignee: Ben Lau


A new ticket to finish the work from HBASE-14283, which will fix the 
HFileReader seekBefore() previous block size calculation bug but make the 
resulting reverse scan take more I/O than a forward scan.  

Fixing the bug in the long term requires an HFile version bump, either major or 
minor.  We will put the previous block's size in the HFileBlock header instead 
of trying to calculate it directly using block offset arithmetic.  

Per [~anoop.hbase]'s suggestion, I created this ticket so that we can separate 
the issue of fixing the bug (the responsibility of HBASE-14283) and the issue 
of getting reverse scans to run quickly (the responsibility of this ticket).  
It is also unlikely that this ticket will be backported to old versions of 
HBase eg 0.98 whereas HBASE-14283 can be.  




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-07 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14283:

Attachment: HBASE-14283-master.patch
HBASE-14283-branch-1.patch
HBASE-14283-branch-1.2.patch
HBASE-14283-branch-1.1.patch
HBASE-14283-branch-1.0.patch
HBASE-14283-0.98.patch

Short term patches for this bug per discussion.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-15 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959998#comment-14959998
 ] 

Ben Lau commented on HBASE-14283:
-

[~andrew.purt...@gmail.com] Let me know if I'm missing something, but I think 
there is more than 1 scan in play here.  I think you're talking about an 
external hbase client scan.  I'm talking about an internal hbase scan opened up 
by the regionserver and which we know for a fact caches the block.  See the 
implementation of HalfStoreFileReader.getLastKey(), it is creating an internal 
scanner that does cache.  Furthermore, the results of the method aren't cached 
in the Reader class (eg as a variable) and since the method is called 
repeatedly in the codebase it seems likely that the author's expectation was 
that the block cache would work correctly and make an internal cache for the 
file reader redundant.  So not only does this scenario happen only for newly 
split regions but it only happens for the first time.  I can add more comments 
to the patches if it is really necessary but there is already a comment in the 
code indicating that this fix is not performant and is meant to be updated by a 
later ticket whose jira # is listed.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-20 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965794#comment-14965794
 ] 

Ben Lau commented on HBASE-14283:
-

[~andrew.purt...@gmail.com] Is the above explanation agreeable?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, 
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-06 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946197#comment-14946197
 ] 

Ben Lau commented on HBASE-14283:
-

Hey guys, sorry, I should be able to get back to this soon.  Finishing up an 
unrelated project right now.  I didn't know that minor versions in HFiles were 
also non-backwards compatible.  That's one less reason then to make this a 
major version bump.  If anyone has a strong preference for this fix to go into 
a V3.X I can change the patch to use minor version (eg for header size 
calculation) when I have time to do it.  If not I'll leave it as V4 since it's 
a little simpler in the code as a major version bump.  My original intention 
btw if it wasn't clear was that this wouldn't be the only change in a V4, just 
the first change that would go into a V4, whose format/contents is not yet 
meant to be final even when this patch is committed, i.e. V4 would be 
essentially a WIP with more changes suggested and implemented in other tickets 
and eventually released in HBase 2.0.

[~anoop.hbase] I'm down for committing a short-term read-the-header-always fix 
for now and then discussing the longer term solution second.  Which branches do 
you want the patch for?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-09 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621502#comment-14621502
 ] 

Ben Lau commented on HBASE-13991:
-

Hi stack, we did consider doing an online migration from the old format to the 
new one, but it would require messy changes to the codebase and be tricky to 
test fully. That's because whenever you access a region's contents you would 
have to test for both the humongous and non-humongous path contents instead of 
just using what you know it to be. Also there's a lot more going on during an 
online migration, regions can be moving, splitting, recovering from normal 
cluster operation and testing that an online migration works in all cases would 
be tricky. This can be ameliorated to some extent by making the migration 
'mostly online', i.e. offlining regions, migrating them, then re-opening them.  
For Yahoo’s use case, an online migration is not necessary but if the community 
really needs it we could look into it.  

[~toffer] can comment more, but I believe we would prefer to insert the buckets 
under the table directory for now and gradually transition later to reworking 
meta to be the source of table/region association information, creating a 
uniform/non-table oriented data directory, etc.

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  
 Latest version of the patch is available on the review board: 
 https://reviews.apache.org/r/36029/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-26 Thread Ben Lau (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715147#comment-14715147
]

Ben Lau commented on HBASE-14283:
-

Hi Anoop. The problem isn't that we read a previous block and see that the
block is not the expected type. prevBlockOffset guarantees that we can seek to
the previous block of the same type as the current one. See the comments on
HFileBlock.getPrevBlockOffset(). We are always seeking to the previous data
block, we are simply not calculating how much to read correctly once we have
seeked to that previous data block because our prev data block size calculation
can include other blocks because of the layout of scannable section in
HFileV2+. We need a way of knowing apriori what the size of the previous data
block is. The method you describe is used in
HFileReaderImpl.readNextDataBlock(). Note that the reason this method works is
because this method can use the method
curBlock.getNextBlockOnDiskSizeWithHeader(). We need something similar to that
when seeking backwards in order to achieve optimal performance. Let me know if
I misunderstood what you meant.

Reverse scan doesn’t work with HFile inline index/bloom blocks
--

Key: HBASE-14283
URL: https://issues.apache.org/jira/browse/HBASE-14283
Project: HBase
Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
Attachments: HBASE-14283.patch, hfile-seek-before.patch

Reverse scans do not work if an HFile contains inline bloom blocks or leaf
level index blocks. The reason is because the seekBefore() call calculates
the previous data block’s size by assuming data blocks are contiguous which
is not the case in HFile V2 and beyond.
Attached is a first cut patch (targeting
bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
(1) a unit test which exposes the bug and demonstrates failures for both
inline bloom blocks and inline index blocks
(2) a proposed fix for inline index blocks that does not require a new HFile
version change, but is only performant for 1 and 2-level indexes and not 3+.
3+ requires an HFile format update for optimal performance.
This patch does not fix the bloom filter blocks bug. But the fix should be
similar to the case of inline index blocks. The reason I haven’t made the
change yet is I want to confirm that you guys would be fine with me revising
the HFile.Reader interface.
Specifically, these 2 functions (getGeneralBloomFilterMetadata and
getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the
HFileReader class doesn’t have a reference to the bloom filters (and hence
their indices) and only constructs the IO streams and hence has no way to
know where the bloom blocks are in the HFile. It seems that the HFile.Reader
bloom method comments state that they “know nothing about how that metadata
is structured” but I do not know if that is a requirement of the abstraction
(why?) or just an incidental current property.
We would like to do 3 things with community approval:
(1) Update the HFile.Reader interface and implementation to contain and
return BloomFilters directly rather than unstructured IO streams
(2) Merge the fixes for index blocks and bloom blocks into open source
(3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’
field in the block header in the next HFile version, so that seekBefore()
calls can not only be correct but performant in all cases.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-27 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717095#comment-14717095
 ] 

Ben Lau commented on HBASE-14283:
-

Thanks ramkrishna.  I will take a crack at fixing the calculation in the 
presence of bloom filters later and see how an interface update looks.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-24 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710583#comment-14710583
 ] 

Ben Lau commented on HBASE-14283:
-

Anyone have any comments on this bug and the fix?  Perhaps [~zjushch] (from 
HBASE-4811 which implemented reverse scan) or [~mikhail] (from HBASE-3857 which 
implemented HFileV2) or [~liyintang] (from HBASE-4532 which added the bloom 
filter method(s) I have a question about)?  Or maybe one of them can tell me 
the person(s) who would be best suited to examine the bug and proposed fixes?

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-25 Thread Ben Lau (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711760#comment-14711760
]

Ben Lau commented on HBASE-14283:
-

Thanks [~ramkrishna.s.vasude...@gmail.com] for showing me the conventions for
the patch process. The tests look like they passed, with 1 minor style
comment. I can fix that but want to address the bloom filter blocks issue
first. Who would be best for me to ask about it, how about you? Is it
reasonable to change the HFile.Reader interface so that the HFile reader
(instead of the higher level StoreFile reader) is in charge of deserializing
and holding the bloom filter data structures? Can't see why not but maybe I'm
missing something.

Reverse scan doesn’t work with HFile inline index/bloom blocks
--

Key: HBASE-14283
URL: https://issues.apache.org/jira/browse/HBASE-14283
Project: HBase
Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
Attachments: HBASE-14283.patch, hfile-seek-before.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724040#comment-14724040
 ] 

Ben Lau commented on HBASE-14283:
-

Correct me if I'm wrong but I think I can't do a generalBloomFilter == null 
check because there are 3 states not 2: (1) Tried to load filter and it exists 
(2) Tried to load filter and it does not exist (3) Have not tried to load 
filter yet.  If we rely on the generalBloomFilter == null check we can't 
distinguish between (2) and (3) which means we would end up trying to reload 
the filter unnecessarily.

{quote}
Can information from FileInfo be included in the exception message to 
facilitate debugging ?
{quote}

What information from FileInfo should be provided?  The point of the message 
(maybe it needs to be revised for clarity) is that we found an unexpected 
BloomFilter in the HFile-- unexpected because the HFile FileInfo metadata 
claims there is no bloom filter (type = NONE).  I will clarify the msg a bit.  

I'll fix the other issues, thanks.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724104#comment-14724104
 ] 

Ben Lau commented on HBASE-14283:
-

The other parts of the code consider BloomFilter to be a nullable type, not 
just here.  I don't know if it makes sense to change that in this patch.  It is 
a bit overkill to use a null object here and seems to increase complexity more 
than it eliminates currently (new class to eliminate a load flag), since unlike 
in some common use cases of having a null object, we can't avoid checking for 
the null object here.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724107#comment-14724107
 ] 

Ben Lau commented on HBASE-14283:
-

Alright I'll put it on reviewboard, thanks.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-31 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724357#comment-14724357
 ] 

Ben Lau commented on HBASE-14283:
-

Reviewboard link with updated version of patch (v3): 
https://reviews.apache.org/r/37971/

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-08 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736093#comment-14736093
 ] 

Ben Lau commented on HBASE-14283:
-

Yep, we talked with Anoop and agree that the patch adds a lot of complexity for 
a fix that doesn't fix the issue 100%.  The portion of the patch that is 
required to fix the bug for bloom filters is especially long.  We thought to 
aim for a longer term fix later, but based on our discussion with Anoop it 
sounds like a backwards compatible, complete fix that adds the necessary 
metadata to HFile should not be too complicated/much work (eg does not involve 
creating new HFileReader implementation or other infrastructure).  We will 
submit a new patch later with a final fix.  We will keep the unit tests from 
the 1st patch since they are still applicable.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-15 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746484#comment-14746484
 ] 

Ben Lau commented on HBASE-14283:
-

After talking to some committers in HBase, it seems that unless there is a very 
strong case / no viable alternative, all new patches to HBase should not 
require a full cluster restart.  Hence, we will be going with the 
2-rolling-restart approach as described above.  It requires the cluster 
operator to do 2 rolling restarts and set a new config but that should not be 
too burdensome for a major upgrade.  This rolling-restart-compatible approach 
is a bit more messy/complicated code-wise so let us look a bit into the best 
way to do this.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14439) New/Improved Filesystem Abstractions

2015-09-15 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14439:

Description: 
Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged into open source.

We will be working on a different patch now with folks from open source.  It 
will create/add to 2 layers-- a path abstraction layer and a use-oriented 
abstraction layer.  The path abstraction layer is epitomized by classes like 
FsUtils (and in the patch new classes like AFsLayout).  The use oriented 
abstraction layer is epitomized by existing classes like 
MasterFileSystem/HRegionFileSystem (and possibly new classes later) that build 
on the path abstraction layer and focus on 'doing things' (eg creating regions) 
and less on the gritty details like the paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity (unlike the 
humongous patch) and storing hierarchy in the meta table instead which enables 
new optimizations (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)  It also includes some Yahoo-specific 'humongous' layout 
code that will be removed before submission in open source.

  was:
Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged into open source.

We will be working with Cloudera on a different patch now.  It will create/add 
to 2 layers-- a path abstraction layer and a use-oriented abstraction layer.  
The path abstraction layer is epitomized by classes like FsUtils (and in the 
patch new classes like AFsLayout).  The use oriented abstraction layer is 
epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and 
possibly new classes later) that build on the path abstraction layer and focus 
on 'doing things' (eg creating regions) and less on the gritty details like the 
paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity (unlike the 
humongous patch) and storing hierarchy in the meta table instead which enables 
new optimizations (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)  It also includes some Yahoo-specific 'humongous' layout 
code that will be removed before submission in open source.


> New/Improved Filesystem Abstractions
> 
>
> Key: HBASE-14439
> URL: https://issues.apache.org/jira/browse/HBASE-14439
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ben Lau
>Assignee: Matteo Bertozzi
> Attachments: abstraction.patch
>
>
> Ticket for work in progress on new FileSystem abstractions.  Previously, we 
> (Yahoo) submitted a ticket that would add support for humongous (1 million 
> region+) tables via a hierarchical layout (HBASE-13991).  However open source 
> is moving in a similar but not identical direction in the future and so the 
> patch will not be merged into open source.
> We will be working on a different patch now with folks from open source.  It 
> will create/add to 2 layers-- a path abstraction layer and a use-oriented 
> abstraction layer.  The path abstraction layer is epitomized by classes like 
> FsUtils (and in the patch new classes like AFsLayout).  The use oriented 
> abstraction layer is epitomized by existing classes like 
> MasterFileSystem/HRegionFileSystem (and possibly new classes later) that 
> build on

[jira] [Created] (HBASE-14439) New Filesystem Abstraction Layer

2015-09-15 Thread Ben Lau (JIRA)

Ben Lau created HBASE-14439:
---

 Summary: New Filesystem Abstraction Layer
 Key: HBASE-14439
 URL: https://issues.apache.org/jira/browse/HBASE-14439
 Project: HBase
  Issue Type: New Feature
Reporter: Ben Lau


Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged.

We will be working with Cloudera on a different patch now.  It will create/add 
to 2 layers-- a path abstraction layer and a use-oriented abstraction layer.  
The path abstraction layer is epitomized by classes like FsUtils (and in the 
patch new classes like AFsLayout).  The use oriented abstraction layer is 
epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and 
possibly new classes later) that build on the path abstraction layer and focus 
on 'doing things' (eg creating regions) and less on the gritty details like the 
paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity and storing 
hierarchy in the meta table instead (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14439) New Filesystem Abstraction Layer

2015-09-15 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14439:

Attachment: abstraction.patch

> New Filesystem Abstraction Layer
> 
>
> Key: HBASE-14439
> URL: https://issues.apache.org/jira/browse/HBASE-14439
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ben Lau
> Attachments: abstraction.patch
>
>
> Ticket for work in progress on new FileSystem abstractions.  Previously, we 
> (Yahoo) submitted a ticket that would add support for humongous (1 million 
> region+) tables via a hierarchical layout (HBASE-13991).  However open source 
> is moving in a similar but not identical direction in the future and so the 
> patch will not be merged.
> We will be working with Cloudera on a different patch now.  It will 
> create/add to 2 layers-- a path abstraction layer and a use-oriented 
> abstraction layer.  The path abstraction layer is epitomized by classes like 
> FsUtils (and in the patch new classes like AFsLayout).  The use oriented 
> abstraction layer is epitomized by existing classes like 
> MasterFileSystem/HRegionFileSystem (and possibly new classes later) that 
> build on the path abstraction layer and focus on 'doing things' (eg creating 
> regions) and less on the gritty details like the paths.
> This work on abstracting and isolating the paths from the use cases will help 
> Yahoo not diverge too much from open source with its internal 'Humongous' 
> table hierarchical layout, while also helping open source move further 
> towards the eventual goal of redoing the FS layout in a similar (but 
> different) hierarchical layout later that focuses on data directory 
> uniformity and storing hierarchy in the meta table instead (see HBASE-14090.)
> Attached to this ticket is some work we've done at Yahoo so far that will be 
> put into an open source HBase branch for further collaboration.  The patch is 
> not meant to be complete yet and is a work in progress.  (Please wait on 
> patch comments/reviews.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14439) New/Improved Filesystem Abstractions

2015-09-15 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14439:

Description: 
Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged into open source.

We will be working with Cloudera on a different patch now.  It will create/add 
to 2 layers-- a path abstraction layer and a use-oriented abstraction layer.  
The path abstraction layer is epitomized by classes like FsUtils (and in the 
patch new classes like AFsLayout).  The use oriented abstraction layer is 
epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and 
possibly new classes later) that build on the path abstraction layer and focus 
on 'doing things' (eg creating regions) and less on the gritty details like the 
paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity (unlike the 
humongous patch) and storing hierarchy in the meta table instead which enables 
new optimizations (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)

  was:
Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged.

We will be working with Cloudera on a different patch now.  It will create/add 
to 2 layers-- a path abstraction layer and a use-oriented abstraction layer.  
The path abstraction layer is epitomized by classes like FsUtils (and in the 
patch new classes like AFsLayout).  The use oriented abstraction layer is 
epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and 
possibly new classes later) that build on the path abstraction layer and focus 
on 'doing things' (eg creating regions) and less on the gritty details like the 
paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity and storing 
hierarchy in the meta table instead (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)

Summary: New/Improved Filesystem Abstractions  (was: New Filesystem 
Abstraction Layer)

> New/Improved Filesystem Abstractions
> 
>
> Key: HBASE-14439
> URL: https://issues.apache.org/jira/browse/HBASE-14439
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ben Lau
> Attachments: abstraction.patch
>
>
> Ticket for work in progress on new FileSystem abstractions.  Previously, we 
> (Yahoo) submitted a ticket that would add support for humongous (1 million 
> region+) tables via a hierarchical layout (HBASE-13991).  However open source 
> is moving in a similar but not identical direction in the future and so the 
> patch will not be merged into open source.
> We will be working with Cloudera on a different patch now.  It will 
> create/add to 2 layers-- a path abstraction layer and a use-oriented 
> abstraction layer.  The path abstraction layer is epitomized by classes like 
> FsUtils (and in the patch new classes like AFsLayout).  The use oriented 
> abstraction layer is epitomized by existing classes like 
> MasterFileSystem/HRegionFileSystem (and possibly new classes later) that 
> build on the path abstraction layer and focus on 'doing things' (eg creating 
> regions) and less on the gritty details like the paths.
> This work on abstracting and isolating the paths from the use cases will help 
> Yahoo not diverge too much from open source with its internal 'Humongous'

[jira] [Updated] (HBASE-14439) New/Improved Filesystem Abstractions

2015-09-15 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14439:

Description: 
Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged into open source.

We will be working with Cloudera on a different patch now.  It will create/add 
to 2 layers-- a path abstraction layer and a use-oriented abstraction layer.  
The path abstraction layer is epitomized by classes like FsUtils (and in the 
patch new classes like AFsLayout).  The use oriented abstraction layer is 
epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and 
possibly new classes later) that build on the path abstraction layer and focus 
on 'doing things' (eg creating regions) and less on the gritty details like the 
paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity (unlike the 
humongous patch) and storing hierarchy in the meta table instead which enables 
new optimizations (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)  It also includes some Yahoo-specific 'humongous' layout 
code that will be removed before submission in open source.

  was:
Ticket for work in progress on new FileSystem abstractions.  Previously, we 
(Yahoo) submitted a ticket that would add support for humongous (1 million 
region+) tables via a hierarchical layout (HBASE-13991).  However open source 
is moving in a similar but not identical direction in the future and so the 
patch will not be merged into open source.

We will be working with Cloudera on a different patch now.  It will create/add 
to 2 layers-- a path abstraction layer and a use-oriented abstraction layer.  
The path abstraction layer is epitomized by classes like FsUtils (and in the 
patch new classes like AFsLayout).  The use oriented abstraction layer is 
epitomized by existing classes like MasterFileSystem/HRegionFileSystem (and 
possibly new classes later) that build on the path abstraction layer and focus 
on 'doing things' (eg creating regions) and less on the gritty details like the 
paths.

This work on abstracting and isolating the paths from the use cases will help 
Yahoo not diverge too much from open source with its internal 'Humongous' table 
hierarchical layout, while also helping open source move further towards the 
eventual goal of redoing the FS layout in a similar (but different) 
hierarchical layout later that focuses on data directory uniformity (unlike the 
humongous patch) and storing hierarchy in the meta table instead which enables 
new optimizations (see HBASE-14090.)

Attached to this ticket is some work we've done at Yahoo so far that will be 
put into an open source HBase branch for further collaboration.  The patch is 
not meant to be complete yet and is a work in progress.  (Please wait on patch 
comments/reviews.)


> New/Improved Filesystem Abstractions
> 
>
> Key: HBASE-14439
> URL: https://issues.apache.org/jira/browse/HBASE-14439
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ben Lau
>Assignee: Matteo Bertozzi
> Attachments: abstraction.patch
>
>
> Ticket for work in progress on new FileSystem abstractions.  Previously, we 
> (Yahoo) submitted a ticket that would add support for humongous (1 million 
> region+) tables via a hierarchical layout (HBASE-13991).  However open source 
> is moving in a similar but not identical direction in the future and so the 
> patch will not be merged into open source.
> We will be working with Cloudera on a different patch now.  It will 
> create/add to 2 layers-- a path abstraction layer and a use-oriented 
> abstraction layer.  The path abstraction layer is epitomized by classes like 
> FsUtils (and in the patch new classes like AFsLayout).  The use oriented 
> abstraction layer is epitomized by existing classes like 
> MasterFileSystem/HRegionFileSystem (and possibly new classes later) that 
> build on the path abstraction layer and focus on 'doing things' (eg creating 
> regions) and less on the gritty details like the paths.
> This work on

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-14 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744844#comment-14744844
 ] 

Ben Lau commented on HBASE-14283:
-

Hey guys, I started looking into updating the HFile serialization to support 
reverse scans per previous comments.  One thing that immediately struck me as 
being a possible problem is that the header sizes appear to be hardcoded into 
HConstants.java (HConstants.HFILEBLOCK_HEADER_SIZE), rather than being read 
from the HFile block header or HFile metadata itself.  

This seems to imply that if I add more fields to the header and then do a 
rolling restart to update all region servers to have my code, any old region 
server that hasn't updated yet and is processing the new HFiles will not 
realize the header is bigger now and that there is stuff they need to skip / 
ignore.  This might necessitate a 2-step restart process with 2 rolling 
restarts.  

Restart 1 to update all RS to have the appropriate new reading code.  Restart 2 
will enable writes by setting an HBase config option (false by default) to 
start writing the new HFiles.  Am I missing something and this 2-step rolling 
restart is not necessary for some reason?  It seems unlikely people would find 
this process palatable but is there a better alternative?  

Alternatively I can turn this into a non-backwards compatible major version 
update instead of a minor version update and require a full cluster restart but 
that is kind of harsh in its own way.  Opinions/thoughts?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-09 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737171#comment-14737171
 ] 

Ben Lau commented on HBASE-14283:
-

Yes we would be changing the serialization for HFileBlock header.  It would 
have a new field for the previous data block size, for block of the same type 
(same semantics as prevBlockOffset now).  Any objections?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-24 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906704#comment-14906704
 ] 

Ben Lau commented on HBASE-14283:
-

Hi guys, I have posted a new patch on review board.  See 
https://reviews.apache.org/r/38720/.  The patch adds support for HFileV4 and 
uses it to fix/optimize reverse scan.  The patch is designed to be 
rolling-restartable in 2 phases, as discussed in an above comment on Sep 14.

As currently posted, I think the patch has to go into HBase 2.0 since it 
changes HConstants which is marked @Stable.  It turns out that assumptions 
about the header size and contents are hardcoded in several places, primarily 
HConstants.java.

Let me know what you guys think of the patch.  More eyes would be better 
because the changes are somewhat farther reaching than they sounded initially 
and the HFile format has a long history that I’m not as familiar with as most 
people around here.

I also need to reach out to Facebook later for feedback since this change seems 
like it will affect their external Memcached block cache.  I have marked TODO 
in the patch in several places where I will need to talk with Facebook.

Also, since we are adding a new HFileV4, now would be a good time to fix 
anything else that is broken or add additional metadata for future 
optimizations that we missed out on in HFileV3.  Someone could add more 
metadata after the patch stands on its own as a complete fix for reverse scans. 
 

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-25 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909063#comment-14909063
 ] 

Ben Lau commented on HBASE-14283:
-

As Matteo mentions, we discussed this issue briefly with him and Stack about a 
week ago or so.  [~mbertozzi] I thought we had asked if updating the major 
version would be fine and the answer was in the affirmative but I might've 
misunderstood or misunderstood the degree of affirmation.  (IIRC the reason was 
something along the lines of cluster operators updating 
HFile.FORMAT_VERSION_KEY in their config being a desirable property of the 2nd 
rolling upgrade or something.)

In any case though the reason we initially were going to do a 3.x but moved to 
a 4.0 was because (1) major versions (but not in HFiles?) generally denote 
level of backward compatibility and the new HFiles produced in this patch 
cannot be read by an HFileV3.X reader (2) the patch requires enough changes to 
assumptions in the serialization code (eg regarding header size or block cache) 
that it doesn't seem appropriate as a minor version change and (3) if we are 
following the rules we've set for ourselves, the changes in HConstants alone 
(annotated as Stable) mean this patch should be going into HBase 2.0 
(admittedly rules can always be bent).

Updating only the minor version because the format change currently fixes only 
1 bug (as opposed to 10 bugs or adding a new feature) seems to be the wrong way 
to think of versioning, IMO.  If our concern is that we would like a fix for 
this bug in 1.3 and not wait until 2.0, we could also commit a shorter term fix 
for 1.3 that just always reads the block header or does an optimistic read and 
falls back on reading the block header if the read fails from block size 
expectations (configurable, optimistic off by default).  In combination with an 
expected size correction for index blocks (perhaps not for bloom filter blocks 
since that fix is a messy addition and also violates some API abstraction 
layers in StoreFile) it might be fine in most scenarios, especially if the 
cluster operator is allowed to change the inline block chunking config for the 
cluster that needs to do reverse scans.  Within Yahoo internally we will 
probably go with a bandaid fix like this for now so that users can use reverse 
scans and still get 'ok' even if not max performance.  (Also sub-100% 
performance is better than getting exceptions about block sizes ;) )

If people would be okay with dividing it up this way-- short term fix (no HFile 
changes) for 1.3 and a longer term fix (HFileV4) for HBase 2.0 I could create a 
separate ticket with the newest patch as a starting point for HFileV4 and 
submit another patch for this ticket that implements a configurable 'choose 
your tradeoff' fix as described in the previous paragraph.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-09-24 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906728#comment-14906728
 ] 

Ben Lau commented on HBASE-14283:
-

Hi Nick, that sounds like a good idea, I will do that.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-28 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720844#comment-14720844
 ] 

Ben Lau commented on HBASE-14283:
-

Here's a V2 of the patch that handles bloom filter blocks.  It requires some 
interface changes that blur the line a bit between the StoreFile reader and the 
HFile reader which is not ideal but there isn't really any other way to fix 
this currently in a performant way for bloom filters.  Let me know what you 
guys think.  I have attached the patch to the ticket for review/feedback.  

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283.patch, hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-08-28 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-14283:

Attachment: HBASE-14283-v2.patch

 Reverse scan doesn’t work with HFile inline index/bloom blocks
 --

 Key: HBASE-14283
 URL: https://issues.apache.org/jira/browse/HBASE-14283
 Project: HBase
  Issue Type: Bug
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
 hfile-seek-before.patch


 Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
 level index blocks.  The reason is because the seekBefore() call calculates 
 the previous data block’s size by assuming data blocks are contiguous which 
 is not the case in HFile V2 and beyond.
 Attached is a first cut patch (targeting 
 bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
 (1) a unit test which exposes the bug and demonstrates failures for both 
 inline bloom blocks and inline index blocks
 (2) a proposed fix for inline index blocks that does not require a new HFile 
 version change, but is only performant for 1 and 2-level indexes and not 3+.  
 3+ requires an HFile format update for optimal performance.
 This patch does not fix the bloom filter blocks bug.  But the fix should be 
 similar to the case of inline index blocks.  The reason I haven’t made the 
 change yet is I want to confirm that you guys would be fine with me revising 
 the HFile.Reader interface.
 Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
 getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
 HFileReader class doesn’t have a reference to the bloom filters (and hence 
 their indices) and only constructs the IO streams and hence has no way to 
 know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
 bloom method comments state that they “know nothing about how that metadata 
 is structured” but I do not know if that is a requirement of the abstraction 
 (why?) or just an incidental current property. 
 We would like to do 3 things with community approval:
 (1) Update the HFile.Reader interface and implementation to contain and 
 return BloomFilters directly rather than unstructured IO streams
 (2) Merge the fixes for index blocks and bloom blocks into open source
 (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
 field in the block header in the next HFile version, so that seekBefore() 
 calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-16 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Attachment: HBASE-16052-master.patch

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-16 Thread Ben Lau (JIRA)

Ben Lau created HBASE-16052:
---

 Summary: Improve HBaseFsck Scalability
 Key: HBASE-16052
 URL: https://issues.apache.org/jira/browse/HBASE-16052
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Ben Lau


There are some problems with HBaseFsck that make it unnecessarily slow 
especially for large tables or clusters with many regions.  

This patch tries to fix the biggest bottlenecks and also include a couple of 
bug fixes for some of the race conditions caused by gathering and holding state 
about a live cluster that is no longer true by the time you use that state in 
Fsck processing.  These race conditions cause Fsck to crash and become unusable 
on large clusters with lots of region splits/merges.

Here are some scalability/performance problems in HBaseFsck and the changes the 
patch makes:
- Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and then 
discarding everything but the Paths, then passing the Paths to a PathFilter, 
and then having the filter look up the (previously discarded) FileStatuses of 
the paths again.  This is actually worse than double I/O because the first 
lookup obtains a batch of FileStatuses while all the other lookups are 
individual RPCs performed sequentially.
-- Avoid this by adding a FileStatusFilter so that filtering can happen 
directly on FileStatuses
-- This performance bug affects more than Fsck, but also to some extent things 
like snapshots, hfile archival, etc.  I didn't have time to look too deep into 
other things affected and didn't want to increase the scope of this ticket so I 
focus mostly on Fsck and make only a few improvements to other codepaths.  The 
changes in this patch though should make it fairly easy to fix other code paths 
in later jiras if we feel there are some other features strongly impacted by 
this problem.  
- OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
Fsck runtime) and the running time scales with the number of store files, yet 
the function is completely serial
-- Make offlineReferenceFileRepair multithreaded
- LoadHdfsRegionDirs() uses table-level concurrency, which is a big bottleneck 
if you have 1 large cluster with 1 very large table that has nearly all the 
regions
-- Change loadHdfsRegionDirs() to region-level parallelism instead of 
table-level parallelism for operations.

The changes benefit all clusters but are especially noticeable for large 
clusters with a few very large tables.  On our version of 0.98 with the 
original patch we had a moderately sized production cluster with 2 (user) 
tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-21 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342282#comment-15342282
 ] 

Ben Lau commented on HBASE-16052:
-

[~jmhsieh] and [~jxiang] Any comments on this patch?  Just checking since you 
guys are listed as the owners for the HBaseFsck component on 
https://issues.apache.org/jira/browse/HBASE/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-22 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345499#comment-15345499
 ] 

Ben Lau commented on HBASE-16052:
-

Added a comment about a refactor that may clean up / unify the FSUtils 
FileStatus filtering interface a bit at the expense of some minor object 
overhead.  Let me know what you guys think on the reviewboard.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-20 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340148#comment-15340148
 ] 

Ben Lau commented on HBASE-16052:
-

Reviewboard link: https://reviews.apache.org/r/48959/

Added annotation and javadoc for the AbstractFileStatusFilter.

Fixed the whitespaces mentioned in the Hadoop QA bot's comment.  

Also re: Hadoop QA bot comment: I didn't add any new tests since it seems 
existing unit tests are sufficient as this patch optimizes common codepaths 
already exercised by the Fsck tests.


> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-27 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351422#comment-15351422
 ] 

Ben Lau commented on HBASE-16052:
-

Thanks guys.  The patch doesn't apply cleanly to branch-1 so I will upload a 
new patch for branch-1.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-27 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351586#comment-15351586
 ] 

Ben Lau commented on HBASE-16052:
-

Uploaded the patch for branch-1.  Re-ran the Fsck unit tests which passed.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-06-27 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Attachment: HBASE-16052-v3-branch-1.patch

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14576) New HFile version for optimized reverse scans

2016-05-18 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290115#comment-15290115
 ] 

Ben Lau commented on HBASE-14576:
-

Hi [~syuanjiang] I'm not currently working on the hfile format revision.  The 
(to my knowledge) mostly complete/correct patch I included before in 
HBASE-14283 is what I have still.  I haven't had time to work on open source 
since then.  I may have time this quarter though to come back to the patch and 
fix conflicts/address any issues people raise.  Is this ticket blocking 
anything for you?

> New HFile version for optimized reverse scans
> -
>
> Key: HBASE-14576
> URL: https://issues.apache.org/jira/browse/HBASE-14576
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ben Lau
>Assignee: Ben Lau
>
> A new ticket to finish the work from HBASE-14283, which will fix the 
> HFileReader seekBefore() previous block size calculation bug but make the 
> resulting reverse scan take more I/O than a forward scan.  
> Fixing the bug in the long term requires an HFile version bump, either major 
> or minor.  We will put the previous block's size in the HFileBlock header 
> instead of trying to calculate it directly using block offset arithmetic.  
> Per [~anoop.hbase]'s suggestion, I created this ticket so that we can 
> separate the issue of fixing the bug (the responsibility of HBASE-14283) and 
> the issue of getting reverse scans to run quickly (the responsibility of this 
> ticket).  It is also unlikely that this ticket will be backported to old 
> versions of HBase eg 0.98 whereas HBASE-14283 can be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-20 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Attachment: HBASE-16052-0.98.v3-amendment.patch

Seems it should just be this method call.  [~tedyu] can you take a look?  

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0, 0.98.21
>
> Attachments: HBASE-16052-0.98.v3-amendment.patch, 
> HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch, 
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-20 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386493#comment-15386493
 ] 

Ben Lau edited comment on HBASE-16052 at 7/20/16 7:42 PM:
--

Seems it should just be this method call.  [~tedyu] can you take a look at the 
amendment patch I just attached?  


was (Author: benlau):
Seems it should just be this method call.  [~tedyu] can you take a look?  

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0, 0.98.21
>
> Attachments: HBASE-16052-0.98.v3-amendment.patch, 
> HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch, 
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-20 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386372#comment-15386372
 ] 

Ben Lau commented on HBASE-16052:
-

Looks like some of the methods don't exist on Hadoop 1.1.  When I ran tests 
locally it was against Hadoop 2.X.  Will fix.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0, 0.98.21
>
> Attachments: HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch, 
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-18 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Status: Patch Available  (was: In Progress)

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch, 
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-18 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382689#comment-15382689
 ] 

Ben Lau commented on HBASE-16052:
-

[~busbey] Thanks!  

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch, 
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-18 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382638#comment-15382638
 ] 

Ben Lau commented on HBASE-16052:
-

[~tedyu] / [~syuanjiang] How does the 0.98 patch look?  Is there a way for me 
to trigger the tests?  Does uploading a file after the ticket is closed not 
trigger tests anymore?  

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch, 
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Work started] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-18 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-16052 started by Ben Lau.
---
> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch, 
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Reopened] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-18 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau reopened HBASE-16052:
-

Re-opening to trigger test on 0.98 patch.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch, 
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-18 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Attachment: HBASE-16052-0.98.v3.patch

Thanks [~tedyu] -- done.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch, 
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-19 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384626#comment-15384626
 ] 

Ben Lau commented on HBASE-16052:
-

Test failure looks unrelated to me.  Reran locally and passed.  Let me know if 
I missed anything but I don't think there are new Javadoc/findbugs warnings in 
the patch.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-0.98.v3.patch, HBASE-16052-master.patch, 
> HBASE-16052-v3-0.98.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-12 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Attachment: HBASE-16052-v3-0.98.patch

Attached patch for 0.98.  Let me know if this looks good [~te...@apache.org] 
and [~syuanjiang].

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-0.98.patch, 
> HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-06 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365175#comment-15365175
 ] 

Ben Lau commented on HBASE-16052:
-

Ah okay, good to know, thanks.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-06 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365169#comment-15365169
 ] 

Ben Lau commented on HBASE-16052:
-

Hi Ted, I can probably get to it on Monday. Just to make sure I understand the 
process you guys were discussing, since this is an enhancement (not bug fix), 
this means it should go into 0.98?  I thought 0.98 was mostly in a "bug fixes 
only" state?  

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-01 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359848#comment-15359848
 ] 

Ben Lau commented on HBASE-16052:
-

Okay let me know if you guys have a consensus for what other versions should be 
patched.  Re: 0.98-- yes we tested the original version of this patch in 0.98.  
However if we want to patch 0.98 it probably makes more sense to apply a 
backport of the trunk patch than our original 0.98 patch.  The trunk patch has 
some improvements that make the code cleaner on trunk but weren't that 
important on 0.98 originally (eg there are a lot more PathFilter classes in 
trunk so adding an abstract class to avoid too much code duplication became a 
no brainer).  To keep the code from diverging too much it would make sense to 
backport from trunk if we decide to patch 0.98.  I'll add a release note later 
based on the Jira description.  Feel free to expand/shorten it.

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16052) Improve HBaseFsck Scalability

2016-07-01 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16052:

Release Note: 
HBASE-16052 improves the performance and scalability of HBaseFsck, especially 
for large clusters with a small number of large tables.  

Searching for lingering reference files is now a multi-threaded operation.  
Loading HDFS region directory information is now multi-threaded at the 
region-level instead of the table-level to maximize concurrency.  A performance 
bug in HBaseFsck that resulted in redundant I/O and RPCs was fixed by 
introducing a FileStatusFilter that filters FileStatus objects directly.  

> Improve HBaseFsck Scalability
> -
>
> Key: HBASE-16052
> URL: https://issues.apache.org/jira/browse/HBASE-16052
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Ben Lau
>Assignee: Ben Lau
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, 
> HBASE-16052-v3-master.patch
>
>
> There are some problems with HBaseFsck that make it unnecessarily slow 
> especially for large tables or clusters with many regions.  
> This patch tries to fix the biggest bottlenecks and also include a couple of 
> bug fixes for some of the race conditions caused by gathering and holding 
> state about a live cluster that is no longer true by the time you use that 
> state in Fsck processing.  These race conditions cause Fsck to crash and 
> become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes 
> the patch makes:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and 
> then discarding everything but the Paths, then passing the Paths to a 
> PathFilter, and then having the filter look up the (previously discarded) 
> FileStatuses of the paths again.  This is actually worse than double I/O 
> because the first lookup obtains a batch of FileStatuses while all the other 
> lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen 
> directly on FileStatuses
> -- This performance bug affects more than Fsck, but also to some extent 
> things like snapshots, hfile archival, etc.  I didn't have time to look too 
> deep into other things affected and didn't want to increase the scope of this 
> ticket so I focus mostly on Fsck and make only a few improvements to other 
> codepaths.  The changes in this patch though should make it fairly easy to 
> fix other code paths in later jiras if we feel there are some other features 
> strongly impacted by this problem.  
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of 
> Fsck runtime) and the running time scales with the number of store files, yet 
> the function is completely serial
> -- Make offlineReferenceFileRepair multithreaded
> - LoadHdfsRegionDirs() uses table-level concurrency, which is a big 
> bottleneck if you have 1 large cluster with 1 very large table that has 
> nearly all the regions
> -- Change loadHdfsRegionDirs() to region-level parallelism instead of 
> table-level parallelism for operations.
> The changes benefit all clusters but are especially noticeable for large 
> clusters with a few very large tables.  On our version of 0.98 with the 
> original patch we had a moderately sized production cluster with 2 (user) 
> tables and ~160k regions where HBaseFsck went from taking 18 min to 5 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16662) Fix open POODLE vulnerabilities

2016-09-21 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16662:

Status: Patch Available  (was: Open)

Formally submit patch file (not sure if I did this right).

> Fix open POODLE vulnerabilities
> ---
>
> Key: HBASE-16662
> URL: https://issues.apache.org/jira/browse/HBASE-16662
> Project: HBase
>  Issue Type: Bug
>  Components: REST, Thrift
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-16662-master.patch
>
>
> We recently found a security issue in our HBase REST servers.  The issue is a 
> variant of the POODLE vulnerability (https://en.wikipedia.org/wiki/POODLE) 
> and is present in the HBase Thrift server as well.  It also appears to affect 
> the JMXListener coprocessor.  The vulnerabilities probably affect all 
> versions of HBase that have the affected services.  (If you don't use the 
> affected services with SSL then this ticket probably doesn't affect you).
> Included is a patch to fix the known POODLE vulnerabilities in master.  Let 
> us know if we missed any.  From our end we only personally encountered the 
> HBase REST vulnerability.  We do not use the Thrift server or JMXListener 
> coprocessor but discovered those problems after discussing the issue with 
> some of the HBase PMCs.
> Coincidentally, Hadoop recently committed a SslSelectChannelConnectorSecure 
> which is more or less the same as one of the fixes in this patch.  Hadoop 
> wasn't originally affected by the vulnerability in the 
> SslSelectChannelConnector, but about a month ago they committed HADOOP-12765 
> which does use that class, so they added a SslSelectChannelConnectorSecure 
> class similar to this patch.  Since this class is present in Hadoop 2.7.4+ 
> which hasn't been released yet, we will for now just include our own version 
> instead of depending on the Hadoop version.
> After the patch is approved for master we can backport as necessary to older 
> versions of HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HBASE-16662) Fix open POODLE vulnerabilities

2016-09-20 Thread Ben Lau (JIRA)

Ben Lau created HBASE-16662:
---

 Summary: Fix open POODLE vulnerabilities
 Key: HBASE-16662
 URL: https://issues.apache.org/jira/browse/HBASE-16662
 Project: HBase
  Issue Type: Bug
  Components: REST, Thrift
Reporter: Ben Lau
Assignee: Ben Lau


We recently found a security issue in our HBase REST servers.  The issue is a 
variant of the POODLE vulnerability (https://en.wikipedia.org/wiki/POODLE) and 
is present in the HBase Thrift server as well.  It also appears to affect the 
JMXListener coprocessor.  The vulnerabilities probably affect all versions of 
HBase that have the affected services.  (If you don't use the affected services 
with SSL then this ticket probably doesn't affect you).

Included is a patch to fix the known POODLE vulnerabilities in master.  Let us 
know if we missed any.  From our end we only personally encountered the HBase 
REST vulnerability.  We do not use the Thrift server or JMXListener coprocessor 
but discovered those problems after discussing the issue with some of the HBase 
PMCs.

Coincidentally, Hadoop recently committed a SslSelectChannelConnectorSecure 
which is more or less the same as one of the fixes in this patch.  Hadoop 
wasn't originally affected by the vulnerability in the 
SslSelectChannelConnector, but about a month ago they committed HADOOP-12765 
which does use that class, so they added a SslSelectChannelConnectorSecure 
class similar to this patch.  Since this class is present in Hadoop 2.7.4+ 
which hasn't been released yet, we will for now just include our own version 
instead of depending on the Hadoop version.

After the patch is approved for master we can backport as necessary to older 
versions of HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16662) Fix open POODLE vulnerabilities

2016-09-20 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-16662:

Attachment: HBASE-16662-master.patch

> Fix open POODLE vulnerabilities
> ---
>
> Key: HBASE-16662
> URL: https://issues.apache.org/jira/browse/HBASE-16662
> Project: HBase
>  Issue Type: Bug
>  Components: REST, Thrift
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-16662-master.patch
>
>
> We recently found a security issue in our HBase REST servers.  The issue is a 
> variant of the POODLE vulnerability (https://en.wikipedia.org/wiki/POODLE) 
> and is present in the HBase Thrift server as well.  It also appears to affect 
> the JMXListener coprocessor.  The vulnerabilities probably affect all 
> versions of HBase that have the affected services.  (If you don't use the 
> affected services with SSL then this ticket probably doesn't affect you).
> Included is a patch to fix the known POODLE vulnerabilities in master.  Let 
> us know if we missed any.  From our end we only personally encountered the 
> HBase REST vulnerability.  We do not use the Thrift server or JMXListener 
> coprocessor but discovered those problems after discussing the issue with 
> some of the HBase PMCs.
> Coincidentally, Hadoop recently committed a SslSelectChannelConnectorSecure 
> which is more or less the same as one of the fixes in this patch.  Hadoop 
> wasn't originally affected by the vulnerability in the 
> SslSelectChannelConnector, but about a month ago they committed HADOOP-12765 
> which does use that class, so they added a SslSelectChannelConnectorSecure 
> class similar to this patch.  Since this class is present in Hadoop 2.7.4+ 
> which hasn't been released yet, we will for now just include our own version 
> instead of depending on the Hadoop version.
> After the patch is approved for master we can backport as necessary to older 
> versions of HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau resolved HBASE-17720.
-
Resolution: Duplicate

> Possible bug in FlushSnapshotSubprocedure
> -
>
> Key: HBASE-17720
> URL: https://issues.apache.org/jira/browse/HBASE-17720
> Project: HBase
>  Issue Type: Bug
>  Components: dataloss, snapshots
>Reporter: Ben Lau
>
> I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that 
> it does not appear to explicitly handle a DroppedSnapshotException.  In the 
> primary codepath when flushing memstores, (see 
> MemStoreFlusher.flushRegion()), there is a try/catch for 
> DroppedSnapshotException that will abort the regionserver to replay WALs to 
> avoid data loss.  I don't see this in FlushSnapshotSubProcedure.  Is this an 
> accidental omission or is there a reason this isn't present?  
> I'm not too familiar with procedure V1 or V2.  I assume it is the case that 
> if a participant dies that all other participants will terminate any 
> outstanding operations for the procedure?  If so and if this lack of 
> RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed 
> naively otherwise I assume a failed flush on 1 region server could cause a 
> cascade of RS abortions on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892973#comment-15892973
 ] 

Ben Lau commented on HBASE-17720:
-

Ah, thanks Jerry.  It looks like the fix was added in HBASE-13877 which was 
committed after the version of 0.98 we use.  The scenario (failure in ITBLL) is 
exactly what I ran into.  I had checked master to make sure this wasn't fixed, 
but I checked FlushSnapshotSubprocedure for the missing try/catch, not 
RegionServerSnapshotManager.  So this is a dup of HBASE-13877.  Will close, 
thanks Jerry.

> Possible bug in FlushSnapshotSubprocedure
> -
>
> Key: HBASE-17720
> URL: https://issues.apache.org/jira/browse/HBASE-17720
> Project: HBase
>  Issue Type: Bug
>  Components: dataloss, snapshots
>Reporter: Ben Lau
>
> I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that 
> it does not appear to explicitly handle a DroppedSnapshotException.  In the 
> primary codepath when flushing memstores, (see 
> MemStoreFlusher.flushRegion()), there is a try/catch for 
> DroppedSnapshotException that will abort the regionserver to replay WALs to 
> avoid data loss.  I don't see this in FlushSnapshotSubProcedure.  Is this an 
> accidental omission or is there a reason this isn't present?  
> I'm not too familiar with procedure V1 or V2.  I assume it is the case that 
> if a participant dies that all other participants will terminate any 
> outstanding operations for the procedure?  If so and if this lack of 
> RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed 
> naively otherwise I assume a failed flush on 1 region server could cause a 
> cascade of RS abortions on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Ben Lau (JIRA)

Ben Lau created HBASE-17720:
---

 Summary: Possible bug in FlushSnapshotSubprocedure
 Key: HBASE-17720
 URL: https://issues.apache.org/jira/browse/HBASE-17720
 Project: HBase
  Issue Type: Bug
  Components: dataloss, snapshots
Reporter: Ben Lau


I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that 
it does not appear to explicitly handle a DroppedSnapshotException.  In the 
primary codepath when flushing memstores, (see MemStoreFlusher.flushRegion()), 
there is a try/catch for DroppedSnapshotException that will abort the 
regionserver to replay WALs to avoid data loss.  I don't see this in 
FlushSnapshotSubProcedure.  Is this an accidental omission or is there a reason 
this isn't present?  

I'm not too familiar with procedure V1 or V2.  I assume it is the case that if 
a participant dies that all other participants will terminate any outstanding 
operations for the procedure?  If so and if this lack of RS.abort() for 
DroppedSnapshotException is a bug, then it can't be fixed naively otherwise I 
assume a failed flush on 1 region server could cause a cascade of RS abortions 
on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly

2018-02-12 Thread Ben Lau (JIRA)

Ben Lau created HBASE-19989:
---

 Summary: READY_TO_MERGE and READY_TO_SPLIT do not update region 
state correctly
 Key: HBASE-19989
 URL: https://issues.apache.org/jira/browse/HBASE-19989
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.1, 1.3.1
Reporter: Ben Lau
Assignee: Ben Lau


Region state transitions do not work correctly for READY_TO_MERGE/SPLIT.  
[~thiruvel] and I noticed this is due to break statements being in the wrong 
place in AssignmentManager.  This allows a race condition for example in which 
one of the regions being merged could be moved concurrently, resulting in the 
merge transaction failing and then double assignment and/or dataloss.  This bug 
appears to only affect branch-1 (for example 1.3 and 1.4) and not branch-2 as 
the relevant code in AM has since been rewritten.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly

2018-02-12 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-19989:

Attachment: HBASE-19989.patch

> READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
> --
>
> Key: HBASE-19989
> URL: https://issues.apache.org/jira/browse/HBASE-19989
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Ben Lau
>Assignee: Ben Lau
>Priority: Major
> Attachments: HBASE-19989.patch
>
>
> Region state transitions do not work correctly for READY_TO_MERGE/SPLIT.  
> [~thiruvel] and I noticed this is due to break statements being in the wrong 
> place in AssignmentManager.  This allows a race condition for example in 
> which one of the regions being merged could be moved concurrently, resulting 
> in the merge transaction failing and then double assignment and/or dataloss.  
> This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not 
> branch-2 as the relevant code in AM has since been rewritten.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-15911) NPE in AssignmentManager.onRegionTransition after Master restart

2018-02-13 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363161#comment-16363161
 ] 

Ben Lau commented on HBASE-15911:
-

[~pankaj2461] [~mantonov] We recently ran into this and had to fix this as it 
was preventing our master from starting up.  We would like to submit a 
suggested fix and test case if you guys do not have a patch yet.

> NPE in AssignmentManager.onRegionTransition after Master restart
> 
>
> Key: HBASE-15911
> URL: https://issues.apache.org/jira/browse/HBASE-15911
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
>Priority: Major
>
> 16/05/27 17:49:18 ERROR ipc.RpcServer: Unexpected throwable object 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.onRegionTransition(AssignmentManager.java:4364)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.reportRegionStateTransition(MasterRpcServices.java:1421)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8623)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2239)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:116)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112)
>   at java.lang.Thread.run(Thread.java:745)
> I'm pretty sure I've seen it before and more than once, but never got to dig 
> in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly

2018-02-13 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363081#comment-16363081
 ] 

Ben Lau commented on HBASE-19989:
-

Hi Ted, thanks for the feedback, I'm not sure a comment will be helpful since 
it comes down to 'if the break is here the code below doesn't run, so the break 
is not here' but I have added a comment anyway and re-added the ZKLess 
split/merge tests that were removed in branch-1.  Let me know your thoughts, 
thanks.

> READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
> --
>
> Key: HBASE-19989
> URL: https://issues.apache.org/jira/browse/HBASE-19989
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Ben Lau
>Assignee: Ben Lau
>Priority: Major
> Attachments: HBASE-19989.patch, HBASE-19989.patch
>
>
> Region state transitions do not work correctly for READY_TO_MERGE/SPLIT.  
> [~thiruvel] and I noticed this is due to break statements being in the wrong 
> place in AssignmentManager.  This allows a race condition for example in 
> which one of the regions being merged could be moved concurrently, resulting 
> in the merge transaction failing and then double assignment and/or dataloss.  
> This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not 
> branch-2 as the relevant code in AM has since been rewritten.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly

2018-02-13 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-19989:

Attachment: (was: HBASE-19989.patch)

> READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
> --
>
> Key: HBASE-19989
> URL: https://issues.apache.org/jira/browse/HBASE-19989
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Ben Lau
>Assignee: Ben Lau
>Priority: Major
> Attachments: HBASE-19989.patch
>
>
> Region state transitions do not work correctly for READY_TO_MERGE/SPLIT.  
> [~thiruvel] and I noticed this is due to break statements being in the wrong 
> place in AssignmentManager.  This allows a race condition for example in 
> which one of the regions being merged could be moved concurrently, resulting 
> in the merge transaction failing and then double assignment and/or dataloss.  
> This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not 
> branch-2 as the relevant code in AM has since been rewritten.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly

2018-02-13 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-19989:

Attachment: HBASE-19989.patch

> READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
> --
>
> Key: HBASE-19989
> URL: https://issues.apache.org/jira/browse/HBASE-19989
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Ben Lau
>Assignee: Ben Lau
>Priority: Major
> Attachments: HBASE-19989.patch, HBASE-19989.patch
>
>
> Region state transitions do not work correctly for READY_TO_MERGE/SPLIT.  
> [~thiruvel] and I noticed this is due to break statements being in the wrong 
> place in AssignmentManager.  This allows a race condition for example in 
> which one of the regions being merged could be moved concurrently, resulting 
> in the merge transaction failing and then double assignment and/or dataloss.  
> This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not 
> branch-2 as the relevant code in AM has since been rewritten.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException

2018-02-13 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363168#comment-16363168
 ] 

Ben Lau commented on HBASE-18282:
-

Hi guys, this ticket has been open for a while.  Do you mind if we submit an 
internal patch + test we have for this?

> ReplicationLogCleaner can delete WALs not yet replicated in case of a 
> KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Critical
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a 
> KeeperException and returns null in such a case. ReplicationLogCleaner just 
> assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
> List result = null;
> try {
>   result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
> } catch (KeeperException e) {
>   this.abortable.abort("Failed to get list of replicators", e);
> }
> return result;
>   }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
> for (int retry = 0; ; retry++) {
>   int v0 = replicationQueues.getQueuesZNodeCversion();
>   List rss = replicationQueues.getListOfReplicators();
>   if (rss == null) {
> LOG.debug("Didn't find any region server that replicates, won't 
> prevent any deletions.");
> return ImmutableSet.of();
>   }
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HBASE-19995) Current Jetty 9 version in HBase master branch can memory leak under high traffic

2018-02-13 Thread Ben Lau (JIRA)

Ben Lau created HBASE-19995:
---

 Summary: Current Jetty 9 version in HBase master branch can memory 
leak under high traffic
 Key: HBASE-19995
 URL: https://issues.apache.org/jira/browse/HBASE-19995
 Project: HBase
  Issue Type: Bug
  Components: REST
Affects Versions: 2.0
Reporter: Ben Lau


There is a memory-leak in Jetty 9 that manifests whenever you hit the call 
queue limit in HBase REST.  The memory-leak leaks both on-heap and off-heap 
objects permanently.  It happens because whenever the call queue for Jetty 
server overflows, the task that is rejected runs a 'reject' method if it is a 
Rejectable to do any cleanup. This clean up is necessary to for example close 
the connection, deallocate any buffers, etc. Unfortunately, in Jetty 9, they 
implemented the 'reject' / cleanup method of the SelectChannelEndpoint as a 
non-blocking call that is not guaranteed to run.  This was later fixed in Jetty 
9.4 and later backported however the version of Jetty 9 pulled in HBase for 
REST comes before this fix.  See 
[https://github.com/eclipse/jetty.project/issues/1804] and 
[https://github.com/apache/hbase/blob/master/pom.xml#L1416.]

If we want to stay on 9.3.X we could update to 
[9.3.22.v20171030|https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-server/9.3.22.v20171030]
 which is the latest version of 9.3.  Thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException

2018-02-14 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-18282:

Attachment: HBASE-18282-branch-2.patch
HBASE-18282-branch-1.patch

> ReplicationLogCleaner can delete WALs not yet replicated in case of a 
> KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>Reporter: Ashu Pachauri
>Priority: Critical
> Attachments: HBASE-18282-branch-1.patch, HBASE-18282-branch-2.patch
>
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a 
> KeeperException and returns null in such a case. ReplicationLogCleaner just 
> assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
> List result = null;
> try {
>   result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
> } catch (KeeperException e) {
>   this.abortable.abort("Failed to get list of replicators", e);
> }
> return result;
>   }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
> for (int retry = 0; ; retry++) {
>   int v0 = replicationQueues.getQueuesZNodeCversion();
>   List rss = replicationQueues.getListOfReplicators();
>   if (rss == null) {
> LOG.debug("Didn't find any region server that replicates, won't 
> prevent any deletions.");
> return ImmutableSet.of();
>   }
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException

2018-02-14 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364666#comment-16364666
 ] 

Ben Lau commented on HBASE-18282:
-

Thanks [~apurtell].  It looks like this code is rewritten in master now so the 
bug is present only in branch-2 and branch-1.  I will attach forward ports for 
those branches.

> ReplicationLogCleaner can delete WALs not yet replicated in case of a 
> KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>Reporter: Ashu Pachauri
>Priority: Critical
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a 
> KeeperException and returns null in such a case. ReplicationLogCleaner just 
> assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
> List result = null;
> try {
>   result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
> } catch (KeeperException e) {
>   this.abortable.abort("Failed to get list of replicators", e);
> }
> return result;
>   }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
> for (int retry = 0; ; retry++) {
>   int v0 = replicationQueues.getQueuesZNodeCversion();
>   List rss = replicationQueues.getListOfReplicators();
>   if (rss == null) {
> LOG.debug("Didn't find any region server that replicates, won't 
> prevent any deletions.");
> return ImmutableSet.of();
>   }
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-19995) Current Jetty 9 version in HBase master branch can memory leak under high traffic

2018-02-14 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-19995:

Attachment: HBASE-19995.patch

> Current Jetty 9 version in HBase master branch can memory leak under high 
> traffic
> -
>
> Key: HBASE-19995
> URL: https://issues.apache.org/jira/browse/HBASE-19995
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.0
>Reporter: Ben Lau
>Priority: Major
> Attachments: HBASE-19995.patch
>
>
> There is a memory-leak in Jetty 9 that manifests whenever you hit the call 
> queue limit in HBase REST.  The memory-leak leaks both on-heap and off-heap 
> objects permanently.  It happens because whenever the call queue for Jetty 
> server overflows, the task that is rejected runs a 'reject' method if it is a 
> Rejectable to do any cleanup. This clean up is necessary to for example close 
> the connection, deallocate any buffers, etc. Unfortunately, in Jetty 9, they 
> implemented the 'reject' / cleanup method of the SelectChannelEndpoint as a 
> non-blocking call that is not guaranteed to run.  This was later fixed in 
> Jetty 9.4 and later backported however the version of Jetty 9 pulled in HBase 
> for REST comes before this fix.  See 
> [https://github.com/eclipse/jetty.project/issues/1804] and 
> [https://github.com/apache/hbase/blob/master/pom.xml#L1416.]
> If we want to stay on 9.3.X we could update to 
> [9.3.22.v20171030|https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-server/9.3.22.v20171030]
>  which is the latest version of 9.3.  Thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException

2018-02-14 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365001#comment-16365001
 ] 

Ben Lau commented on HBASE-18282:
-

[~yuzhih...@gmail.com] I wasn't sure when that sleep can be interrupted 
(probably never in practice?) so I just made it keep retrying but it probably 
makes more sense to just assume if an interrupt happens it's intended to stop 
the thread so I will interrupt and break out as suggested.

> ReplicationLogCleaner can delete WALs not yet replicated in case of a 
> KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>Reporter: Ashu Pachauri
>Assignee: Ben Lau
>Priority: Critical
> Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-18282-branch-1.patch, HBASE-18282-branch-2.patch
>
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a 
> KeeperException and returns null in such a case. ReplicationLogCleaner just 
> assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
> List result = null;
> try {
>   result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
> } catch (KeeperException e) {
>   this.abortable.abort("Failed to get list of replicators", e);
> }
> return result;
>   }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
> for (int retry = 0; ; retry++) {
>   int v0 = replicationQueues.getQueuesZNodeCversion();
>   List rss = replicationQueues.getListOfReplicators();
>   if (rss == null) {
> LOG.debug("Didn't find any region server that replicates, won't 
> prevent any deletions.");
> return ImmutableSet.of();
>   }
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException

2018-02-14 Thread Ben Lau (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau updated HBASE-18282:

Attachment: HBASE-18282-branch-2-v2.patch
HBASE-18282-branch-1-v2.patch

> ReplicationLogCleaner can delete WALs not yet replicated in case of a 
> KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>Reporter: Ashu Pachauri
>Assignee: Ben Lau
>Priority: Critical
> Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-18282-branch-1-v2.patch, 
> HBASE-18282-branch-1.patch, HBASE-18282-branch-2-v2.patch, 
> HBASE-18282-branch-2.patch
>
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a 
> KeeperException and returns null in such a case. ReplicationLogCleaner just 
> assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
> List result = null;
> try {
>   result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
> } catch (KeeperException e) {
>   this.abortable.abort("Failed to get list of replicators", e);
> }
> return result;
>   }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
> for (int retry = 0; ; retry++) {
>   int v0 = replicationQueues.getQueuesZNodeCversion();
>   List rss = replicationQueues.getListOfReplicators();
>   if (rss == null) {
> LOG.debug("Didn't find any region server that replicates, won't 
> prevent any deletions.");
> return ImmutableSet.of();
>   }
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException

2018-02-14 Thread Ben Lau (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365010#comment-16365010
 ] 

Ben Lau commented on HBASE-18282:
-

Hi [~yuzhih...@gmail.com], as explained earlier, I believe this bug to not 
exist in master (due to the relevant code being rewritten to use Java streams – 
see the base method that was missing a throw).  It should only be present on 
1.X and 2.X.  Let me know if I missed something.

> ReplicationLogCleaner can delete WALs not yet replicated in case of a 
> KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>Reporter: Ashu Pachauri
>Assignee: Ben Lau
>Priority: Critical
> Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-18282-branch-1-v2.patch, 
> HBASE-18282-branch-1.patch, HBASE-18282-branch-2-v2.patch, 
> HBASE-18282-branch-2.patch
>
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a 
> KeeperException and returns null in such a case. ReplicationLogCleaner just 
> assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
> List result = null;
> try {
>   result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
> } catch (KeeperException e) {
>   this.abortable.abort("Failed to get list of replicators", e);
> }
> return result;
>   }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
> for (int retry = 0; ; retry++) {
>   int v0 = replicationQueues.getQueuesZNodeCversion();
>   List rss = replicationQueues.getListOfReplicators();
>   if (rss == null) {
> LOG.debug("Didn't find any region server that replicates, won't 
> prevent any deletions.");
> return ImmutableSet.of();
>   }
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

1 2 >

1 - 100 of 142 matches

Mail list logo