Ben Lau created HBASE-16052:
-------------------------------

             Summary: Improve HBaseFsck Scalability
                 Key: HBASE-16052
                 URL: https://issues.apache.org/jira/browse/HBASE-16052
             Project: HBase
          Issue Type: Improvement
          Components: hbck
            Reporter: Ben Lau


Several problems in HBaseFsck make it unnecessarily slow, especially for large 
tables or clusters with many regions.  

This patch tries to fix the biggest bottlenecks and also includes a couple of 
bug fixes for race conditions caused by gathering state about a live cluster 
and holding it until it is no longer true by the time Fsck processing uses it.  
These race conditions cause Fsck to crash and become unusable on large clusters 
with lots of region splits/merges.

Here are some scalability/performance problems in HBaseFsck and the changes the 
patch makes:
- Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and then 
discarding everything but the Paths, then passing the Paths to a PathFilter, 
and then having the filter look up the (previously discarded) FileStatuses of 
the paths again.  This is actually worse than double I/O because the first 
lookup obtains a batch of FileStatuses while all the other lookups are 
individual RPCs performed sequentially.
-- Avoid this by adding a FileStatusFilter so that filtering can happen 
directly on FileStatuses
-- This performance bug is not limited to Fsck; to some extent it also affects 
things like snapshots, hfile archival, etc.  I didn't have time to look deeply 
into the other affected code paths and didn't want to increase the scope of 
this ticket, so I focus mostly on Fsck and make only a few improvements to 
other code paths.  The changes in this patch should, though, make it fairly 
easy to fix other code paths in later jiras if we feel other features are 
strongly impacted by this problem.  
- offlineReferenceFileRepair() is the most expensive part of Fsck (often 50% of 
Fsck runtime) and its running time scales with the number of store files, yet 
the function is completely serial
-- Make offlineReferenceFileRepair() multithreaded
- loadHdfsRegionDirs() uses table-level concurrency, which is a big bottleneck 
if you have a large cluster where one very large table holds nearly all the 
regions
-- Change loadHdfsRegionDirs() to use region-level parallelism instead of 
table-level parallelism.
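
To make the first bullet concrete, here is a toy sketch of the two listing 
patterns.  It is not code from the patch and does not use Hadoop: Status and 
FakeFs are hypothetical stand-ins for a file status and a remote file system, 
and the paths are made up.  It only counts how many calls each pattern makes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model only: Status and FakeFs are hypothetical stand-ins, not Hadoop classes. */
class StatusFilterSketch {

    /** Stand-in for a file status: a path plus an is-directory flag. */
    static final class Status {
        final String path;
        final boolean dir;
        Status(String path, boolean dir) { this.path = path; this.dir = dir; }
    }

    /** Stand-in for a remote file system that counts round trips. */
    static final class FakeFs {
        private final Map<String, Status> entries = new LinkedHashMap<>();
        int rpcs = 0;

        void add(String path, boolean dir) { entries.put(path, new Status(path, dir)); }

        /** One batched call that returns every status. */
        List<Status> listStatus() { rpcs++; return new ArrayList<>(entries.values()); }

        /** One call per path. */
        Status getStatus(String path) { rpcs++; return entries.get(path); }
    }

    /** Old pattern: keep only the path, then look the status up again to filter. */
    static List<String> listDirsViaPathFilter(FakeFs fs) {
        List<String> dirs = new ArrayList<>();
        for (Status s : fs.listStatus()) {
            String path = s.path;            // the status is discarded here...
            if (fs.getStatus(path).dir) {    // ...and fetched again, one call per path
                dirs.add(path);
            }
        }
        return dirs;
    }

    /** New pattern: filter directly on the statuses returned by the batch call. */
    static List<String> listDirsViaStatusFilter(FakeFs fs, Predicate<Status> filter) {
        List<String> dirs = new ArrayList<>();
        for (Status s : fs.listStatus()) {
            if (filter.test(s)) {
                dirs.add(s.path);
            }
        }
        return dirs;
    }

    /** Made-up directory listing: three directories and one plain file. */
    static FakeFs sampleFs() {
        FakeFs fs = new FakeFs();
        fs.add("/t1/region-a", true);
        fs.add("/t1/region-b", true);
        fs.add("/t1/region-c", true);
        fs.add("/t1/some-file", false);
        return fs;
    }

    public static void main(String[] args) {
        FakeFs oldFs = sampleFs();
        int oldDirs = listDirsViaPathFilter(oldFs).size();
        System.out.println("path filter:   " + oldDirs + " dirs, " + oldFs.rpcs + " calls");

        FakeFs newFs = sampleFs();
        int newDirs = listDirsViaStatusFilter(newFs, s -> s.dir).size();
        System.out.println("status filter: " + newDirs + " dirs, " + newFs.rpcs + " calls");
    }
}
```

With N entries the first pattern makes N + 1 calls, N of them sequential, while 
the second makes a single batched call and returns the same directories.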

The changes benefit all clusters but are especially noticeable for large 
clusters with a few very large tables.  On our version of 0.98 with the 
original patch, HBaseFsck went from taking 18 minutes to 5 minutes on a 
moderately sized production cluster with 2 (user) tables and ~160k regions.
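
As a rough illustration of why a few very large tables benefit most, the sketch 
below contrasts the number of parallel tasks under table-level and region-level 
scheduling.  It is not the patch's code: loadRegion(), the table and region 
names, and the thread count are all hypothetical, and the per-region work is a 
placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model only: names and per-region work are hypothetical placeholders. */
class RegionLevelSketch {

    /** Placeholder for per-region work; a real checker would read region data here. */
    static String loadRegion(String table, String region) {
        return table + "/" + region;
    }

    /** Table-level scheduling: one task per table, however many regions it holds. */
    static int tableLevelTaskCount(Map<String, List<String>> regionsByTable) {
        return regionsByTable.size();
    }

    /** Region-level scheduling: one task per region, spread over a fixed pool. */
    static List<String> loadAllRegions(Map<String, List<String>> regionsByTable, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : regionsByTable.entrySet()) {
                final String table = e.getKey();
                for (final String region : e.getValue()) {
                    tasks.add(() -> loadRegion(table, region));
                }
            }
            // invokeAll waits for every task and returns futures in submission order.
            List<String> loaded = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                loaded.add(f.get());
            }
            return loaded;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ex);
        } catch (ExecutionException ex) {
            throw new RuntimeException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Made-up layout: one table holds nearly all the regions. */
    static Map<String, List<String>> sample() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("big", Arrays.asList("r1", "r2", "r3", "r4"));
        m.put("small", Arrays.asList("r5"));
        return m;
    }

    public static void main(String[] args) {
        Map<String, List<String>> layout = sample();
        System.out.println("table-level tasks:  " + tableLevelTaskCount(layout));
        System.out.println("region-level tasks: " + loadAllRegions(layout, 4).size());
    }
}
```

With table-level scheduling the "big" table is a single task, so one thread 
does almost all the work; with region-level scheduling every region is its own 
task and the pool stays busy regardless of how regions are spread across tables.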




