Yuki Tawara created HBASE-20361:
-----------------------------------

             Summary: Non-successive TableInputSplits may wrongly be merged by 
auto balancing feature
                 Key: HBASE-20361
                 URL: https://issues.apache.org/jira/browse/HBASE-20361
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
            Reporter: Yuki Tawara


The TableInputFormatBase class lets users exclude specific splits from the 
list returned by TableInputFormatBase#getSplits by overriding 
TableInputFormatBase#includeRegionInSplit.
It also offers users a feature called "auto balancing" to mitigate data skew by 
splitting large splits and merging small splits.
If a user overrides TableInputFormatBase#includeRegionInSplit, the i-th split 
and the (i+1)-th split may not be successive (the i-th split's end key is 
smaller than the (i+1)-th split's start key).
If the user also enables the auto balancing feature, non-successive splits can 
be merged, which means the excluded splits that lie between them are 
"revived": their key ranges end up inside the merged split.

To avoid such cases, we should not merge non-successive splits.
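
The proposed guard can be sketched as follows. This is only an illustration 
under stated assumptions, not the actual TableInputFormatBase code: the 
SplitMergeSketch class, its Split type, isSuccessive, mergeSmallSplits and the 
targetSize parameter are all hypothetical names, and the size-based merge rule 
is a simplified stand-in for the real auto balancing logic. The only point is 
that two splits are merged only when the first one's end key equals the second 
one's start key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names exist in HBase.
final class SplitMergeSketch {

    /** Simplified stand-in for a TableInputSplit: key range [start, end) plus a size. */
    static final class Split {
        final byte[] start;
        final byte[] end;
        final long size;

        Split(byte[] start, byte[] end, long size) {
            this.start = start;
            this.end = end;
            this.size = size;
        }

        Split(String start, String end, long size) {
            this(start.getBytes(StandardCharsets.UTF_8),
                 end.getBytes(StandardCharsets.UTF_8), size);
        }
    }

    /** Two splits are successive only if the first ends exactly where the second starts. */
    static boolean isSuccessive(Split a, Split b) {
        // An empty end key means "end of table", so nothing can follow it.
        return a.end.length > 0 && Arrays.equals(a.end, b.start);
    }

    /**
     * Merges neighbouring small splits (assumed sorted by start key) while the
     * combined size stays within targetSize, but never across a gap in the key space.
     */
    static List<Split> mergeSmallSplits(List<Split> splits, long targetSize) {
        List<Split> merged = new ArrayList<>();
        Split current = null;
        for (Split next : splits) {
            if (current != null
                    && current.size + next.size <= targetSize
                    && isSuccessive(current, next)) {
                // Safe to merge: no excluded region lies between the two splits.
                current = new Split(current.start, next.end, current.size + next.size);
            } else {
                if (current != null) {
                    merged.add(current);
                }
                current = next;
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }
}
```

With splits [a,b), [b,c) and [d,e), where the region [c,d) was excluded by 
includeRegionInSplit, this sketch merges the first two into [a,c) and leaves 
[d,e) alone, so the excluded range is never scanned.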



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
