Yuki Tawara created HBASE-20361:
-----------------------------------

             Summary: Non-successive TableInputSplits may wrongly be merged by auto balancing feature
                 Key: HBASE-20361
                 URL: https://issues.apache.org/jira/browse/HBASE-20361
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
            Reporter: Yuki Tawara

The TableInputFormatBase class lets users exclude specific splits from the list returned by TableInputFormatBase#getSplits by overriding TableInputFormatBase#includeRegionInSplit. It also offers a feature called "auto balancing", which mitigates data skew by splitting large splits and merging small ones.

If a user overrides TableInputFormatBase#includeRegionInSplit, the i-th and (i+1)-th splits may not be successive (the i-th split's end key is smaller than the (i+1)-th split's start key). If the user also enables auto balancing, non-successive splits can be merged, which means the excluded splits lying between the merged splits are "revived" as part of the merged range.

To avoid such cases, we should not merge non-successive splits.
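
The proposed rule can be illustrated with a minimal sketch. This is not the HBase implementation or the actual patch: the Split class, isSuccessive, mergeSmallSplits and the targetSize threshold are hypothetical stand-ins, and the only assumption taken from HBase is that an empty end row marks the end of the table. Two splits are merged only when the left split's end row equals the right split's start row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SuccessiveSplits {
    /** Hypothetical stand-in for TableInputSplit: a key range plus an estimated size. */
    static final class Split {
        final byte[] startRow;
        final byte[] endRow;
        final long length;

        Split(String startRow, String endRow, long length) {
            this(startRow.getBytes(StandardCharsets.UTF_8),
                 endRow.getBytes(StandardCharsets.UTF_8), length);
        }

        Split(byte[] startRow, byte[] endRow, long length) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.length = length;
        }
    }

    /** True only if 'right' begins exactly where 'left' ends (no excluded range between them). */
    static boolean isSuccessive(Split left, Split right) {
        // An empty end row is assumed to mean "end of table", so nothing can follow it.
        return left.endRow.length > 0 && Arrays.equals(left.endRow, right.startRow);
    }

    /** Merges neighbouring small splits, but never across a gap in the key space. */
    static List<Split> mergeSmallSplits(List<Split> splits, long targetSize) {
        List<Split> result = new ArrayList<>();
        Split current = null;
        for (Split next : splits) {
            if (current != null
                    && isSuccessive(current, next)
                    && current.length + next.length <= targetSize) {
                // Safe to merge: the combined range covers no excluded region.
                current = new Split(current.startRow, next.endRow,
                                    current.length + next.length);
            } else {
                if (current != null) {
                    result.add(current);
                }
                current = next;
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Region [b, c) was excluded by a user-supplied filter, leaving a gap after [a, b).
        List<Split> splits = Arrays.asList(
                new Split("a", "b", 10),
                new Split("c", "d", 10),
                new Split("d", "e", 10));
        for (Split s : mergeSmallSplits(splits, 100)) {
            System.out.println(new String(s.startRow, StandardCharsets.UTF_8)
                    + " .. " + new String(s.endRow, StandardCharsets.UTF_8));
        }
    }
}
```

In this example [a, b) stays on its own because the excluded range [b, c) separates it from the next split, while [c, d) and [d, e) are merged into [c, e). Without the isSuccessive check, all three would collapse into [a, e) and the excluded region would be scanned again.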