This is regarding the fix that was incorporated in HIVE-6888
<https://issues.apache.org/jira/browse/HIVE-6888> (commit
<https://github.com/apache/hive/commit/eb9fece245a4a529ce3d400a580c55f0c2180785>).

The fix was made because MapWork objects were being leaked when there were
multiple AMs. However, there are cases where this fix clears gWorkMap
prematurely, so that it has to be populated (and cleared) again. One example
is when HiveInputFormat.getSplits() is called from HiveSplitGenerator.initialize().

In that path, gWorkMap is cleared when getSplits() is called, populated again
when splitGrouper.generateGroupedSplits() is called, and finally cleared once
more in the 'finally' block of HiveSplitGenerator.initialize().
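To make that sequence concrete, here is a minimal, hypothetical Java model of it. The method names mirror the ones above, but the bodies are my own assumptions, not Hive's code: gWorkMap is modeled as a ThreadLocal<Map<String, String>> keyed by a made-up plan path, the cache is assumed empty at the start, and getSplits() is assumed to read the work before clearing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model (NOT Hive code) of the gWorkMap lifecycle during
 * HiveSplitGenerator.initialize(). Only the method names come from Hive.
 */
public class GWorkMapLifecycle {
    // Stand-in for the thread-local gWorkMap cache: plan path -> serialized MapWork.
    static final ThreadLocal<Map<String, String>> gWorkMap =
            ThreadLocal.withInitial(HashMap::new);
    // Records each populate/clear so the sequence can be inspected.
    static final List<String> events = new ArrayList<>();

    // Assumed behaviour: return the cached work, "deserializing" it if absent.
    static String getMapWork(String planPath) {
        Map<String, String> cache = gWorkMap.get();
        if (!cache.containsKey(planPath)) {
            events.add("populate");
            cache.put(planPath, "MapWork@" + planPath);
        }
        return cache.get(planPath);
    }

    static void clearWork() {
        events.add("clear");
        gWorkMap.get().clear();
    }

    // Models HiveInputFormat.getSplits() after HIVE-6888: read the work, then clear it.
    static void getSplits(String planPath) {
        getMapWork(planPath);
        clearWork();
    }

    // Models splitGrouper.generateGroupedSplits(): it needs the work again.
    static void generateGroupedSplits(String planPath) {
        getMapWork(planPath);
    }

    // Models HiveSplitGenerator.initialize() with its finally block.
    static List<String> initialize(String planPath) {
        events.clear();
        try {
            getSplits(planPath);
            generateGroupedSplits(planPath);
        } finally {
            clearWork();
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        // Prints [populate, clear, populate, clear]
        System.out.println(initialize("/tmp/plan"));
    }
}
```

In this model the work is populated twice for a single initialize() call; the middle "clear" is the one I'm asking about.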

In our codebase, we make some modifications to MapWork during the getSplits()
call, and those changes are lost when clearMapWork() is called inside
HiveInputFormat.getSplits(). Is this call really required?
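A rough, hypothetical sketch of the problem, again not Hive's code: MapWork is replaced by a tiny mutable class, and a made-up clearInside flag toggles between clearing inside getSplits() (current behaviour) and leaving the clear to the finally block in initialize() (what I'm asking about). It only illustrates the lost modification; it does not show that the finally-block clear alone is enough to prevent the multi-AM leak HIVE-6888 fixed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (NOT Hive code) of how an in-place change to MapWork
 * made during getSplits() is lost when getSplits() clears the cache itself.
 */
public class MapWorkModification {
    // Minimal mutable stand-in for MapWork.
    static final class MapWork {
        String note = "original";
    }

    static final ThreadLocal<Map<String, MapWork>> gWorkMap =
            ThreadLocal.withInitial(HashMap::new);

    // Assumed: a fresh MapWork is "deserialized" from the plan when not cached.
    static MapWork getMapWork(String planPath) {
        return gWorkMap.get().computeIfAbsent(planPath, p -> new MapWork());
    }

    static void clearWork() {
        gWorkMap.get().clear();
    }

    // clearInside = true models the current behaviour; false models the question.
    static void getSplits(String planPath, boolean clearInside) {
        MapWork work = getMapWork(planPath);
        work.note = "modified in getSplits";  // our local customization
        if (clearInside) {
            clearWork();                       // drops the modified instance
        }
    }

    // Returns what the split grouper would see afterwards.
    static String initialize(String planPath, boolean clearInside) {
        try {
            getSplits(planPath, clearInside);
            return getMapWork(planPath).note;
        } finally {
            clearWork();  // the cache is still cleared when initialize() exits
        }
    }

    public static void main(String[] args) {
        System.out.println(initialize("/tmp/plan", true));   // prints "original"
        System.out.println(initialize("/tmp/plan", false));  // prints "modified in getSplits"
    }
}
```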
