BELUGA BEHR created HIVE-21071:
----------------------------------

             Summary: Improve getInputSummary
                 Key: HIVE-21071
                 URL: https://issues.apache.org/jira/browse/HIVE-21071
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
    Affects Versions: 3.1.1, 3.0.0, 4.0.0
            Reporter: BELUGA BEHR


There is a global lock in the {{getInputSummary}} code, so it is important that 
it be fast.  The current implementation has quite a bit of overhead that can be 
re-engineered.

For example, the current implementation keeps a map of File Path to 
ContentSummary object.  This map is populated by several threads concurrently. 
At the end, the method loops through the map in a single thread to add up all 
of the ContentSummary objects, ignoring the paths.  The code can be 
re-engineered to not use a map, or any collection at all, to store the results 
and instead just keep a running tally.  By keeping a tally, there is no O(n) 
operation at the end to perform the addition.
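
The running-tally idea could look roughly like the sketch below.  This is 
only an illustration, not the Hive code: the {{SummaryTally}} class and its 
methods are hypothetical names, and plain {{long}} values stand in for the 
length, file count and directory count carried by a ContentSummary.  It 
assumes {{java.util.concurrent.atomic.LongAdder}} is an acceptable way to 
accumulate totals from several threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not part of Hive): accumulates totals as results
// arrive, so no per-path map and no final summation loop are needed.
final class SummaryTally {
    private final LongAdder length = new LongAdder();
    private final LongAdder fileCount = new LongAdder();
    private final LongAdder directoryCount = new LongAdder();

    // Called by each worker thread as soon as it has a summary for one path.
    void add(long len, long files, long dirs) {
        length.add(len);
        fileCount.add(files);
        directoryCount.add(dirs);
    }

    // The sums are only a consistent snapshot once all workers have finished.
    long getLength() { return length.sum(); }
    long getFileCount() { return fileCount.sum(); }
    long getDirectoryCount() { return directoryCount.sum(); }

    public static void main(String[] args) throws InterruptedException {
        SummaryTally tally = new SummaryTally();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            // Stand-in for fetching the summary of one input path.
            pool.execute(() -> tally.add(10L, 1L, 0L));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(tally.getLength() + " bytes in "
            + tally.getFileCount() + " files");
    }
}
```

With this shape the totals are ready as soon as the last worker finishes, and 
the paths are never stored.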

There are other things that can be improved.  The method returns an object 
which is never used anywhere, so the method should be changed to a void return 
type.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
