Attila Magyar created HIVE-22411:
------------------------------------

             Summary: Performance degradation on single row inserts
                 Key: HIVE-22411
                 URL: https://issues.apache.org/jira/browse/HIVE-22411
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Attila Magyar
            Assignee: Attila Magyar
             Fix For: 4.0.0
         Attachments: Screen Shot 2019-10-17 at 8.40.50 PM.png

Executing single-row insert statements on a transactional table degrades write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert, Hive calculates statistics such as the number of files in the 
table and the total size of the table. To do this, it traverses the table 
directory recursively, executing a separate listStatus call for each path it 
visits. As a result, the more delta directories the table has, the longer it 
takes to calculate the statistics.

Insertion time therefore grows linearly with the number of earlier inserts:

!Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436!
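
The growth can be illustrated with a toy model. The sketch below does not use Hive or Hadoop code; ListingCostModel, Dir and the helper methods are hypothetical names introduced only for illustration. It assumes each insert adds one delta directory holding a single file, and it counts how many listStatus-style calls a hand-written recursive walk would issue, one per directory visited.

```java
import java.util.ArrayList;
import java.util.List;

public class ListingCostModel {
    /** Minimal in-memory directory node; a stand-in for a path on the file system. */
    static final class Dir {
        final List<Dir> subDirs = new ArrayList<>();
        int fileCount;
    }

    /** Counts the list calls made by a hand-written recursive walk: one per directory. */
    static int recursiveListCalls(Dir dir) {
        int calls = 1; // one listStatus-style call for this directory
        for (Dir child : dir.subDirs) {
            calls += recursiveListCalls(child);
        }
        return calls;
    }

    /** Builds a table directory with the given number of delta directories (assumption: one file each). */
    static Dir tableWithDeltas(int deltas) {
        Dir table = new Dir();
        for (int i = 0; i < deltas; i++) {
            Dir delta = new Dir();
            delta.fileCount = 1;
            table.subDirs.add(delta);
        }
        return table;
    }

    public static void main(String[] args) {
        for (int inserts : new int[] {1, 10, 100}) {
            int calls = recursiveListCalls(tableWithDeltas(inserts));
            System.out.println(inserts + " inserts -> " + calls + " list calls");
        }
    }
}
```

In this model the number of list calls after n inserts is n + 1. On a remote store where every call is a network round trip, that count is a reasonable proxy for the time spent collecting statistics.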

The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the 
handcrafted recursive method.
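
A minimal sketch of the two approaches, written against the public Hadoop FileSystem API. This is not the actual Hive code or patch: TableStatsSketch, Stats and both method names are hypothetical, and the sketch ignores details such as hidden-file filtering. The assumption is that the FileSystem implementation can serve the recursive listFiles call more cheaply than one listStatus call per directory, for example with a flat prefix listing on an object store.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class TableStatsSketch {
    /** Holds the two statistics mentioned in the description. */
    static final class Stats {
        long numFiles;
        long totalSize;
    }

    /** Hand-written recursion: issues one listStatus call per directory visited. */
    static void collectRecursively(FileSystem fs, Path dir, Stats stats) throws IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
                collectRecursively(fs, status.getPath(), stats);
            } else {
                stats.numFiles++;
                stats.totalSize += status.getLen();
            }
        }
    }

    /** Proposed approach: delegate the recursion to the FileSystem implementation. */
    static Stats collectWithListFiles(FileSystem fs, Path tableDir) throws IOException {
        Stats stats = new Stats();
        // listFiles with recursive=true returns only files, never directories.
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(tableDir, /*recursive*/ true);
        while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            stats.numFiles++;
            stats.totalSize += file.getLen();
        }
        return stats;
    }
}
```

Both methods should produce the same file count and total size for the same directory tree; only the number of calls made to the file system differs.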



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
