Rajesh Balamohan created HIVE-24663:
---------------------------------------

             Summary: Batch process in ColStatsProcessor
                 Key: HIVE-24663
                 URL: https://issues.apache.org/jira/browse/HIVE-24663
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


When a large number of partitions (>20K) is processed, ColStatsProcessor runs 
into DB issues. 

{{db.setPartitionColumnStatistics(request);}} can hang for hours, and in some 
cases Postgres stops processing. 

It would be good to have ColStatsProcessor send the gathered stats in small 
batches instead of one bulk update.
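
A rough sketch of the batching idea, not the actual patch: the collected per-partition stats are split into fixed-size chunks and each chunk is sent in its own call. The class name, {{toBatches}}, {{sendInBatches}} and the batch size of 1,000 are all hypothetical; in ColStatsProcessor the sender would presumably build one request per chunk and pass it to {{db.setPartitionColumnStatistics(request)}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only; not part of Hive.
class ColStatsBatchSketch {

    /** Splits items into consecutive lists of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Hands each batch to the sender and returns the number of calls made. */
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sender) {
        int calls = 0;
        for (List<T> batch : toBatches(items, batchSize)) {
            // Assumption: in Hive this is where one request per batch would be
            // built and passed to db.setPartitionColumnStatistics(request).
            sender.accept(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        // Stand-in for the per-partition stats objects of 25,000 partitions.
        List<Integer> partitionStats = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            partitionStats.add(i);
        }
        int calls = sendInBatches(partitionStats, 1_000, batch -> { /* placeholder sender */ });
        System.out.println("metastore calls: " + calls); // prints "metastore calls: 25"
    }
}
```

With batches, a failure or slow update affects one chunk rather than the whole run, and the batch size could be exposed as a configuration property so it can be tuned per metastore backend.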

Ref: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
