Andrew Sherman created HIVE-17935:
-------------------------------------

             Summary: Turn on hive.optimize.sort.dynamic.partition by default
                 Key: HIVE-17935
                 URL: https://issues.apache.org/jira/browse/HIVE-17935
             Project: Hive
          Issue Type: Bug
            Reporter: Andrew Sherman
            Assignee: Andrew Sherman


The config option hive.optimize.sort.dynamic.partition is an optimization for 
Hive’s dynamic partitioning feature. It was originally implemented in 
[HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this 
optimization, the dynamic partition columns and bucketing columns (in case of 
bucketed tables) are sorted before being fed to the reducers. Since the 
partitioning and bucketing columns are sorted, each reducer can keep only one 
record writer open at any time thereby reducing the memory pressure on the 
reducers. There were some early problems with this optimization and it was 
disabled by default in HiveConf in 
[HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then setting 
hive.optimize.sort.dynamic.partition=true has been used to solve problems where 
dynamic partitioning produces (1) too many small files on HDFS, which is 
bad for the cluster and can increase overhead for future Hive queries over 
those partitions, and (2) OOM issues in the map tasks because they are trying to 
simultaneously write to 100 different files. 
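
For illustration, this is roughly how the setting is applied per session today, 
before any change to the default. This is a minimal sketch: the table and 
column names (sales_raw, sales_part, dt) are hypothetical and do not come from 
this issue.

```sql
-- Allow dynamic partitioning with no static partition key in the INSERT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Sort rows on the dynamic partition (and bucketing) columns before the
-- reducers, so each reducer keeps only one record writer open at a time.
SET hive.optimize.sort.dynamic.partition=true;

-- Hypothetical tables: the dynamic partition column (dt) must come last
-- in the SELECT list.
INSERT OVERWRITE TABLE sales_part PARTITION (dt)
SELECT item_id, amount, dt
FROM sales_raw;
```

If the default were switched to true, the third SET statement would no longer 
be needed in such scripts.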

It now seems that the feature is probably mature enough that it can be enabled 
by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
