Hi, I am running a 10-node HDP 2.2 cluster, using Tez and YARN. The Hive version is 0.14.
I have a 90 million row table stored as a plain-text CSV file of about 10 GB. I am trying to insert it into an ORC partitioned table with the statement "insert overwrite table 2h2 partition (dt) select *,TIME_STAMP from 2h_tmp;", where dt is the dynamic partition key. Tez allocates only one reducer to the job, which results in a 6-hour run. I expect about 120 partitions to be created. How can I increase the number of reducers to speed up this job?

Is this related to https://issues.apache.org/jira/browse/HIVE-7158? It is marked as resolved for Hive 0.14. I am running with the default values:

hive.tez.auto.reducer.parallelism (Default Value: false, Added In: Hive 0.14.0 with HIVE-7158)
hive.tez.max.partition.factor (Default Value: 2, Added In: Hive 0.14.0 with HIVE-7158)
hive.tez.min.partition.factor (Default Value: 0.25, Added In: Hive 0.14.0 with HIVE-7158)

and with:

hive.exec.dynamic.partition=true;
hive.exec.dynamic.partition.mode=nonstrict;
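For reference, this is roughly what my session looks like end to end. It is only a sketch: the table and column names (2h2, 2h_tmp, TIME_STAMP, dt) are quoted from my schema as above, and the commented-out Tez settings are the ones I have left at their 0.14 defaults.

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- Left at the Hive 0.14 defaults listed above:
    -- SET hive.tez.auto.reducer.parallelism=false;
    -- SET hive.tez.max.partition.factor=2;
    -- SET hive.tez.min.partition.factor=0.25;

    -- 2h_tmp: ~90 million rows, plain-text CSV, ~10 GB
    -- 2h2: ORC table partitioned by dt (~120 distinct values expected)
    INSERT OVERWRITE TABLE 2h2 PARTITION (dt)
    SELECT *, TIME_STAMP FROM 2h_tmp;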
