[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718413#comment-16718413 ]

Hive QA commented on HIVE-17935:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908851/HIVE-17935.8.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15270/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15270/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15270/

Messages:
{noformat}
This message was trimmed, see log for full details
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15270/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-12-12 02:54:29.157
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at b650083 HIVE-16100: Dynamic Sorted Partition optimizer loses sibling operators (Vineet Garg, Gopal V reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at b650083 HIVE-16100: Dynamic Sorted Partition optimizer loses sibling operators (Vineet Garg, Gopal V reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-12-12 02:54:29.717
+ rm -rf ../yetus_PreCommit-HIVE-Build-15270
+ mkdir ../yetus_PreCommit-HIVE-Build-15270
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15270
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15270/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out:61
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part10.q.out:49
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part10.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out:79
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part3.q.out:47
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part3.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part4.q.out:57
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part4.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part5.q.out:34
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part5.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out:53
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/load_dyn_part9.q.out:49
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/load_dyn_part9.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/orc_merge2.q.out:37
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/orc_merge2.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/stats2.q.out:19
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/spark/stats2.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/spark/union14.q.out:122
Falling back to three-
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718295#comment-16718295 ]

Vineet Garg commented on HIVE-17935:

[~asherman] Since this optimization is now turned on by default (HIVE-20703 & HIVE-20915), I don't believe we need this JIRA anymore. Is it OK to close it?

> Turn on hive.optimize.sort.dynamic.partition by default
> ---
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
> Issue Type: Bug
> Reporter: Andrew Sherman
> Priority: Major
> Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch,
> HIVE-17935.3.patch, HIVE-17935.4.patch, HIVE-17935.5.patch,
> HIVE-17935.6.patch, HIVE-17935.7.patch, HIVE-17935.8.patch
>
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for
> Hive’s dynamic partitioning feature. It was originally implemented in
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this
> optimization, the dynamic partition columns and bucketing columns (in the case of
> bucketed tables) are sorted before being fed to the reducers. Since the
> partitioning and bucketing columns are sorted, each reducer can keep only one
> record writer open at any time, thereby reducing the memory pressure on the
> reducers. There were some early problems with this optimization, and it was
> disabled by default in HiveConf in
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then,
> setting hive.optimize.sort.dynamic.partition=true has been used to solve
> problems where dynamic partitioning produces (1) too many small files on
> HDFS, which is bad for the cluster and can increase overhead for future Hive
> queries over those partitions, and (2) OOM errors in the map tasks because they
> try to write to 100 different files simultaneously.
> It now seems that the feature is probably mature enough that it can be
> enabled by default.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
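The issue description says that sorting rows on the dynamic partition (and bucketing) columns lets each reducer keep only one record writer open. A minimal Java sketch of that effect, under stated assumptions: it uses no Hive API, `Row`, `sampleRows`, and the two `peakOpenWriters*` methods are hypothetical names, and "opening a writer" is modeled as a counter rather than a real file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SortedDynamicPartitionSketch {

    // Hypothetical row: the value of one dynamic partition column plus a payload.
    static final class Row {
        final String partition;
        final String payload;

        Row(String partition, String payload) {
            this.partition = partition;
            this.payload = payload;
        }
    }

    // Builds n rows spread round-robin over the given number of partition values,
    // so consecutive rows almost always target different partitions.
    static List<Row> sampleRows(int n, int partitions) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new Row("ds=" + (i % partitions), "v" + i));
        }
        return rows;
    }

    // Unsorted input: a writer is opened the first time a partition value is seen
    // and must stay open, because more rows for it may still arrive.
    static int peakOpenWritersUnsorted(List<Row> rows) {
        Set<String> open = new HashSet<>();
        int peak = 0;
        for (Row r : rows) {
            open.add(r.partition);
            peak = Math.max(peak, open.size());
        }
        return peak;
    }

    // Sorted input: rows for one partition are contiguous, so the current writer
    // can be closed as soon as the partition value changes.
    static int peakOpenWritersSorted(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.partition));
        String current = null;
        int open = 0;
        int peak = 0;
        for (Row r : sorted) {
            if (!r.partition.equals(current)) {
                if (current != null) {
                    open--; // close the previous partition's writer
                }
                open++;     // open a writer for the new partition
                current = r.partition;
            }
            peak = Math.max(peak, open);
        }
        return peak;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows(1000, 30);
        System.out.println("unsorted peak open writers: " + peakOpenWritersUnsorted(rows));
        System.out.println("sorted peak open writers:   " + peakOpenWritersSorted(rows));
    }
}
```

With 30 distinct partition values, the unsorted model peaks at 30 open writers while the sorted model never holds more than one; the price in this sketch is the extra sort step, which in Hive also implies a shuffle to reducers.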
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349763#comment-16349763 ]

Hive QA commented on HIVE-17935:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908851/HIVE-17935.8.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 92 failed/errored test(s), 12965 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_part] (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_1] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_partial] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[implicit_cast_during_insert] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_into6] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part10] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part14] (batchId=90)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part1] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part3] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part4] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part8] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part9] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge3] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge4] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition3] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition4] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition5] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge2] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats2] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats4] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_empty_dyn_part] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_15] (batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_16] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_17] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_18] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_25] (batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_partitioned] (batchId=52)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_stats] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_mm] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_non_mm] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[extrapolate_part_stats_partial_ndv] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=163)
org.apa
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349723#comment-16349723 ]

Hive QA commented on HIVE-17935:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 17s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| modules | C: common ql itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8980/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256625#comment-16256625 ]

Hive QA commented on HIVE-17935:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12898098/HIVE-17935.7.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7876/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7876/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7876/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-11-17 08:13:54.847
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7876/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-11-17 08:13:54.850
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 987d130 HIVE-16756 : Vectorization: LongColModuloLongColumn throws java.lang.ArithmeticException: / by zero (Vihang Karajgaonkar, reviewed by Matt McCline)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 987d130 HIVE-16756 : Vectorization: LongColModuloLongColumn throws java.lang.ArithmeticException: / by zero (Vihang Karajgaonkar, reviewed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-11-17 08:13:59.272
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/test/results/clientpositive/llap/ppd_union_view.q.out:258
error: ql/src/test/results/clientpositive/llap/ppd_union_view.q.out: patch does not apply
error: patch failed: ql/src/test/results/clientpositive/llap/sysdb.q.out:2190
error: ql/src/test/results/clientpositive/llap/sysdb.q.out: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12898098 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250477#comment-16250477 ]

Hive QA commented on HIVE-17935:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12897347/HIVE-17935.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11380 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ppd_union_view] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_update] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_partitioned] (batchId=160)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge2 (batchId=274)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMultiInsert (batchId=274)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDynamicPartitionsMerge2 (batchId=284)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMultiInsert (batchId=284)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.schemaEvolutionAddColDynamicPartitioningUpdate (batchId=221)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitions (batchId=233)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitionsUnionAll (batchId=233)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedDynamicPartitionsMultiInsert (batchId=230)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedDynamicPartitionsUnionAll (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7793/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7793/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7793/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12897347 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245285#comment-16245285 ]

Hive QA commented on HIVE-17935:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12896740/HIVE-17935.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 11372 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_partitioned] (batchId=50)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_mm] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_non_mm] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[extrapolate_part_stats_partial_ndv] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part1] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part3] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part5] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_update] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part_update] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_dml] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_partitioned] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge2] (batchId=176)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_dyn_part] (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge2 (batchId=274)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMultiInsert (batchId=274)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDynamicPartitionsMerge2 (batchId=284)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMultiInsert (batchId=284)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.schemaEvolutionAddColDynamicPartitioningUpdate (batchId=221)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitions (batchId=230)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedDynamicPartitions (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7727/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7727/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7727/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 39 tests failed
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234703#comment-16234703 ]

Prasanth Jayachandran commented on HIVE-17935:
--

bq. Is it fair to say that the gains from changing the default are potentially large while the losses are comparatively small?

For cases where it is beneficial, this is definitely a huge gain, and the optimization has many benefits. The point I was trying to make is that users should be aware of the regression in some cases until the optimizer makes this decision automatically.

> Turn on hive.optimize.sort.dynamic.partition by default
> ---
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
> Issue Type: Bug
> Reporter: Andrew Sherman
> Assignee: Andrew Sherman
> Priority: Major
> Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch
>
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for
> Hive’s dynamic partitioning feature. It was originally implemented in
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this
> optimization, the dynamic partition columns and bucketing columns (in the case
> of bucketed tables) are sorted before being fed to the reducers. Since the
> partitioning and bucketing columns are sorted, each reducer can keep only one
> record writer open at any time, thereby reducing the memory pressure on the
> reducers. There were some early problems with this optimization, and it was
> disabled by default in HiveConf in
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then,
> setting hive.optimize.sort.dynamic.partition=true has been used to solve
> problems where dynamic partitioning causes (1) too many small files on
> HDFS, which is bad for the cluster and can increase overhead for future Hive
> queries over those partitions, and (2) OOM issues in map tasks that try to
> write to 100 different files simultaneously.
> It now seems that the feature is probably mature enough that it can be
> enabled by default.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
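The memory argument in the issue description (sorted input lets each reducer keep only one record writer open) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Hive's file sink code. The function `peak_open_writers`, the `(partition, payload)` row shape, and the round-robin input over 100 partitions are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (dynamic partition value, payload).
Row = Tuple[str, int]


def peak_open_writers(rows: Iterable[Row], input_sorted: bool) -> int:
    """Return the peak number of record writers open at the same time.

    With input_sorted=True the writer assumes rows arrive grouped by partition
    value, so it closes the current writer as soon as the value changes.
    Otherwise a partition may reappear later, so every writer stays open
    until the end of input.
    """
    open_writers = set()
    current = None
    peak = 0
    for partition, _payload in rows:
        if input_sorted and current is not None and partition != current:
            open_writers.discard(current)  # previous partition is complete
        open_writers.add(partition)
        current = partition
        peak = max(peak, len(open_writers))
    return peak


# Round-robin rows over 100 partitions, mimicking unsorted map-side input.
rows: List[Row] = [(f"p{i % 100}", i) for i in range(10_000)]

print(peak_open_writers(rows, input_sorted=False))         # 100
print(peak_open_writers(sorted(rows), input_sorted=True))  # 1
```

The sort still has to happen somewhere. In the real plan it is done by the shuffle ahead of the reducers, and that extra stage is the cost weighed in the comments on this issue.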
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234701#comment-16234701 ]

Prasanth Jayachandran commented on HIVE-17935:
--

bq. Do you expect the possible performance regression for some jobs to be large?

Unfortunately, it is not quantifiable. The overhead is essentially the sort, the shuffle, and spinning up new reduce tasks. If the partition column count is low and the data size is small, the regression factor will be completely different from the case with a large data set.
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234652#comment-16234652 ]

Andrew Sherman commented on HIVE-17935:
---

Thanks [~prasanth_j] for the helpful comments. Do you expect the possible performance regression for some jobs to be large? Is it fair to say that the gains from changing the default are potentially large while the losses are comparatively small?
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234570#comment-16234570 ]

Prasanth Jayachandran commented on HIVE-17935:
--

The thing to note is that this might cause a performance regression for some jobs. Jobs whose partition columns have on the order of tens of distinct values will regress: without this optimization they may run as map-only jobs, whereas this feature forces a reducer stage even for small jobs. In some cases, reducer deduplication can bring gains, but where there is an extra reducer and a small partition count, the job will slow down. This optimization is really beneficial when there are lots of partitions, which can cause queries to OOM or create GC pressure. In all cases, it also results in an optimal file structure (concurrent writers for ORC can produce too many small stripes per file, which is suboptimal). So this optimization has both upsides and downsides. Ideally, we want the optimizer to decide during planning whether to enable it, based on column stats from the source table. cc/ [~ashutoshc]
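The suggestion that the optimizer decide during planning, based on source-table column stats, could take roughly the shape below. This is a hypothetical sketch, not an existing Hive rule. `ColumnStats`, `should_sort_dynamic_partitions`, the threshold of 100 estimated partitions, and the fallback used when stats are missing are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    """Hypothetical planner input for one dynamic partition column."""
    name: str
    ndv: Optional[int]  # estimated distinct values; None if stats are missing


def should_sort_dynamic_partitions(
    partition_cols: Sequence[ColumnStats],
    threshold: int = 100,   # hypothetical cut-off, not a Hive setting
    default: bool = False,  # hypothetical fallback when any column lacks stats
) -> bool:
    """Enable the sorted plan only when many partitions are expected.

    Few partitions: keep a possibly map-only plan (no extra reducer stage).
    Many partitions: sort, so each reducer keeps one writer open at a time.
    """
    if any(c.ndv is None for c in partition_cols):
        return default
    # Upper bound on distinct partitions: product of per-column NDVs.
    estimated_partitions = 1
    for c in partition_cols:
        estimated_partitions *= max(c.ndv, 1)
    return estimated_partitions > threshold


print(should_sort_dynamic_partitions([ColumnStats("country", 20)]))  # False
print(should_sort_dynamic_partitions(
    [ColumnStats("dt", 365), ColumnStats("country", 20)]))           # True
print(should_sort_dynamic_partitions(
    [ColumnStats("dt", None)], default=True))                        # True
```

Multiplying per-column NDVs assumes the columns are independent, so it only gives an upper bound. A real rule would likely also weigh data size, since the comments above note that the regression depends on both partition count and data volume.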