[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815165#comment-15815165 ]

Chaoyu Tang commented on HIVE-15530:

LGTM, +1

> Optimize the column stats update logic in table alteration
> ----------------------------------------------------------
>
> Key: HIVE-15530
> URL: https://issues.apache.org/jira/browse/HIVE-15530
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch, HIVE-15530.5.patch
>
> Currently, when a table is altered, HMS tries to update the table's column statistics if any of the conditions below is true:
> # database name is changed
> # table name is changed
> # old columns and new columns are not the same
> As a result, when a column is added to a table, Hive also tries to update column statistics, which is not necessary. We can loosen the last condition by checking whether the existing columns have changed. If they have not, we don't have to update the stats info.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
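The loosened rule in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in any of the attached patches: the class `StatsUpdateDecision`, the `Col` type, and both method names are hypothetical stand-ins, and a column is reduced to a name and a type.

```java
import java.util.Arrays;
import java.util.List;

public class StatsUpdateDecision {
    /** Hypothetical stand-in for a metastore column: just a name and a type. */
    static final class Col {
        final String name;
        final String type;
        Col(String name, String type) { this.name = name; this.type = type; }
    }

    /** True if every old column appears unchanged, in order, at the start of newCols. */
    static boolean existingColumnsUnchanged(List<Col> oldCols, List<Col> newCols) {
        if (oldCols.size() > newCols.size()) {
            return false; // at least one column was dropped
        }
        for (int i = 0; i < oldCols.size(); i++) {
            Col o = oldCols.get(i);
            Col n = newCols.get(i);
            if (!o.name.equals(n.name) || !o.type.equals(n.type)) {
                return false;
            }
        }
        return true;
    }

    /** The three conditions from the description, with the last one loosened. */
    static boolean needsColumnStatsUpdate(String oldDb, String newDb,
                                          String oldTable, String newTable,
                                          List<Col> oldCols, List<Col> newCols) {
        return !oldDb.equals(newDb)
            || !oldTable.equals(newTable)
            || !existingColumnsUnchanged(oldCols, newCols);
    }

    public static void main(String[] args) {
        List<Col> before = Arrays.asList(new Col("a", "string"), new Col("b", "int"));
        List<Col> added = Arrays.asList(
            new Col("a", "string"), new Col("b", "int"), new Col("c", "double"));
        // Only a column was appended: the existing columns are untouched.
        System.out.println(needsColumnStatsUpdate("db", "db", "t", "t", before, added));
        // The table was renamed: the stats still have to be updated.
        System.out.println(needsColumnStatsUpdate("db", "db", "t", "t2", before, before));
    }
}
```

With the original third condition ("old columns and new columns are not the same"), the first call would also have reported that an update is needed.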
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815019#comment-15815019 ]

Hive QA commented on HIVE-15530:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12846582/HIVE-15530.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10931 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=121)
	[auto_sortmerge_join_13.q,join4.q,join35.q,udf_percentile.q,join_reorder3.q,subquery_in.q,auto_join19.q,stats14.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,vector_groupby_3.q,vectorized_ptf.q,auto_join2.q,groupby1_map_skew.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=148)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=213)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2857/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2857/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2857/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12846582 - PreCommit-HIVE-Build

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch, HIVE-15530.5.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814799#comment-15814799 ]

Yibing Shi commented on HIVE-15530:

You are right that the column stats don't need to be updated if only column positions are changed. The current patch doesn't optimize this case, because I didn't notice that {{areSameColumns}} also compares column positions. I will upload a new patch soon.

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
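One way to make the check ignore column positions is to look each old column up by name among the new columns instead of comparing the two lists index by index. The sketch below is an assumption about how that could be done, not the content of the follow-up patch: `PositionInsensitiveCheck` and `Col` are hypothetical names, and column names are matched exactly (case-sensitively) for simplicity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionInsensitiveCheck {
    /** Hypothetical stand-in for a metastore column: just a name and a type. */
    static final class Col {
        final String name;
        final String type;
        Col(String name, String type) { this.name = name; this.type = type; }
    }

    /** True if every old column still exists with the same type, at any position. */
    static boolean columnsIncluded(List<Col> oldCols, List<Col> newCols) {
        if (oldCols.size() > newCols.size()) {
            return false; // at least one column was dropped
        }
        Map<String, String> newTypesByName = new HashMap<>();
        for (Col c : newCols) {
            newTypesByName.put(c.name, c.type);
        }
        for (Col c : oldCols) {
            // A missing name yields null here, so equals() returns false.
            if (!c.type.equals(newTypesByName.get(c.name))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Col> before = Arrays.asList(new Col("a", "string"), new Col("b", "int"));
        List<Col> reordered = Arrays.asList(new Col("b", "int"), new Col("a", "string"));
        List<Col> renamed = Arrays.asList(new Col("a2", "string"), new Col("b", "int"));
        // Same columns in a different order: still "included".
        System.out.println(columnsIncluded(before, reordered));
        // Column a renamed to a2: not included, so the stats need attention.
        System.out.println(columnsIncluded(before, renamed));
    }
}
```

A name-keyed lookup treats a pure reposition as "no change" while still catching renames and type changes.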
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812855#comment-15812855 ]

Chaoyu Tang commented on HIVE-15530:

+1. Yes, you are right. For a renamed column, its entry in the related tables should be updated as well. But when an ALTER TABLE only changes a column's position, should we update its stats? I am not sure how common a case like "ALTER TABLE test_change CHANGE a a STRING AFTER b;", which positions column a after b, is.

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812787#comment-15812787 ]

Yibing Shi commented on HIVE-15530:

Hi [~ctang.ma], thanks for looking into this patch! I believe that the stats should still be updated in the scenario you described, because it is the column name (not an ID) that is stored in the stats tables. When a column name is changed, the existing stats info should be updated, or at least removed.

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
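A toy model makes the point about name-keyed stats concrete. It is purely illustrative: the `StatsKeyedByName` class, the map of column name to a distinct-value count, and `renameColumn` are all hypothetical and say nothing about the real metastore schema beyond "stats are stored under the column name".

```java
import java.util.HashMap;
import java.util.Map;

public class StatsKeyedByName {
    /** Returns a copy of the stats with the entry for oldName moved to newName. */
    static Map<String, Long> renameColumn(Map<String, Long> stats,
                                          String oldName, String newName) {
        Map<String, Long> updated = new HashMap<>(stats);
        Long value = updated.remove(oldName);
        if (value != null) {
            updated.put(newName, value);
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new HashMap<>();
        stats.put("a", 42L); // hypothetical distinct-value count for column "a"

        // If the column is renamed a -> a2 but the stats are left alone,
        // a lookup under the new name finds nothing.
        System.out.println(stats.get("a2"));

        // Moving the entry keeps the stats reachable under the new name.
        System.out.println(renameColumn(stats, "a", "a2").get("a2"));
    }
}
```

Because the key is the name, a rename leaves a stale entry behind unless the stats are updated or removed along with it.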
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812396#comment-15812396 ]

Chaoyu Tang commented on HIVE-15530:

[~Yibing] The patch looks good. However, I have a small question about this:
{code}
static boolean columnsIncluded(List<FieldSchema> oldCols, List<FieldSchema> newCols) {
  if (oldCols.size() > newCols.size()) {
    return false;
  } else if (oldCols.size() == newCols.size()) {
    return areSameColumns(oldCols, newCols);
  } else {
    return areSameColumns(oldCols, newCols.subList(0, oldCols.size()));
  }
}
{code}
For an ALTER TABLE that only changes a column's name and/or position in a table, oldCols.size() equals newCols.size(), but areSameColumns(oldCols, newCols) might return false. In that case, should we still update the column statistics?

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
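To see how the quoted method behaves for the alterations under discussion, it can be run against a stand-in for {{areSameColumns}}. The stand-in below is an assumption: it compares name and type position by position, which matches the later remark in this thread that {{areSameColumns}} also compares column positions, but it is not the real Hive implementation. `ColumnsIncludedDemo` and `Col` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnsIncludedDemo {
    /** Hypothetical stand-in for a metastore column: just a name and a type. */
    static final class Col {
        final String name;
        final String type;
        Col(String name, String type) { this.name = name; this.type = type; }
    }

    /** Assumed behavior of areSameColumns: position-by-position comparison. */
    static boolean areSameColumns(List<Col> a, List<Col> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).name.equals(b.get(i).name)
                    || !a.get(i).type.equals(b.get(i).type)) {
                return false;
            }
        }
        return true;
    }

    /** The method quoted in the comment, written against the stand-in types. */
    static boolean columnsIncluded(List<Col> oldCols, List<Col> newCols) {
        if (oldCols.size() > newCols.size()) {
            return false;
        } else if (oldCols.size() == newCols.size()) {
            return areSameColumns(oldCols, newCols);
        } else {
            return areSameColumns(oldCols, newCols.subList(0, oldCols.size()));
        }
    }

    public static void main(String[] args) {
        List<Col> before = Arrays.asList(new Col("a", "string"), new Col("b", "int"));
        List<Col> added = Arrays.asList(
            new Col("a", "string"), new Col("b", "int"), new Col("c", "double"));
        List<Col> renamed = Arrays.asList(new Col("a2", "string"), new Col("b", "int"));
        List<Col> reordered = Arrays.asList(new Col("b", "int"), new Col("a", "string"));

        System.out.println("column added:     " + columnsIncluded(before, added));
        System.out.println("column renamed:   " + columnsIncluded(before, renamed));
        System.out.println("column reordered: " + columnsIncluded(before, reordered));
    }
}
```

Under this assumption only the "column added" case counts as included; both the rename and the reorder fall through to a stats update, which is the situation the question is about.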
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804635#comment-15804635 ]

Aihua Xu commented on HIVE-15530:

The patch looks good to me. +1.

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803743#comment-15803743 ]

Hive QA commented on HIVE-15530:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12845926/HIVE-15530.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10920 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2811/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2811/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2811/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12845926 - PreCommit-HIVE-Build

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801534#comment-15801534 ]

Hive QA commented on HIVE-15530:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12845794/HIVE-15530.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10917 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=139)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2801/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2801/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2801/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12845794 - PreCommit-HIVE-Build

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801500#comment-15801500 ]

Aihua Xu commented on HIVE-15530:

Hey [~Yibing], I just took a brief look at the patch. In the new file you created, you need to include the Apache license header, which you can copy from other files. Also, you don't need to include the creation time, your name and your contact details in the file (I guess that's the convention?). Rather, you can include comments that explain what the class does, etc.

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800690#comment-15800690 ]

Hive QA commented on HIVE-15530:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12845722/HIVE-15530.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2795/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2795/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2795/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-05 08:04:01.754
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-2795/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-05 08:04:01.757
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at ad335c1 HIVE-15518: Refactoring rows and range related classes to put the window type on Window (Aihua Xu, reviewed by Yongzhi Chen)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/analyze_tbl_date.q
Removing ql/src/test/results/clientpositive/analyze_tbl_date.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at ad335c1 HIVE-15518: Refactoring rows and range related classes to put the window type on Window (Aihua Xu, reviewed by Yongzhi Chen)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-05 08:04:02.692
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveAlterHandler.java: No such file or directory
error: metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreUtils.java: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12845722 - PreCommit-HIVE-Build

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794303#comment-15794303 ]

Hive QA commented on HIVE-15530:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12845326/HIVE-15530.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10883 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=139)
	[skewjoinopt15.q,vector_coalesce.q,orc_ppd_decimal.q,cbo_rp_lineage2.q,insert_into_with_schema.q,join_emit_interval.q,load_dyn_part3.q,auto_sortmerge_join_14.q,vector_null_projection.q,vector_cast_constant.q,mapjoin2.q,bucket_map_join_tez2.q,correlationoptimizer4.q,schema_evol_orc_acidvec_part_update.q,vectorization_12.q,vector_number_compare_projection.q,orc_merge_incompat3.q,vector_leftsemi_mapjoin.q,update_all_non_partitioned.q,multi_column_in_single.q,schema_evol_orc_nonvec_table.q,cbo_rp_semijoin.q,tez_insert_overwrite_local_directory_1.q,schema_evol_text_vecrow_table.q,vector_count.q,auto_sortmerge_join_15.q,vector_if_expr.q,delete_whole_partition.q,vector_decimal_6.q,sample1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2760/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2760/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2760/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12845326 - PreCommit-HIVE-Build

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794176#comment-15794176 ]

Pengcheng Xiong commented on HIVE-15530:

[~Yibing], could you add a test case for this? Thanks.

> Optimize the column stats update logic in table alteration
> Key: HIVE-15530
> Attachments: HIVE-15530.1.patch