[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064991#comment-16064991 ]

Sergio Peña commented on HIVE-16559:

I committed to master.

> Parquet schema evolution for partitioned tables may break if table and partition serdes differ
> --
>
>                 Key: HIVE-16559
>                 URL: https://issues.apache.org/jira/browse/HIVE-16559
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables backed by files with different schemas. Hive should match the table columns with file columns based on the column name if possible.
> However if the serde for a table is missing columns from the serde of a partition Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063691#comment-16063691 ]

Sergio Peña commented on HIVE-16559:

+1
The patch looks good.
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063204#comment-16063204 ]

Barna Zsombor Klara commented on HIVE-16559:

Failures are unrelated:
- HIVE-16908 - for HCat failures
- HIVE-16785 - is taking care of replication failures
- HIVE-15776 - for vector_if_expr
- HIVE-16931 - PerfTests
- HIVE-16959 - insert_overwrite_local_directory_1
- tez_smb_main seems to be failing constantly
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063052#comment-16063052 ]

Hive QA commented on HIVE-16559:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874485/HIVE-16559.06.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10846 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5772/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5772/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5772/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874485 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057286#comment-16057286 ]

Barna Zsombor Klara commented on HIVE-16559:

Failures are unrelated:
- HIVE-16908 - for HCat failures
- HIVE-16785 - is taking care of replication failures
- HIVE-15776 - for vector_if_expr
- HIVE-16931 - created for the failing PerfTests as they have been failing for close to 100 runs.
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053804#comment-16053804 ]

Hive QA commented on HIVE-16559:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873463/HIVE-16559.05.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10832 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5675/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5675/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5675/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873463 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053594#comment-16053594 ]

Hive QA commented on HIVE-16559:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873245/HIVE-16559.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5674/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5674/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5674/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-19 08:11:14.979
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5674/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-19 08:11:14.981
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at cb6bf88 HIVE-16902: investigate "failed to remove operation log" errors (Aihua Xu, reviewed by Yongzhi Chen)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at cb6bf88 HIVE-16902: investigate "failed to remove operation log" errors (Aihua Xu, reviewed by Yongzhi Chen)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-19 08:11:19.639
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java:458
error: ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873245 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019575#comment-16019575 ]

Hive QA commented on HIVE-16559:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12869254/HIVE-16559.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10745 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_change_col] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_cascade] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_1] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat11] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat12] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat13] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat14] (batchId=72)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5382/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5382/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5382/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12869254 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019502#comment-16019502 ]

Barna Zsombor Klara commented on HIVE-16559:

I have replaced the original patch with a new one. The testcase should work without the need to cascade table changes to partitions.
Thanks to [~spena] for the idea.
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007909#comment-16007909 ]

Barna Zsombor Klara commented on HIVE-16559:

Failing qtests have been failing for several runs and/or are known to be flaky:
- HIVE-16606 vector_join30
- HIVE-15288 explainuser_3
- explainanalyze_3 same diff as explainuser_3 and failing for 6 runs
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007296#comment-16007296 ]

Hive QA commented on HIVE-16559:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12867546/HIVE-16559.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10688 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=97)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=97)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5203/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5203/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5203/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12867546 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002276#comment-16002276 ]

Barna Zsombor Klara commented on HIVE-16559:

Just to clarify, technically it is possible to fix this issue in the {{ObjectInspectorConverters}} by matching the converters between the input and output fields based on field names (currently they are matched based on field order). But this would mean an overhead whenever we select from a table, even when there is no schema evolution.
I don't find this tradeoff worth it, especially since altering the table with the cascade option yields correct results with a one-time overhead, when the column changes are propagated to the partitions.
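The difference between order-based and name-based field matching discussed in this comment can be sketched with a toy model. This is only an illustration in Python, not Hive code: the schemas and the row come from the steps to reproduce in the issue description, while `project_by_order` and `project_by_name` are hypothetical helpers that stand in for the converter pairing.

```python
# Toy model only: plain Python values stand in for Hive ObjectInspectors and
# converters. Nothing here is Hive's actual API.

# Partition (input) schema and one stored row, from the steps to reproduce.
partition_fields = ["name", "favnumber", "favcolor", "age", "favpet"]
partition_row = ["mary", 5, "blue", 35, "dog"]

# Table (output) schema after ALTER TABLE ... REPLACE COLUMNS.
table_fields = ["favnumber", "age"]


def project_by_order(row, out_fields):
    """Pair output field i with input field i, ignoring field names."""
    return {out: row[i] for i, out in enumerate(out_fields)}


def project_by_name(row, in_fields, out_fields):
    """Pair fields by name; needs an extra name-to-index lookup table."""
    index_of = {name: i for i, name in enumerate(in_fields)}
    return {
        out: row[index_of[out]] if out in index_of else None
        for out in out_fields
    }


by_order = project_by_order(partition_row, table_fields)
by_name = project_by_name(partition_row, partition_fields, table_fields)
print(by_order)
print(by_name)
```

In this model the order-based pairing hands the `name` value to `favnumber`, while the name-based pairing returns the stored `favnumber` and `age`. The lookup table built in `project_by_name` is the kind of extra work the comment refers to as overhead; how large that cost would be in Hive is not measured here.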
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001148#comment-16001148 ]

Barna Zsombor Klara commented on HIVE-16559:

I ran the same test as in the Jira description but with ORC as the file format and HIVE_SCHEMA_EVOLUTION set to false, and ended up with:
{{Error: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.Text (state=,code=0)}}
when trying to select from the altered table. I don't think ORC supports this either.
But then again, I don't really understand that check for ORC: how does setting HIVE_SCHEMA_EVOLUTION to false end up with a "supported" mode for schema evolution?
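The thread does not show where this cast fails. As an illustration only, assuming that values arrive in table-column order while the expected types are taken from the partition's columns at the same positions, a mismatch of the reported kind would look like the sketch below. Python's `int` and `str` stand in for `IntWritable` and `Text`, and `check_by_position` is a hypothetical helper, not part of Hive or ORC.

```python
# Illustration only: Python's int/str stand in for IntWritable/Text, and the
# positional pairing below is an assumption, not Hive's or ORC's code path.

partition_schema = [("name", str), ("favnumber", int), ("favcolor", str),
                    ("age", int), ("favpet", str)]
table_values = [("favnumber", 5), ("age", 35)]  # values in table-column order


def check_by_position(values, schema):
    """Check each value against the type declared at the same position."""
    for (value_name, value), (field_name, expected) in zip(values, schema):
        if not isinstance(value, expected):
            raise TypeError(
                f"{type(value).__name__} value for {value_name!r} cannot be "
                f"used as {expected.__name__} field {field_name!r}"
            )


try:
    check_by_position(table_values, partition_schema)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```

Under that assumption the first position already fails, because an integer value is checked against a string field, which is the same direction as the reported `IntWritable` to `Text` cast.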
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001064#comment-16001064 ]

Sergio Peña commented on HIVE-16559:

I see that ORC doesn't support this only when ACID tables are found, and HIVE_SCHEMA_EVOLUTION is enabled. Otherwise, it is supported. Shouldn't we support this for PARQUET as well?
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996949#comment-15996949 ]

Barna Zsombor Klara commented on HIVE-16559:

Both test failures are known flaky tests:
- HIVE-16569
- HIVE-15776
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989784#comment-15989784 ]

Hive QA commented on HIVE-16559:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865552/HIVE-16559.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4934/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4934/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4934/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865552 - PreCommit-HIVE-Build