[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-16559:
-------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Parquet schema evolution for partitioned tables may break if table and
> partition serdes differ
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16559
>                 URL: https://issues.apache.org/jira/browse/HIVE-16559
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch,
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch,
> HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However if the serde for a table is missing columns from the serde of a
> partition Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
>   favnumber int,
>   age int
> );
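
One way to see the divergence the description refers to is to compare the column list Hive stores for the table with the one it stores for the existing partition. The sketch below reuses the table and partition from the steps to reproduce and assumes only standard DESCRIBE syntax. The exact output depends on the Hive version, so none is shown.

{code}
-- Table-level metadata: after REPLACE COLUMNS without CASCADE this should
-- list only favnumber and age, plus the partition column day.
DESCRIBE FORMATTED myparquettable_parted;

-- Partition-level metadata: the partition was written before the change,
-- so it should still carry the original five columns.
DESCRIBE FORMATTED myparquettable_parted PARTITION (day='2017-04-04');
{code}

If the two column lists differ, the table serde and the partition serde are in the state this issue describes.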
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Attachment: HIVE-16559.06.patch

Updated after RB comments.
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Attachment: HIVE-16559.05.patch

Rebased the patch, as it no longer applied.
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Attachment: HIVE-16559.04.patch

The previous patch caused regressions that are complicated to solve and/or would make this change not worth the effort. I am re-uploading the earlier version of the patch, which contains only the validation check.
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Status: Open  (was: Patch Available)

Cancelling the patch as it is causing regressions.
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Attachment: HIVE-16559.03.patch
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Attachment: HIVE-16559.02.patch

Updated the patch to use the new API to check for the ParquetSerde, and added checks for the schema evolution enabled property and the ACID table property, as is done for ORC.
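
The comment does not name the two properties. Assuming they are the session setting hive.exec.schema.evolution and the table property transactional (the usual marker for ACID tables), their current values can be inspected as sketched here. Both names are assumptions and are not taken from the patch.

{code}
-- Assumption: this is the "schema evolution enabled" property the comment means.
-- SET with a key and no value prints the current setting.
SET hive.exec.schema.evolution;

-- Assumption: ACID tables are identified by transactional=true.
SHOW TBLPROPERTIES myparquettable_parted("transactional");
{code}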
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Attachment: HIVE-16559.01.patch

First draft containing a check that prevents dropping columns if:
- the table is partitioned
- the table is stored as Parquet
- the CASCADE option is missing
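
For reference, the cascade option mentioned in this comment is the CASCADE keyword of ALTER TABLE ... REPLACE COLUMNS. By default (RESTRICT) the statement changes only the table metadata. With CASCADE the same change is also applied to the metadata of all existing partitions. A sketch using the table from the issue description follows. That the draft check accepts this form is an assumption based on the comment, not something verified against the patch.

{code}
-- Same column change as in the steps to reproduce, but propagated to
-- every existing partition so table and partition metadata stay in sync.
ALTER TABLE myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
) CASCADE;
{code}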
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Fix Version/s: 3.0.0
           Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Description: 
Parquet schema evolution should make it possible to have partitions/tables backed by files with different schemas. Hive should match the table columns with file columns based on the column name if possible.
However if the serde for a table is missing columns from the serde of a partition Hive fails to match the columns together.
Steps to reproduce:
{code}
CREATE TABLE myparquettable_parted
(
  name string,
  favnumber int,
  favcolor string,
  age int,
  favpet string
)
PARTITIONED BY (day string)
STORED AS PARQUET;
INSERT OVERWRITE TABLE myparquettable_parted
PARTITION(day='2017-04-04')
SELECT
   'mary' as name,
   5 AS favnumber,
   'blue' AS favcolor,
   35 AS age,
   'dog' AS favpet;
alter table myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
);
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Barna Zsombor Klara updated HIVE-16559:
---------------------------------------
    Component/s: Serializers/Deserializers