[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-27 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16559:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, 
> HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns that are present in the
> serde of a partition, Hive fails to match the columns.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-26 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.06.patch

Updated after RB comments.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, 
> HIVE-16559.06.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-19 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.05.patch

Rebased patch as it did not apply anymore.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-19 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Status: Patch Available  (was: Open)

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.04.patch

The previous patch caused regressions which are complicated to solve and/or 
would make this change not worth the effort. I am reuploading the earlier 
version of the patch, which only contains the validation check.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-05-22 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Status: Open  (was: Patch Available)

Cancelling the patch as it is causing regressions.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-05-22 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.03.patch

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-05-22 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Status: Patch Available  (was: Open)

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-05-19 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Status: Open  (was: Patch Available)

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-05-11 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.02.patch

Updated the patch with the new API to check for the ParquetSerde, and added 
checks for the schema evolution enabled property and the ACID table property, 
as is done for ORC.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.01.patch

First draft containing a check that prevents dropping columns when:
- the table is partitioned
- the table is stored as Parquet
- the CASCADE option is missing
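
As an illustration of the three conditions above, using the table from the 
issue description. This is only a sketch of the intended behavior, not output 
from the patch; the exact error text is not shown in this thread, and the 
second statement assumes the standard CASCADE clause of ALTER TABLE.
{code}
-- Expected to be rejected by the check: the table is partitioned,
-- stored as Parquet, columns are dropped, and CASCADE is not specified.
ALTER TABLE myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
);

-- Expected to pass the check: CASCADE also applies the column change
-- to the metadata of the existing partitions.
ALTER TABLE myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
)
CASCADE;
{code}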

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch
>
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Description: 
Parquet schema evolution should make it possible to have partitions/tables
backed by files with different schemas. Hive should match the table columns
with file columns based on the column name if possible.
However, if the serde for a table is missing columns that are present in the
serde of a partition, Hive fails to match the columns.
Steps to reproduce:
{code}
CREATE TABLE myparquettable_parted
(
  name string,
  favnumber int,
  favcolor string,
  age int,
  favpet string
)
PARTITIONED BY (day string)
STORED AS PARQUET;

INSERT OVERWRITE TABLE myparquettable_parted
PARTITION(day='2017-04-04')
SELECT
   'mary' as name,
   5 AS favnumber,
   'blue' AS favcolor,
   35 AS age,
   'dog' AS favpet;

alter table myparquettable_parted
REPLACE COLUMNS
(
favnumber int,
age int
);   
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Component/s: Serializers/Deserializers

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However if the serde for a table is missing columns from the serde of a 
> partition Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>'mary' as name,
>5 AS favnumber,
>'blue' AS favcolor,
>35 AS age,
>'dog' AS favpet;
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );