[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null
[ https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264656#comment-16264656 ] Ashutosh Chauhan commented on HIVE-18137: - Following rules suppose to be followed for schema evolution. * Partitions when they are created get their schema as current table schema. * There is no way to alter partition schema. * Except via {{cascade}} which is suppose to alter schema of all partitions so that they get same schema as current table schema. * At query time, data is read per schema of table. Partitions will be read with their own schema and then coerced into table schema. Keeping in mind above, in your example since partition schema is not altered, partition will be read per its old schema which means even if you insert new columns, since partition schema doesn't know about it, new columns will be ignored while reading partition. But since table schema contains it, we will add NULL for it after partition has been read and while coercing it to match table schema. So, current behavior will be considered correct. On the other hand if you have altered table schema using {{cascade}} then existing partition schema will also be updated and then partition will be read per this new schema so new column will be read and result set will be as per your second result set with one row with null and other with . Now this is how it *suppose* to work but since we have different code paths for self describing file formats like orc vs others like text if you get different behavior in some corner cases that will be considered bug. > Schema evolution: newly inserted column value in pre-existing partition is > masked to null > - > > Key: HIVE-18137 > URL: https://issues.apache.org/jira/browse/HIVE-18137 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich > > {code} > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set hive.mapred.mode=nonstrict; > set hive.cli.print.header=true; > SET hive.exec.schema.evolution=true; > SET hive.vectorized.use.vectorized.input.format=true; > SET hive.vectorized.use.vector.serde.deserialize=false; > SET hive.vectorized.use.row.serde.deserialize=false; > SET hive.vectorized.execution.enabled=false; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.metastore.disallow.incompatible.col.type.changes=true; > set hive.default.fileformat=textfile; > set hive.llap.io.enabled=false; > CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) > PARTITIONED BY(part INT); > insert into table part_add_int_permute_select partition(part=1) VALUES (1, > , 'new'); > alter table part_add_int_permute_select add columns(c int); > insert into table part_add_int_permute_select partition(part=1) VALUES (2, > , 'new', ); > select insert_num,part,a,b,c from part_add_int_permute_select; > {code} > results for the last select: > {code} > 1 1 new NULL > 2 1 new NULL > {code} > I think the following result should be expected: > {code} > 1 1 new NULL > 2 1 new > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null
[ https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264054#comment-16264054 ] Zoltan Haindrich commented on HIVE-18137: - I've found a bug in my changes; and the results are now going back to the old version; I'm starting to convince myself that the old result is fine; since that column doesn't exists at that particular partition... > Schema evolution: newly inserted column value in pre-existing partition is > masked to null > - > > Key: HIVE-18137 > URL: https://issues.apache.org/jira/browse/HIVE-18137 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich > > {code} > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set hive.mapred.mode=nonstrict; > set hive.cli.print.header=true; > SET hive.exec.schema.evolution=true; > SET hive.vectorized.use.vectorized.input.format=true; > SET hive.vectorized.use.vector.serde.deserialize=false; > SET hive.vectorized.use.row.serde.deserialize=false; > SET hive.vectorized.execution.enabled=false; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.metastore.disallow.incompatible.col.type.changes=true; > set hive.default.fileformat=textfile; > set hive.llap.io.enabled=false; > CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) > PARTITIONED BY(part INT); > insert into table part_add_int_permute_select partition(part=1) VALUES (1, > , 'new'); > alter table part_add_int_permute_select add columns(c int); > insert into table part_add_int_permute_select partition(part=1) VALUES (2, > , 'new', ); > select insert_num,part,a,b,c from part_add_int_permute_select; > {code} > results for the last select: > {code} > 1 1 new NULL > 2 1 new NULL > {code} > I think the following result should be expected: > {code} > 1 1 new NULL > 2 1 new > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null
[ https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263985#comment-16263985 ] Zoltan Haindrich commented on HIVE-18137: - I've done a deep dive in the history but its pretty hard to uncover when this have changed...there are many renames...multiple different versions of the same test (vec/nonvec;part/table;orc/text,x/y) - this is kinda like a matrix test... > Schema evolution: newly inserted column value in pre-existing partition is > masked to null > - > > Key: HIVE-18137 > URL: https://issues.apache.org/jira/browse/HIVE-18137 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich > > {code} > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set hive.mapred.mode=nonstrict; > set hive.cli.print.header=true; > SET hive.exec.schema.evolution=true; > SET hive.vectorized.use.vectorized.input.format=true; > SET hive.vectorized.use.vector.serde.deserialize=false; > SET hive.vectorized.use.row.serde.deserialize=false; > SET hive.vectorized.execution.enabled=false; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.metastore.disallow.incompatible.col.type.changes=true; > set hive.default.fileformat=textfile; > set hive.llap.io.enabled=false; > CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) > PARTITIONED BY(part INT); > insert into table part_add_int_permute_select partition(part=1) VALUES (1, > , 'new'); > alter table part_add_int_permute_select add columns(c int); > insert into table part_add_int_permute_select partition(part=1) VALUES (2, > , 'new', ); > select insert_num,part,a,b,c from part_add_int_permute_select; > {code} > results for the last select: > {code} > 1 1 new NULL > 2 1 new NULL > {code} > I think the following result should be expected: > {code} > 1 1 new NULL > 2 1 new > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null
[ https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263975#comment-16263975 ] Zoltan Haindrich commented on HIVE-18137: - [~ekoifman],[~ashutoshc]: not entirely sure why...but this have somehow changed during one of my changes...I consider it an improvement... I've tried to follow the historythru various refactors; renames/etc I've found an ancient version of this test which do had similar results: https://github.com/apache/hive/blob/2f0339b08b375a1b656a178627600fc26c0a974c/ql/src/test/results/clientpositive/schema_evol_orc_nonvec_mapwork_part.q.out#L136 > Schema evolution: newly inserted column value in pre-existing partition is > masked to null > - > > Key: HIVE-18137 > URL: https://issues.apache.org/jira/browse/HIVE-18137 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich > > {code} > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set hive.mapred.mode=nonstrict; > set hive.cli.print.header=true; > SET hive.exec.schema.evolution=true; > SET hive.vectorized.use.vectorized.input.format=true; > SET hive.vectorized.use.vector.serde.deserialize=false; > SET hive.vectorized.use.row.serde.deserialize=false; > SET hive.vectorized.execution.enabled=false; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.metastore.disallow.incompatible.col.type.changes=true; > set hive.default.fileformat=textfile; > set hive.llap.io.enabled=false; > CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) > PARTITIONED BY(part INT); > insert into table part_add_int_permute_select partition(part=1) VALUES (1, > , 'new'); > alter table part_add_int_permute_select add columns(c int); > insert into table part_add_int_permute_select partition(part=1) VALUES (2, > , 'new', ); > select insert_num,part,a,b,c from part_add_int_permute_select; > {code} > results for the last select: > {code} > 1 1 new NULL > 2 1 new NULL > {code} > I think the following result should be expected: > {code} > 1 1 new NULL > 2 1 new > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)