[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

2017-11-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264656#comment-16264656
 ] 

Ashutosh Chauhan commented on HIVE-18137:
-

Following rules suppose to be followed for schema evolution. 
* Partitions when they are created get their schema as current table schema.
* There is no way to alter partition schema.
* Except via {{cascade}} which is suppose to alter schema of all partitions so 
that they get same schema as current table schema.
* At query time, data is read per schema of table. Partitions will be read with 
their own schema and then coerced into table schema.

Keeping in mind above, in your example since partition schema is not altered, 
partition will be read per its old schema which means even if you insert new 
columns, since partition schema doesn't know about it, new columns will be 
ignored while reading partition. But since table schema contains it, we will 
add NULL for it after partition has been read and while coercing it to match 
table schema. So, current behavior will be considered correct.
On the other hand if you have altered table schema using {{cascade}} then 
existing partition schema will also be updated and then partition will be read 
per this new schema so new column will be read and result set will be as per 
your second result set with one row with null and other with .

Now this is how it *suppose* to work but since we have different code paths for 
self describing file formats like orc vs others like text if you get different 
behavior in some corner cases that will be considered bug.

> Schema evolution: newly inserted column value in pre-existing partition is 
> masked to null
> -
>
> Key: HIVE-18137
> URL: https://issues.apache.org/jira/browse/HIVE-18137
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>
> {code}
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.mapred.mode=nonstrict;
> set hive.cli.print.header=true;
> SET hive.exec.schema.evolution=true;
> SET hive.vectorized.use.vectorized.input.format=true;
> SET hive.vectorized.use.vector.serde.deserialize=false;
> SET hive.vectorized.use.row.serde.deserialize=false;
> SET hive.vectorized.execution.enabled=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.metastore.disallow.incompatible.col.type.changes=true;
> set hive.default.fileformat=textfile;
> set hive.llap.io.enabled=false;
> CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) 
> PARTITIONED BY(part INT);
> insert into table part_add_int_permute_select partition(part=1) VALUES (1, 
> , 'new');
> alter table part_add_int_permute_select add columns(c int);
> insert into table part_add_int_permute_select partition(part=1) VALUES (2, 
> , 'new', );
> select insert_num,part,a,b,c from part_add_int_permute_select;
> {code}
> results for the last select:
> {code}
> 1  1   new NULL
> 2  1   new NULL
> {code}
> I think the following result should be expected:
> {code}
> 1  1   new NULL
> 2  1   new 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

2017-11-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264054#comment-16264054
 ] 

Zoltan Haindrich commented on HIVE-18137:
-

I've found a bug in my changes; and the results are now going back to the old 
version;
I'm starting to convince myself that the old result is fine; since that column 
doesn't exists at that particular partition...

> Schema evolution: newly inserted column value in pre-existing partition is 
> masked to null
> -
>
> Key: HIVE-18137
> URL: https://issues.apache.org/jira/browse/HIVE-18137
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>
> {code}
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.mapred.mode=nonstrict;
> set hive.cli.print.header=true;
> SET hive.exec.schema.evolution=true;
> SET hive.vectorized.use.vectorized.input.format=true;
> SET hive.vectorized.use.vector.serde.deserialize=false;
> SET hive.vectorized.use.row.serde.deserialize=false;
> SET hive.vectorized.execution.enabled=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.metastore.disallow.incompatible.col.type.changes=true;
> set hive.default.fileformat=textfile;
> set hive.llap.io.enabled=false;
> CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) 
> PARTITIONED BY(part INT);
> insert into table part_add_int_permute_select partition(part=1) VALUES (1, 
> , 'new');
> alter table part_add_int_permute_select add columns(c int);
> insert into table part_add_int_permute_select partition(part=1) VALUES (2, 
> , 'new', );
> select insert_num,part,a,b,c from part_add_int_permute_select;
> {code}
> results for the last select:
> {code}
> 1  1   new NULL
> 2  1   new NULL
> {code}
> I think the following result should be expected:
> {code}
> 1  1   new NULL
> 2  1   new 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

2017-11-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263985#comment-16263985
 ] 

Zoltan Haindrich commented on HIVE-18137:
-

I've done a deep dive in the history but its pretty hard to uncover when this 
have changed...there are many renames...multiple different versions of the same 
test (vec/nonvec;part/table;orc/text,x/y) - this is kinda like a matrix test...

> Schema evolution: newly inserted column value in pre-existing partition is 
> masked to null
> -
>
> Key: HIVE-18137
> URL: https://issues.apache.org/jira/browse/HIVE-18137
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>
> {code}
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.mapred.mode=nonstrict;
> set hive.cli.print.header=true;
> SET hive.exec.schema.evolution=true;
> SET hive.vectorized.use.vectorized.input.format=true;
> SET hive.vectorized.use.vector.serde.deserialize=false;
> SET hive.vectorized.use.row.serde.deserialize=false;
> SET hive.vectorized.execution.enabled=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.metastore.disallow.incompatible.col.type.changes=true;
> set hive.default.fileformat=textfile;
> set hive.llap.io.enabled=false;
> CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) 
> PARTITIONED BY(part INT);
> insert into table part_add_int_permute_select partition(part=1) VALUES (1, 
> , 'new');
> alter table part_add_int_permute_select add columns(c int);
> insert into table part_add_int_permute_select partition(part=1) VALUES (2, 
> , 'new', );
> select insert_num,part,a,b,c from part_add_int_permute_select;
> {code}
> results for the last select:
> {code}
> 1  1   new NULL
> 2  1   new NULL
> {code}
> I think the following result should be expected:
> {code}
> 1  1   new NULL
> 2  1   new 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

2017-11-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263975#comment-16263975
 ] 

Zoltan Haindrich commented on HIVE-18137:
-

[~ekoifman],[~ashutoshc]: not entirely sure why...but this have somehow changed 
during one of my changes...I consider it an improvement...

I've tried to follow the historythru various refactors; renames/etc I've 
found an ancient version of this test which do had similar results:
https://github.com/apache/hive/blob/2f0339b08b375a1b656a178627600fc26c0a974c/ql/src/test/results/clientpositive/schema_evol_orc_nonvec_mapwork_part.q.out#L136


> Schema evolution: newly inserted column value in pre-existing partition is 
> masked to null
> -
>
> Key: HIVE-18137
> URL: https://issues.apache.org/jira/browse/HIVE-18137
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>
> {code}
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.mapred.mode=nonstrict;
> set hive.cli.print.header=true;
> SET hive.exec.schema.evolution=true;
> SET hive.vectorized.use.vectorized.input.format=true;
> SET hive.vectorized.use.vector.serde.deserialize=false;
> SET hive.vectorized.use.row.serde.deserialize=false;
> SET hive.vectorized.execution.enabled=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.metastore.disallow.incompatible.col.type.changes=true;
> set hive.default.fileformat=textfile;
> set hive.llap.io.enabled=false;
> CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) 
> PARTITIONED BY(part INT);
> insert into table part_add_int_permute_select partition(part=1) VALUES (1, 
> , 'new');
> alter table part_add_int_permute_select add columns(c int);
> insert into table part_add_int_permute_select partition(part=1) VALUES (2, 
> , 'new', );
> select insert_num,part,a,b,c from part_add_int_permute_select;
> {code}
> results for the last select:
> {code}
> 1  1   new NULL
> 2  1   new NULL
> {code}
> I think the following result should be expected:
> {code}
> 1  1   new NULL
> 2  1   new 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)