[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself

2017-04-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16291:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Yibing for the work.

> Hive fails when unions a parquet table with itself
> --
>
> Key: HIVE-16291
> URL: https://issues.apache.org/jira/browse/HIVE-16291
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Fix For: 3.0.0
>
> Attachments: HIVE-16291.1.patch, HIVE-16291.2.patch
>
>
> Reproduce commands:
> {code:sql}
> create table tst_unin (col1 int) partitioned by (p_tdate int)
> stored as parquet;
> insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
> insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
> select count(*) from (
>   select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302
>   union all
>   select tst_unin.p_tdate from tst_unin
> ) t1;
> {code}
> The table is stored in Parquet, a columnar file format. Hive pushes the
> column projection down to the table scan operators so that only the needed
> columns are read. It does this by adding the needed column IDs to the job
> configuration under the property "hive.io.file.readcolumn.ids".
> In the above case, the query unions the results of two subqueries that
> select from the same table. One subquery doesn't need any column from the
> Parquet file, while the other needs column "col1". Hive has a bug here: it
> ends up setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which
> the method ColumnProjectionUtils.getReadColumnIDs cannot parse.
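
As a rough illustration of the failure described above, the following Python
sketch models how a value like "0,,0" could arise and why a strict parser
rejects it. It is not Hive's Java code: the helpers naive_append,
guarded_append and simulate, and the assumed sequence of per-scan
contributions ([0], [], [0]), are hypothetical stand-ins chosen only to
reproduce the shape of the problem.

```python
from typing import Callable, List, Optional


def parse_read_column_ids(value: str) -> List[int]:
    """Strictly parse a comma-separated ID list; an empty token raises ValueError."""
    if value == "":
        return []
    return [int(token) for token in value.split(",")]


def naive_append(old: Optional[str], new_ids: List[int]) -> str:
    """Hypothetical buggy merge: joins with a comma even when the new list is empty."""
    new = ",".join(str(i) for i in new_ids)
    return new if old is None else old + "," + new


def guarded_append(old: Optional[str], new_ids: List[int]) -> str:
    """Hypothetical safer merge: ignores empty contributions and duplicate IDs."""
    merged = parse_read_column_ids(old or "")
    for col_id in new_ids:
        if col_id not in merged:
            merged.append(col_id)
    return ",".join(str(i) for i in merged)


def simulate(append: Callable[[Optional[str], List[int]], str],
             contributions: List[List[int]]) -> Optional[str]:
    """Apply each scan's column list to a single shared config value, in order."""
    value: Optional[str] = None
    for ids in contributions:
        value = append(value, ids)
    return value


# Assumed for illustration: one scan needs col1 (ID 0), one needs no file
# column at all, and the ID list for col1 is contributed a second time.
contributions = [[0], [], [0]]
broken = simulate(naive_append, contributions)
fixed = simulate(guarded_append, contributions)
print(repr(broken))
print(repr(fixed))
```

In this model the naive merge yields a string with an empty token between two
commas, which the strict parser cannot convert to an integer, while the
guarded merge yields a list that parses cleanly.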



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself

2017-04-05 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-16291:
--
Attachment: HIVE-16291.2.patch

> Hive fails when unions a parquet table with itself
> --
>
> Key: HIVE-16291
> URL: https://issues.apache.org/jira/browse/HIVE-16291
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-16291.1.patch, HIVE-16291.2.patch
>
>





[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself

2017-03-31 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16291:

Description: 
Reproduce commands:

{code:sql}
create table tst_unin (col1 int) partitioned by (p_tdate int) stored as parquet;
insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
select count(*) from (
  select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302
  union all
  select tst_unin.p_tdate from tst_unin
) t1;
{code}

The table is stored in Parquet, a columnar file format. Hive pushes the column
projection down to the table scan operators so that only the needed columns are
read. It does this by adding the needed column IDs to the job configuration
under the property "hive.io.file.readcolumn.ids".

In the above case, the query unions the results of two subqueries that select
from the same table. One subquery doesn't need any column from the Parquet
file, while the other needs column "col1". Hive has a bug here: it ends up
setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which the method
ColumnProjectionUtils.getReadColumnIDs cannot parse.


  was:
Reproduce commands:

{code:sql}
create table tst_unin (col1 int) partitioned by (p_tdate int) stored as parquet;
insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
select count(*) from (
  select tst_unin.p_tdate from tst_unin
  union all
  select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302
) t1;
{code}

The table is stored in Parquet, a columnar file format. Hive pushes the column
projection down to the table scan operators so that only the needed columns are
read. It does this by adding the needed column IDs to the job configuration
under the property "hive.io.file.readcolumn.ids".

In the above case, the query unions the results of two subqueries that select
from the same table. One subquery doesn't need any column from the Parquet
file, while the other needs column "col1". Hive has a bug here: it ends up
setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which the method
ColumnProjectionUtils.getReadColumnIDs cannot parse.



> Hive fails when unions a parquet table with itself
> --
>
> Key: HIVE-16291
> URL: https://issues.apache.org/jira/browse/HIVE-16291
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-16291.1.patch
>
>





[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself

2017-03-24 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-16291:
--
Assignee: Yibing Shi
  Status: Patch Available  (was: Open)

> Hive fails when unions a parquet table with itself
> --
>
> Key: HIVE-16291
> URL: https://issues.apache.org/jira/browse/HIVE-16291
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-16291.1.patch
>
>
> Reproduce commands:
> {code:sql}
> create table tst_unin (col1 int) partitioned by (p_tdate int)
> stored as parquet;
> insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
> insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
> select count(*) from (
>   select tst_unin.p_tdate from tst_unin
>   union all
>   select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302
> ) t1;
> {code}
> The table is stored in Parquet, a columnar file format. Hive pushes the
> column projection down to the table scan operators so that only the needed
> columns are read. It does this by adding the needed column IDs to the job
> configuration under the property "hive.io.file.readcolumn.ids".
> In the above case, the query unions the results of two subqueries that
> select from the same table. One subquery doesn't need any column from the
> Parquet file, while the other needs column "col1". Hive has a bug here: it
> ends up setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which
> the method ColumnProjectionUtils.getReadColumnIDs cannot parse.





[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself

2017-03-24 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-16291:
--
Attachment: HIVE-16291.1.patch

> Hive fails when unions a parquet table with itself
> --
>
> Key: HIVE-16291
> URL: https://issues.apache.org/jira/browse/HIVE-16291
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Attachments: HIVE-16291.1.patch
>
>


