[jira] [Updated] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table

2018-06-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-4367:
--
Summary: enhance TRUNCATE syntax to drop data of external table  (was: 
enhance  TRUNCATE syntax  to drop data of external table)

> enhance TRUNCATE syntax to drop data of external table
> ------------------------------------------------------
>
> Key: HIVE-4367
> URL: https://issues.apache.org/jira/browse/HIVE-4367
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: caofangkun
>Assignee: caofangkun
>Priority: Minor
> Attachments: HIVE-4367-1.patch, HIVE-4367.2.patch.txt, 
> HIVE-4367.3.patch, HIVE-4367.4.patch, HIVE-4367.5.patch, HIVE-4367.6.patch
>
>
> In my use case,
> I sometimes have to remove the data of external tables to free up storage
> space in the cluster.
> So it's necessary to enhance the syntax, e.g.
> "TRUNCATE TABLE srcpart_truncate PARTITION (dt='201130412') FORCE;"
> to remove data from an EXTERNAL table.
> I also added a configuration property to control whether the removed data is
> moved to the Trash:
> <property>
>   <name>hive.truncate.skiptrash</name>
>   <value>false</value>
>   <description>
>     If true, drop the data immediately; if false, move it to the Trash.
>   </description>
> </property>
> For example :
> hive (default)> TRUNCATE TABLE external1 partition (ds='11'); 
> FAILED: Error in semantic analysis: Cannot truncate non-managed table 
> external1
> hive (default)> TRUNCATE TABLE external1 partition (ds='11') FORCE;
> [2013-04-16 17:15:52]: Compile Start 
> [2013-04-16 17:15:52]: Compile End
> [2013-04-16 17:15:52]: OK
> [2013-04-16 17:15:52]: Time taken: 0.413 seconds
> hive (default)> set hive.truncate.skiptrash;
> hive.truncate.skiptrash=false
> hive (default)> set hive.truncate.skiptrash=true; 
> hive (default)> TRUNCATE TABLE external1 partition (ds='12') FORCE;
> [2013-04-16 17:16:21]: Compile Start 
> [2013-04-16 17:16:21]: Compile End
> [2013-04-16 17:16:21]: OK
> [2013-04-16 17:16:21]: Time taken: 0.143 seconds
> hive (default)> dfs -ls /user/test/.Trash/Current/; 
> Found 1 items
> drwxr-xr-x   - test supergroup          0 2013-04-16 17:06 /user/test/.Trash/Current/ds=11
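The behaviour demonstrated in the session above can be sketched in a few lines of Java. This is an illustrative model of the proposed semantics only, not the patch's actual code; the class, method, and enum names are all invented here:

```java
import java.util.HashMap;
import java.util.Map;

public class TruncateSketch {
  enum Disposition { REJECTED, MOVED_TO_TRASH, DELETED }

  /**
   * Models the proposed rule: truncating an external table requires FORCE,
   * and hive.truncate.skiptrash decides between Trash and immediate deletion.
   */
  static Disposition truncate(boolean externalTable, boolean force, Map<String, String> conf) {
    if (externalTable && !force) {
      // Mirrors "FAILED: Error in semantic analysis: Cannot truncate non-managed table"
      return Disposition.REJECTED;
    }
    boolean skipTrash = Boolean.parseBoolean(conf.getOrDefault("hive.truncate.skiptrash", "false"));
    return skipTrash ? Disposition.DELETED : Disposition.MOVED_TO_TRASH;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    assert truncate(true, false, conf) == Disposition.REJECTED;
    assert truncate(true, true, conf) == Disposition.MOVED_TO_TRASH; // default skiptrash=false
    conf.put("hive.truncate.skiptrash", "true");
    assert truncate(true, true, conf) == Disposition.DELETED;
    System.out.println("ok");
  }
}
```

Note the inversion implied by the property name: because it is "skiptrash", the default of false means truncated data still lands in the Trash, as the `dfs -ls` output above shows.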



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table

2018-06-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-4367:
--
Summary: enhance  TRUNCATE syntax  to drop data of external table  (was: 
enhance  TRUNCATE syntex  to drop data of external table)

> enhance  TRUNCATE syntax  to drop data of external table
> --------------------------------------------------------
>
> Key: HIVE-4367
> URL: https://issues.apache.org/jira/browse/HIVE-4367
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: caofangkun
>Assignee: caofangkun
>Priority: Minor
> Attachments: HIVE-4367-1.patch, HIVE-4367.2.patch.txt, 
> HIVE-4367.3.patch, HIVE-4367.4.patch, HIVE-4367.5.patch, HIVE-4367.6.patch
>
>
> In my use case,
> I sometimes have to remove the data of external tables to free up storage
> space in the cluster.
> So it's necessary to enhance the syntax, e.g.
> "TRUNCATE TABLE srcpart_truncate PARTITION (dt='201130412') FORCE;"
> to remove data from an EXTERNAL table.
> I also added a configuration property to control whether the removed data is
> moved to the Trash:
> <property>
>   <name>hive.truncate.skiptrash</name>
>   <value>false</value>
>   <description>
>     If true, drop the data immediately; if false, move it to the Trash.
>   </description>
> </property>
> For example :
> hive (default)> TRUNCATE TABLE external1 partition (ds='11'); 
> FAILED: Error in semantic analysis: Cannot truncate non-managed table 
> external1
> hive (default)> TRUNCATE TABLE external1 partition (ds='11') FORCE;
> [2013-04-16 17:15:52]: Compile Start 
> [2013-04-16 17:15:52]: Compile End
> [2013-04-16 17:15:52]: OK
> [2013-04-16 17:15:52]: Time taken: 0.413 seconds
> hive (default)> set hive.truncate.skiptrash;
> hive.truncate.skiptrash=false
> hive (default)> set hive.truncate.skiptrash=true; 
> hive (default)> TRUNCATE TABLE external1 partition (ds='12') FORCE;
> [2013-04-16 17:16:21]: Compile Start 
> [2013-04-16 17:16:21]: Compile End
> [2013-04-16 17:16:21]: OK
> [2013-04-16 17:16:21]: Time taken: 0.143 seconds
> hive (default)> dfs -ls /user/test/.Trash/Current/; 
> Found 1 items
> drwxr-xr-x   - test supergroup          0 2013-04-16 17:06 /user/test/.Trash/Current/ds=11





[jira] [Commented] (HIVE-15190) Field names are not preserved in ORC files written with ACID

2018-06-14 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513065#comment-16513065
 ] 

Anthony Hsu commented on HIVE-15190:


[~prasanth_j], thank you for the review. I won't have time in the near future 
to work on this, so please feel free to take this up.

> Field names are not preserved in ORC files written with ACID
> ------------------------------------------------------------
>
> Key: HIVE-15190
> URL: https://issues.apache.org/jira/browse/HIVE-15190
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0, 3.1.0, 4.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Critical
> Attachments: HIVE-15190.1.patch, HIVE-15190.2.patch
>
>
> To repro:
> {noformat}
> drop table if exists orc_nonacid;
> drop table if exists orc_acid;
> create table orc_nonacid (a int) clustered by (a) into 2 buckets stored as 
> orc;
> create table orc_acid (a int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='true');
> insert into table orc_nonacid values(1), (2);
> insert into table orc_acid values(1), (2);
> {noformat}
> Running {{hive --service orcfiledump }} on the files created by the 
> {{insert}} statements above, you'll see that for {{orc_nonacid}}, the files 
> have schema {{struct<a:int>}} whereas for {{orc_acid}}, the files have schema 
> {{struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<_col0:int>>}}.
> The last field {{row}} should have schema {{struct<a:int>}}.





[jira] [Commented] (HIVE-12414) ALTER TABLE UNSET SERDEPROPERTIES does not work

2018-04-24 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450272#comment-16450272
 ] 

Anthony Hsu commented on HIVE-12414:


Currently, the only way to drop SERDEPROPERTIES is to use the Thrift API 
directly (e.g.: using the MetaStoreClient via Java code).
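For reference, a minimal sketch of the workaround's shape. The serde parameter map is modeled here as a plain java.util.Map; in real code the sequence would fetch the Table with HiveMetaStoreClient.getTable, edit table.getSd().getSerdeInfo().getParameters(), and write it back with alter_table. The helper class and method below are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class UnsetSerdePropertySketch {
  /**
   * Removes one serde property key from a parameter map; returns true if it
   * was present. Stands in for editing the Thrift Table object directly.
   */
  static boolean unsetSerdeProperty(Map<String, String> serdeParams, String key) {
    return serdeParams.remove(key) != null;
    // With the real client, this would be followed by something like:
    //   table.getSd().getSerdeInfo().setParameters(serdeParams);
    //   client.alter_table(dbName, tableName, table);
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("serialization.encoding", "UTF-8");
    assert unsetSerdeProperty(params, "serialization.encoding");
    assert !params.containsKey("serialization.encoding");
    assert !unsetSerdeProperty(params, "missing.key");
    System.out.println("ok");
  }
}
```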

> ALTER TABLE UNSET SERDEPROPERTIES does not work
> -----------------------------------------------
>
> Key: HIVE-12414
> URL: https://issues.apache.org/jira/browse/HIVE-12414
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, SQL
>Affects Versions: 1.1.1
>Reporter: Lenni Kuff
>Assignee: Reuben Kuhnert
>Priority: Major
>  Labels: newbie
>
> alter table tablename set tblproperties ('key'='value')  => works as expected
> alter table tablename unset tblproperties ('key')  => works as expected
> alter table tablename set serdeproperties ('key'='value')  => works as 
> expected
> alter table tablename unset serdeproperties ('key')  => not supported
> FAILED: ParseException line 1:28 mismatched input 'serdeproperties' expecting 
> TBLPROPERTIES near 'unset' in alter properties statement





[jira] [Updated] (HIVE-12414) ALTER TABLE UNSET SERDEPROPERTIES does not work

2018-04-24 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-12414:
---
Summary: ALTER TABLE UNSET SERDEPROPERTIES does not work  (was: ALTER TABLE 
UNSET SERDEPROPERTY does not work)

> ALTER TABLE UNSET SERDEPROPERTIES does not work
> -----------------------------------------------
>
> Key: HIVE-12414
> URL: https://issues.apache.org/jira/browse/HIVE-12414
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, SQL
>Affects Versions: 1.1.1
>Reporter: Lenni Kuff
>Assignee: Reuben Kuhnert
>Priority: Major
>  Labels: newbie
>
> alter table tablename set tblproperties ('key'='value')  => works as expected
> alter table tablename unset tblproperties ('key')  => works as expected
> alter table tablename set serdeproperties ('key'='value')  => works as 
> expected
> alter table tablename unset serdeproperties ('key')  => not supported
> FAILED: ParseException line 1:28 mismatched input 'serdeproperties' expecting 
> TBLPROPERTIES near 'unset' in alter properties statement





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-27 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379649#comment-16379649
 ] 

Anthony Hsu commented on HIVE-18695:


+1 (non-binding – I am not a Hive committer) on your patch, [~kgyrtkirk]

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-18695.01.patch
>
>
> seems to be broken by HIVE-15680





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-27 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379647#comment-16379647
 ] 

Anthony Hsu commented on HIVE-18695:


[~elserj]
{quote}[~erwaman], I'm confused on the phrasing: does HIVE-18802 apply both 
before and after HIVE-15680? I think you're saying that it was an existing 
problem, but it would also fix the test failures that HIVE-15680 caused?
{quote}
Yes, the bug reported in HIVE-18802 happens with or without the changes made in 
HIVE-15680. I take back what I said about fixing HIVE-18802 also fixing 
HIVE-15680. I think they are related but slightly different bugs. With 
HIVE-18802, I think there's something funky going on with the way 
AccumuloStorageHandler is handling predicate pushdown.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-18695.01.patch
>
>
> seems to be broken by HIVE-15680





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-26 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377392#comment-16377392
 ] 

Anthony Hsu commented on HIVE-18695:


[~kgyrtkirk], I didn't see an easy way to fix this test without breaking 
HIVE-15680, so for now, I think the easiest solution is to just revert the 
entire HIVE-15680 patch.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Comment Edited] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-25 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376402#comment-16376402
 ] 

Anthony Hsu edited comment on HIVE-18695 at 2/26/18 5:50 AM:
-

Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions in the JobConf is not executed.

However, though removing the above fixes the test, I found a more serious 
problem that exists with or without the above. If the same Accumulo table is 
referenced multiple times in the same query, you get very strange results. 
Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I've filed a separate ticket for this issue: HIVE-18802. I think a fix for this 
issue would also fix HIVE-15680, but for now, you can revert HIVE-15680.
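A minimal sketch of the shared-jobConf hazard that motivates the disable-pushdown check quoted above: when several aliases push their filters into one conf under the same key, the last write wins. The helper below is invented for illustration; the conf key matches Hive's TableScanDesc.FILTER_EXPR_CONF_STR, but nothing else here is Hive code:

```java
import java.util.HashMap;
import java.util.Map;

public class SharedJobConfSketch {
  static final String FILTER_KEY = "hive.io.filter.expr.serialized";

  /**
   * Pushing filters for several aliases into ONE shared conf leaves only the
   * last one, because the conf is not cloned per alias.
   */
  static String effectiveFilter(Map<String, String> sharedConf, String... filters) {
    for (String f : filters) {
      sharedConf.put(FILTER_KEY, f); // no per-alias clone -> overwrite
    }
    return sharedConf.get(FILTER_KEY);
  }

  public static void main(String[] args) {
    // Both branches of a "key == 1 UNION ALL key == 2" query would end up
    // seeing the same (last-written) filter:
    String winner = effectiveFilter(new HashMap<>(), "key = 1", "key = 2");
    assert winner.equals("key = 2");
    System.out.println("ok");
  }
}
```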


was (Author: erwaman):
Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions on the TableScanDesc is not triggered.

However, though removing the above fixes the test, I found a more serious 
problem that exists with or without the above. If the same Accumulo table is 
referenced multiple times in the same query, you get very strange results. 
Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I've filed a separate ticket for this issue: HIVE-18802. I think a fix for this 
issue would also fix HIVE-15680, but for now, you can revert HIVE-15680.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Comment Edited] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-25 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376402#comment-16376402
 ] 

Anthony Hsu edited comment on HIVE-18695 at 2/26/18 5:47 AM:
-

Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions on the TableScanDesc is not triggered.

However, though removing the above fixes the test, I found a more serious 
problem that exists with or without the above. If the same Accumulo table is 
referenced multiple times in the same query, you get very strange results. 
Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I've filed a separate ticket for this issue: HIVE-18802. I think a fix for this 
issue would also fix HIVE-15680, but for now, you can revert HIVE-15680.


was (Author: erwaman):
Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions on the TableScanDesc is not triggered.

However, though removing the above fixes the test, I found a more serious 
problem that exists with or without the above. If the same Accumulo table is 
referenced multiple times in the same query, you get very strange results. 
Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I'll file a separate ticket for this issue. I think a fix for this issue would 
also fix HIVE-15680, but for now, you can revert HIVE-15680.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Comment Edited] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-25 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376402#comment-16376402
 ] 

Anthony Hsu edited comment on HIVE-18695 at 2/26/18 5:34 AM:
-

Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions on the TableScanDesc is not triggered.

However, though removing the above fixes the test, I found a more serious 
problem that exists with or without the above. If the same Accumulo table is 
referenced multiple times in the same query, you get very strange results. 
Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I'll file a separate ticket for this issue. I think a fix for this issue would 
also fix HIVE-15680, but for now, you can revert HIVE-15680.


was (Author: erwaman):
Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions on the TableScanDesc is not triggered.

However, though removing the above fixes the test, I found a more serious 
problem that still remains. If the same Accumulo table is referenced multiple 
times in the same query, you get very strange results. Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I'll file a separate ticket for this issue. I think a fix for this issue would 
also fix HIVE-15680, but for now, you can revert HIVE-15680.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-25 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376402#comment-16376402
 ] 

Anthony Hsu commented on HIVE-18695:


Hi [~kgyrtkirk], I investigated this further and the change in HIVE-15680 that 
broke accumulo_queries.q is
{noformat}
// disable filter pushdown for mapreduce when there are more than one table aliases,
// since we don't clone jobConf per alias
if (mrwork != null && mrwork.getAliases() != null && mrwork.getAliases().size() > 1 &&
    jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname).equals("mr")) {
  return;
}{noformat}
In the case of the Accumulo CliDriver test, the execution engine is set to 
"mr", so the "return" here is triggered, and then the subsequent code that sets 
the filter expressions on the TableScanDesc is not triggered.

However, though removing the above fixes the test, I found a more serious 
problem that still remains. If the same Accumulo table is referenced multiple 
times in the same query, you get very strange results. Here's an example:
{noformat}
DROP TABLE accumulo_test;
CREATE TABLE accumulo_test(key int, value int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,cf:string")
TBLPROPERTIES ("accumulo.table.name" = "accumulo_table_0");

INSERT OVERWRITE TABLE accumulo_test VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * from accumulo_test where key == 1 union all select * from 
accumulo_test where key == 2;{noformat}
The expected output is
{code:java}
1 1
2 2{code}
but the actual output is
{code:java}
1  0
1  1
1  2
1  3
2  0
2  1
2  2
2  3
{code}
I'll file a separate ticket for this issue. I think a fix for this issue would 
also fix HIVE-15680, but for now, you can revert HIVE-15680.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-22 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373878#comment-16373878
 ] 

Anthony Hsu commented on HIVE-18695:


[~kgyrtkirk], thanks for sharing that link. I'll take a closer look this 
weekend and try to fix the problem.

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-21 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372284#comment-16372284
 ] 

Anthony Hsu commented on HIVE-18695:


[~kgyrtkirk], can you just rerun the tests after reverting HIVE-15680 to see 
if that fixes them?

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Commented] (HIVE-18695) fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]

2018-02-13 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363282#comment-16363282
 ] 

Anthony Hsu commented on HIVE-18695:


Hi [~kgyrtkirk], could you include an example stack trace of, or a link to, a 
test failure?

> fix TestAccumuloCliDriver.testCliDriver[accumulo_queries]
> ---------------------------------------------------------
>
> Key: HIVE-18695
> URL: https://issues.apache.org/jira/browse/HIVE-18695
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> seems to be broken by HIVE-15680





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-08 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> ------------------------------------------------------
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch, HIVE-15353.5.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
>  * create_table
>  * alter_table
>  * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
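A sketch of the kind of null check the last sentence asks for, with StorageDescriptor modeled as a trivial stand-in class (the real Thrift type holds a List<FieldSchema>); this is not the actual metastore patch:

```java
import java.util.ArrayList;
import java.util.List;

public class SdNullCheckSketch {
  /** Trivial stand-in for the Thrift StorageDescriptor type. */
  static class StorageDescriptor {
    List<String> cols; // stands in for List<FieldSchema> in the real type
  }

  /**
   * Normalize a null cols list to an empty list before it reaches the
   * metastore database, so later alter_partition calls don't hit an NPE.
   */
  static void normalizeCols(StorageDescriptor sd) {
    if (sd.cols == null) {
      sd.cols = new ArrayList<>();
    }
  }

  public static void main(String[] args) {
    StorageDescriptor sd = new StorageDescriptor(); // cols is null, as in the bug
    normalizeCols(sd);
    assert sd.cols != null && sd.cols.isEmpty();
    System.out.println("ok");
  }
}
```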





[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-08 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357352#comment-16357352
 ] 

Anthony Hsu commented on HIVE-15353:


[~pvary], thanks for the update! In that case, I think this ticket is no longer 
needed, as the test you added in HIVE-18509 proves this is no longer a problem. 
Will resolve this ticket.

> Metastore throws NPE if StorageDescriptor.cols is null
> ------------------------------------------------------
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch, HIVE-15353.5.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
>  * create_table
>  * alter_table
>  * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-07 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355614#comment-16355614
 ] 

Anthony Hsu commented on HIVE-15353:


Hi [~pvary],

Thanks for the review and comments.

I tested with Hive 1.1.0, and in that version, if one used the MetaStoreClient 
API directly, one could set a {{null}} {{cols}} value and get that saved into 
the metastore database. Using the CLIDriver API, {{cols}} always gets set, so 
this isn't a problem.

I will test the trunk version of Hive to see if the MetaStoreClient API still 
lets you set {{null cols}}. If so, we should fix that.

Anthony

> Metastore throws NPE if StorageDescriptor.cols is null
> ------------------------------------------------------
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch, HIVE-15353.5.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
>  * create_table
>  * alter_table
>  * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
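A minimal sketch of the null-guard idea, using simplified stand-in classes rather than Hive's actual Thrift-generated metastore types (all names below are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the Thrift-generated StorageDescriptor, whose
// cols field may arrive as null from clients using the API directly.
class StorageDescriptor {
    private List<String> cols;
    List<String> getCols() { return cols; }
    void setCols(List<String> cols) { this.cols = cols; }
}

class MetaStoreUtils {
    // Normalize a possibly-null column list before it reaches NPE-prone paths.
    static List<String> colsOrEmpty(StorageDescriptor sd) {
        return sd.getCols() == null ? new ArrayList<>() : sd.getCols();
    }
}

public class NullColsDemo {
    public static void main(String[] args) {
        StorageDescriptor sd = new StorageDescriptor(); // cols left null
        System.out.println(MetaStoreUtils.colsOrEmpty(sd).size()); // prints 0, no NPE
    }
}
```

Calling the guard at the entry points listed above (create_table, alter_table, alter_partition) would shield the rest of the metastore from null column lists.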





[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-06 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353989#comment-16353989
 ] 

Anthony Hsu commented on HIVE-15353:


Looks like the PreCommit build is currently broken: 
[https://builds.apache.org/job/PreCommit-HIVE-Build/9046/console]






[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-05 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

Attached updated patch after rebasing on HEAD. Also updated RB: 
https://reviews.apache.org/r/54341/






[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-05 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Attachment: HIVE-15353.5.patch






[jira] [Commented] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2018-01-26 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341987#comment-16341987
 ] 

Anthony Hsu commented on HIVE-15680:


[~prasanth_j], [~niklaus.xiao]: thanks for the comments on the review. You guys 
are right – there seem to be correctness bugs in my patch; I think 
[~prasanth_j]'s patch to disable pushing filters for MR is the better approach 
for now. Thanks for fixing, [~prasanth_j]!

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch, HIVE-15680.5.patch, 
> HIVE-15680.6.patch, HIVE-15680.7.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.
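The symptom is consistent with per-table filter state being overwritten rather than combined when the same table appears twice in a query. A toy sketch of the difference, with hypothetical names (not Hive's actual pushdown code):

```java
import java.util.HashMap;
import java.util.Map;

public class PushdownDemo {
    // OR-combine filters registered for the same table instead of clobbering.
    static void merge(Map<String, String> filters, String table, String filter) {
        filters.merge(table, filter, (a, b) -> "(" + a + ") OR (" + b + ")");
    }

    public static void main(String[] args) {
        // Buggy shape: the second put overwrites the first table-level filter,
        // so only the last predicate reaches the ORC reader.
        Map<String, String> clobbered = new HashMap<>();
        clobbered.put("test_table", "number = 1");
        clobbered.put("test_table", "number = 2");
        System.out.println(clobbered.get("test_table")); // prints number = 2

        // Correctness-preserving shape: combine per-table filters with OR.
        Map<String, String> combined = new HashMap<>();
        merge(combined, "test_table", "number = 1");
        merge(combined, "test_table", "number = 2");
        System.out.println(combined.get("test_table")); // prints (number = 1) OR (number = 2)
    }
}
```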





[jira] [Commented] (HIVE-17530) ClassCastException when converting uniontype

2017-09-19 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172368#comment-16172368
 ] 

Anthony Hsu commented on HIVE-17530:


Thanks, [~cwsteinbach] and [~rdsr]!

> ClassCastException when converting uniontype
> 
>
> Key: HIVE-17530
> URL: https://issues.apache.org/jira/browse/HIVE-17530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 3.0.0
>
> Attachments: HIVE-17530.1.patch, HIVE-17530.2.patch
>
>
> To repro:
> {noformat}
> SET hive.exec.schema.evolution = false;
> CREATE TABLE avro_orc_partitioned_uniontype (a uniontype<boolean,string>) 
> PARTITIONED BY (b int) STORED AS ORC;
> INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
> create_union(1, true, value) FROM src LIMIT 5;
> ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;
> SELECT * FROM avro_orc_partitioned_uniontype;
> {noformat}
> The exception you get is:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.UnionObject
> {code}
> The issue is that StandardUnionObjectInspector was creating and returning an 
> ArrayList rather than a UnionObject.
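Hive's {{UnionObject}} interface does expose {{getTag()}} and {{getObject()}}; the sketch below uses simplified stand-in types to show the shape of the fix (return a proper union value, not a raw list), and is not the actual patch:

```java
public class UnionDemo {
    // Stand-in for org.apache.hadoop.hive.serde2.objectinspector.UnionObject.
    interface UnionObject {
        byte getTag();
        Object getObject();
    }

    // Returning a dedicated union value instead of an ArrayList lets
    // downstream consumers cast to UnionObject without a ClassCastException.
    static final class StandardUnion implements UnionObject {
        private final byte tag;
        private final Object object;
        StandardUnion(byte tag, Object object) { this.tag = tag; this.object = object; }
        public byte getTag() { return tag; }
        public Object getObject() { return object; }
    }

    public static void main(String[] args) {
        UnionObject u = new StandardUnion((byte) 1, "value");
        System.out.println(u.getTag() + ":" + u.getObject()); // prints 1:value
    }
}
```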



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Description: 
To repro:
{noformat}
SET hive.exec.schema.evolution = false;

CREATE TABLE avro_orc_partitioned_uniontype (a uniontype<boolean,string>) 
PARTITIONED BY (b int) STORED AS ORC;

INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
create_union(1, true, value) FROM src LIMIT 5;

ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;

SELECT * FROM avro_orc_partitioned_uniontype;
{noformat}

The exception you get is:
{code}
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.UnionObject
{code}

The issue is that StandardUnionObjectInspector was creating and returning an 
ArrayList rather than a UnionObject.

  was:
To repro:
{noformat}
SET hive.exec.schema.evolution = false;

CREATE TABLE avro_orc_partitioned_uniontype (a uniontype<boolean,string>) 
PARTITIONED BY (b int) STORED AS ORC;

INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
create_union(1, true, value) FROM src LIMIT 5;

ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;

SELECT * FROM avro_orc_partitioned_uniontype;
{noformat}

The exception you get is:
{code}
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.UnionObject
{code}







[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Attachment: HIVE-17530.2.patch

Uploaded new patch. Changes:

* Fixed test TestObjectInspectorConverters.testObjectInspectorConverters()
* Renamed SettableUnionObjectInspector.addField() to setFieldAndTag().

Also updated RB: https://reviews.apache.org/r/62321/






[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Status: Open  (was: Patch Available)






[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Status: Patch Available  (was: Open)

Attached patch. Also posted RB: https://reviews.apache.org/r/62321/






[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Attachment: HIVE-17530.1.patch






[jira] [Assigned] (HIVE-17530) ClassCastException when converting uniontype

2017-09-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-17530:
--







[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163793#comment-16163793
 ] 

Anthony Hsu commented on HIVE-17394:


Thanks, [~cwsteinbach] and [~rdsr] for the reviews!

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Fix For: 3.0.0
>
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad for performance. I'm not sure why we didn't use the 
> TypeInfo we already have instead of regenerating it for each nullable field. 
> If you look at the {{worker}} method, which calls {{deserializeNullableUnion}}, 
> the TypeInfo corresponding to the nullable field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro record case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows most of the time is being spent 
> in the cache.
> One way of fixing this, IMO, might be to make use of the column TypeInfo that 
> is already passed into the {{worker}} method.
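As a generic illustration of why per-row regeneration is costly, memoization by a cheap key avoids recomputing the same result on every row. This sketch uses hypothetical names and a placeholder value type; it is not Hive's actual {{SchemaToTypeInfo}} cache (whose reported problem is the expensive schema-equality check used for cache lookups):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TypeInfoCacheDemo {
    static final Map<String, Object> CACHE = new ConcurrentHashMap<>();
    static int generations = 0;

    // Stand-in for SchemaToTypeInfo.generateTypeInfo: assume it is expensive,
    // so run it at most once per schema name (a cheap, hashable key).
    static Object typeInfoFor(String schemaFullName) {
        return CACHE.computeIfAbsent(schemaFullName, k -> {
            generations++;
            return new Object(); // placeholder for the generated TypeInfo
        });
    }

    public static void main(String[] args) {
        for (int row = 0; row < 1000; row++) {
            typeInfoFor("org.example.NullableRecord"); // per-row hot path
        }
        System.out.println(generations); // prints 1, not 1000
    }
}
```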





[jira] [Comment Edited] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163076#comment-16163076
 ] 

Anthony Hsu edited comment on HIVE-17394 at 9/12/17 3:05 PM:
-

Attached patch. RB: https://reviews.apache.org/r/62247/


was (Author: erwaman):
Attached patch.






[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17394:
---
Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17394:
---
Attachment: HIVE-17394.1.patch

Attached patch.






[jira] [Assigned] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-17394:
--

Assignee: Anthony Hsu






[jira] [Commented] (HIVE-16831) Add unit tests for NPE fixes in HIVE-12054

2017-06-06 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039627#comment-16039627
 ] 

Anthony Hsu commented on HIVE-16831:


[~sbeeram]: I suggest putting all the tests in one qfile vs. separate files.

> Add unit tests for NPE fixes in HIVE-12054
> --
>
> Key: HIVE-16831
> URL: https://issues.apache.org/jira/browse/HIVE-16831
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16831.1.patch
>
>
> HIVE-12054 fixed NPE issues related to ObjectInspector which get triggered 
> when an empty ORC table/partition is read.
> This work adds tests that trigger that path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16670) Hive should automatically clean up hive.downloaded.resources.dir

2017-05-15 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011700#comment-16011700
 ] 

Anthony Hsu commented on HIVE-16670:


Test failures are unrelated and have been failing in other recent PreCommit 
builds.

> Hive should automatically clean up hive.downloaded.resources.dir
> 
>
> Key: HIVE-16670
> URL: https://issues.apache.org/jira/browse/HIVE-16670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-16670.1.patch
>
>
> Currently, Hive does not automatically clean up the 
> hive.downloaded.resources.dir, so resources and resource directories can 
> accumulate over time. Ideally, Hive should automatically clean up the 
> resources dir when the session ends.
> Ref: 
> https://github.com/apache/hive/blob/0ce98b3a7527f72216e9e41f7e610b44ee524758/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L677-L678
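One common shape for such session-scoped cleanup, sketched in Python with illustrative names (this is not Hive's actual SessionState API): give each session its own subdirectory under the configured root, and remove it wholesale when the session closes.

```python
import shutil
import tempfile
from pathlib import Path

class Session:
    """Minimal sketch: per-session resource dir, deleted on close."""

    def __init__(self, resources_root: str):
        # Each session gets its own subdirectory under the configured root.
        self.resource_dir = Path(tempfile.mkdtemp(dir=resources_root))

    def close(self):
        # Remove the whole per-session directory so downloaded resources
        # never accumulate across sessions.
        shutil.rmtree(self.resource_dir, ignore_errors=True)
```

Callers would pair every session open with a `close()` (or a context manager) so the directory is reclaimed even on error paths.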





[jira] [Updated] (HIVE-16670) Hive should automatically clean up hive.downloaded.resources.dir

2017-05-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-16670:
---
Status: Patch Available  (was: Open)

> Hive should automatically clean up hive.downloaded.resources.dir
> 
>
> Key: HIVE-16670
> URL: https://issues.apache.org/jira/browse/HIVE-16670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-16670.1.patch
>
>
> Currently, Hive does not automatically clean up the 
> hive.downloaded.resources.dir, so resources and resource directories can 
> accumulate over time. Ideally, Hive should automatically clean up the 
> resources dir when the session ends.
> Ref: 
> https://github.com/apache/hive/blob/0ce98b3a7527f72216e9e41f7e610b44ee524758/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L677-L678





[jira] [Updated] (HIVE-16670) Hive should automatically clean up hive.downloaded.resources.dir

2017-05-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-16670:
---
Attachment: HIVE-16670.1.patch

Attached one-line patch.

> Hive should automatically clean up hive.downloaded.resources.dir
> 
>
> Key: HIVE-16670
> URL: https://issues.apache.org/jira/browse/HIVE-16670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-16670.1.patch
>
>
> Currently, Hive does not automatically clean up the 
> hive.downloaded.resources.dir, so resources and resource directories can 
> accumulate over time. Ideally, Hive should automatically clean up the 
> resources dir when the session ends.
> Ref: 
> https://github.com/apache/hive/blob/0ce98b3a7527f72216e9e41f7e610b44ee524758/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L677-L678





[jira] [Assigned] (HIVE-16670) Hive should automatically clean up hive.downloaded.resources.dir

2017-05-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-16670:
--


> Hive should automatically clean up hive.downloaded.resources.dir
> 
>
> Key: HIVE-16670
> URL: https://issues.apache.org/jira/browse/HIVE-16670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> Currently, Hive does not automatically clean up the 
> hive.downloaded.resources.dir, so resources and resource directories can 
> accumulate over time. Ideally, Hive should automatically clean up the 
> resources dir when the session ends.
> Ref: 
> https://github.com/apache/hive/blob/0ce98b3a7527f72216e9e41f7e610b44ee524758/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L677-L678





[jira] [Assigned] (HIVE-6365) Alter a partition to be of a different fileformat than the Table's fileformat. Use insert overwrite to write data to this partition. The partition fileformat is converted

2017-03-17 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-6365:
-

Assignee: (was: Anthony Hsu)

> Alter a partition to be of a different fileformat than the Table's 
> fileformat. Use insert overwrite to write data to this partition. The 
> partition fileformat is converted back to table's fileformat after the insert 
> operation. 
> --
>
> Key: HIVE-6365
> URL: https://issues.apache.org/jira/browse/HIVE-6365
> Project: Hive
>  Issue Type: Bug
> Environment: emr
>Reporter: Pavan Srinivas
>
> Let's say there is a partitioned table like 
> Step1:
> >> CREATE TABLE srcpart (key STRING, value STRING)
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> Step2:
> Alter the fileformat for a specific available partition. 
> >> alter table srcpart partition(ds="2008-04-08", hr="12") set fileformat  
> >> orc;
> Step3:
> Describe the partition.
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  -1
> Bucket Columns:   []
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Step4:
> Write the data to this partition using insert overwrite. 
> >>insert overwrite  table srcpart partition(ds="2008-04-08",hr="12") select 
> >>key, value from ... 
> Step5:
> Describe the partition again. 
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:   No
> Num Buckets:  -1
> Bucket Columns:   []
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> The fileformat of the partition is converted back to the table's original 
> fileformat. It should instead have retained the modified fileformat and 
> written the data in it. 
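The resolution rule the report expects can be stated in a few lines. This Python sketch uses a hypothetical helper name and is not Hive code: a partition that has been ALTERed to its own fileformat keeps it, and only partitions without one inherit the table default.

```python
def effective_fileformat(table_format, partition_format=None):
    """Expected rule: an explicit partition-level fileformat wins;
    otherwise the partition inherits the table's default."""
    return partition_format if partition_format is not None else table_format
```

Under this rule, an insert overwrite into the ORC partition above would keep writing ORC rather than reverting to the table's TEXTFILE.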





[jira] [Commented] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-31 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847866#comment-15847866
 ] 

Anthony Hsu commented on HIVE-15680:


[~gopalv], [~sershe], [~xuefuz]: Is it possible to run the LLAP tests all in 
one process, so you can step through the code easily? If so, could you provide 
some pointers?

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch, HIVE-15680.5.patch, HIVE-15680.6.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.
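A toy Python model of that failure mode (the names are illustrative, not Hive internals): if pushed-down filters are stored in a conf-like map keyed only by the table's path, the second scan's predicate silently overwrites the first, whereas keying to a list preserves both.

```python
def plan_scans_buggy(scans):
    # (alias, path, predicate) triples; same path -> earlier entry lost.
    conf = {}
    for alias, path, predicate in scans:
        conf[path] = predicate
    return conf

def plan_scans_fixed(scans):
    # Accumulate every predicate pushed for a path instead of overwriting.
    conf = {}
    for alias, path, predicate in scans:
        conf.setdefault(path, []).append(predicate)
    return conf
```

In the repro above, both UNION ALL branches scan the same table path, so the buggy map ends up holding only `number = 2`.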





[jira] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-30 Thread Anthony Hsu (JIRA)
 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)



[jira] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-30 Thread Anthony Hsu (JIRA)
 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Open  (was: Patch Available)



[jira] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-30 Thread Anthony Hsu (JIRA)
 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.6.patch

Uploaded new patch.



[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-28 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Open  (was: Patch Available)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch, HIVE-15680.5.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-28 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch, HIVE-15680.5.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-28 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.5.patch

Uploaded new patch.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch, HIVE-15680.5.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.4.patch

Fixed NPEs in LLAP tests, uploaded new patch, and updated 
[RB|https://reviews.apache.org/r/55816/].

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Open  (was: Patch Available)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.3.patch

Added some missing null checks. Uploaded new patch and updated RB: 
https://reviews.apache.org/r/55816/

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Open  (was: Patch Available)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.1.patch

Uploaded patch. Also posted RB at https://reviews.apache.org/r/55816/.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.





[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832786#comment-15832786
 ] 

Anthony Hsu edited comment on HIVE-15680 at 1/21/17 3:42 AM:
-

Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

Here's the explain plan, which does show a single mapper processing two table 
scans:
{noformat}
hive (default)> explain
  > select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: x
filterExpr: (number = 1) (type: boolean)
Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (number = 1) (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: NONE
Union
  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
table:
input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  TableScan
alias: y
filterExpr: (number = 2) (type: boolean)
Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (number = 2) (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: NONE
Union
  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
table:
input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink

Time taken: 0.237 seconds, Fetched: 55 row(s)
{noformat}
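Since both table scans above run in one mapper over the same file, a file-level filter is only safe if it keeps every row that some branch needs. A common remedy is to OR the per-branch predicates at the reader, letting each branch's own Filter Operator re-apply its exact condition afterwards. A hedged Python sketch of that idea (not Hive's implementation):

```python
def combine_pushdown(predicates):
    """Build one file-level filter that admits a row if ANY branch wants it.
    Each branch still applies its own exact predicate downstream."""
    def combined(row):
        return any(p(row) for p in predicates)
    return combined
```

With the repro's predicates, the combined filter admits rows matching `number = 1` or `number = 2`, so neither UNION ALL branch loses its input.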



[jira] [Commented] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832786#comment-15832786
 ] 

Anthony Hsu commented on HIVE-15680:


Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.
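A toy model may make the failure mode above concrete. This is assumed mechanics for illustration only, not Hive's actual SARG plumbing (class and method names here are invented): both UNION branches scan the same table path, so a pushed-down filter registered per path gets overwritten by the last writer, and split elimination then runs with only the last predicate for both branches.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (invented, not Hive code): one pushed-predicate slot per table
// path means the second branch's registration overwrites the first, and
// stripe elimination uses the wrong predicate for both branches.
public class LastPredicateWinsDemo {
    static final List<Integer> STRIPES = List.of(1, 2); // one row per stripe

    static List<Integer> unionAll(List<Integer> branchPredicates) {
        // Last registration wins: only one conf slot keyed by the table path.
        int pushed = branchPredicates.get(branchPredicates.size() - 1);
        List<Integer> surviving = new ArrayList<>();
        for (int row : STRIPES)
            if (row == pushed) surviving.add(row); // stripe elimination, wrong filter
        // Each branch still applies its own residual predicate afterwards.
        List<Integer> out = new ArrayList<>();
        for (int p : branchPredicates)
            for (int row : surviving)
                if (row == p) out.add(row);
        return out;
    }

    public static void main(String[] args) {
        // Branch filters: number = 1 and number = 2; expect [1, 2], get [2].
        System.out.println(unionAll(List.of(1, 2)));
    }
}
```

Under this model the branch filtering on number = 1 finds nothing because the stripe holding 1 was already eliminated, matching the single-row result in the repro.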



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832395#comment-15832395
 ] 

Anthony Hsu commented on HIVE-15680:


[~gopalv]: I only tested with MRv2. Not sure about other execution engines but 
I will test.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2017-01-06 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806417#comment-15806417
 ] 

Anthony Hsu commented on HIVE-14044:


I believe this issue was fixed by HIVE-11785.

> Newlines in Avro maps cause external table to return corrupt values
> ---
>
> Key: HIVE-14044
> URL: https://issues.apache.org/jira/browse/HIVE-14044
> Project: Hive
>  Issue Type: Bug
> Environment: Hive version: 1.1.0-cdh5.5.1 (bundled with cloudera 
> 5.5.1)
>Reporter: David Nies
>Assignee: Sahil Takiar
>Priority: Critical
> Attachments: test.json, test.schema
>
>
> When {{\n}} characters are contained in Avro files that are used as data 
> bases for an external table, the result of {{SELECT}} queries may be corrupt. 
> I encountered this error when querying hive both from {{beeline}} and from 
> JDBC.
> h3. Steps to reproduce (used files are attached to ticket)
> # Create an {{.avro}} file that contains newline characters in a value of a 
> map:
> {code}
> avro-tools fromjson --schema-file test.schema test.json > test.avro
> {code}
> # Copy {{.avro}} file to HDFS
> {code}
> hdfs dfs -copyFromLocal test.avro /some/location/
> {code}
> # Create an external table in beeline containing this {{.avro}}:
> {code}
> beeline> CREATE EXTERNAL TABLE broken_newline_map
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/some/location/'
> TBLPROPERTIES ('avro.schema.literal'='
> {
>   "type" : "record",
>   "name" : "myEntry",
>   "namespace" : "myNamespace",
>   "fields" : [ {
> "name" : "foo",
> "type" : "long"
>   }, {
> "name" : "bar",
> "type" : {
>   "type" : "map",
>   "values" : "string"
> }
>   } ]
> }
> ');
> {code}
> # Now, selecting may return corrupt results:
> {code}
> jdbc:hive2://my-server:1/> select * from broken_newline_map;
> +--------------------------+---------------------------------------------------+--+
> | broken_newline_map.foo   |              broken_newline_map.bar               |
> +--------------------------+---------------------------------------------------+--+
> | 1                        | {"key2":"value2","key1":"value1\nafter newline"}  |
> | 2                        | {"key2":"new value2","key1":"new value"}          |
> +--------------------------+---------------------------------------------------+--+
> 2 rows selected (1.661 seconds)
> jdbc:hive2://my-server:1/> select foo, map_keys(bar), map_values(bar) 
> from broken_newline_map;
> +-------+------------------+-----------------------------+--+
> |  foo  |       _c1        |             _c2             |
> +-------+------------------+-----------------------------+--+
> | 1     | ["key2","key1"]  | ["value2","value1"]         |
> | NULL  | NULL             | NULL                        |
> | 2     | ["key2","key1"]  | ["new value2","new value"]  |
> +-------+------------------+-----------------------------+--+
> 3 rows selected (28.05 seconds)
> {code}
> Obviously, the last result set contains corrupt entries (line 2) and 
> incorrect entries (line 1). I also encountered this when doing this query 
> with JDBC. 
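The corruption pattern above (one row split into a corrupt row plus a spurious NULL row) is what you would expect if some stage serializes rows as newline-delimited text. A minimal sketch of that mechanism, as an assumption about the cause rather than a trace of Hive's actual code path:

```java
import java.util.Arrays;
import java.util.List;

// Toy model (assumed, not Hive's actual fetch path): rows are written to a
// newline-delimited text stream, so a '\n' embedded in a map value splits
// one logical row into two physical lines on read-back.
public class NewlineSplitDemo {
    static List<String> readRows(List<String> values) {
        StringBuilder stream = new StringBuilder();
        for (String v : values) stream.append(v).append('\n'); // row terminator
        String s = stream.toString();
        return Arrays.asList(s.substring(0, s.length() - 1).split("\n"));
    }

    public static void main(String[] args) {
        // Two logical rows; the first value embeds a newline.
        List<String> rows = readRows(List.of("value1\nafter newline", "value2"));
        System.out.println(rows.size()); // 3 lines read back instead of 2 rows
    }
}
```

The extra line has no valid key/value structure, which would surface as the NULL row in the second query's output.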



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed

2017-01-06 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-1898:
--
Summary: The ESCAPED BY clause does not seem to pick up newlines in columns 
and the line terminator cannot be changed  (was: The ESCAPED BY clause does not 
seem to pick up newlines in colums and the line terminator cannot be changed)

> The ESCAPED BY clause does not seem to pick up newlines in columns and the 
> line terminator cannot be changed
> 
>
> Key: HIVE-1898
> URL: https://issues.apache.org/jira/browse/HIVE-1898
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Josh Patterson
>Priority: Minor
>
> If I want to preserve data in columns which contains a newline (webcrawling 
> for instance) I cannot set the ESCAPED BY clause to escape these out (other 
> characters such as commas escape fine, however). This may be due to the line 
> terminators, which are locked to be newlines, are picked up first, and then 
> fields processed. 
> This seems to be related to:
> "SerDe should escape some special characters"
> https://issues.apache.org/jira/browse/HIVE-136
> and
> "Implement "LINES TERMINATED BY""
> https://issues.apache.org/jira/browse/HIVE-302
> where at comment: 
> https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435
> "This is not fixable currently because the line terminator is determined by 
> LineRecordReader.LineReader which is in the Hadoop land."
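The ordering described above can be sketched directly: the line reader splits the file on '\n' before the deserializer gets a chance to honor the escape character, while field splitting happens afterwards and does honor it. This is a toy model of that ordering, not the actual LineRecordReader or LazySimpleSerDe code:

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the two-stage split: records on '\n' first (escape-blind,
// like the Hadoop line reader), fields on ',' second (escape-aware, like
// the SerDe). Escaped commas survive; escaped newlines do not.
public class EscapeOrderDemo {
    static List<String> records(String file) {
        return Arrays.asList(file.split("\n")); // no escape handling here
    }

    static List<String> fields(String record) {
        // Split on commas NOT preceded by a backslash.
        return Arrays.asList(record.split("(?<!\\\\),"));
    }

    public static void main(String[] args) {
        // Row 1: fields "a\,b" and "c". Row 2 was meant to be "d\ne" and "f".
        String file = "a\\,b,c\nd\\\ne,f";
        System.out.println(records(file).size());            // 3 records, not 2
        System.out.println(fields(records(file).get(0)).size()); // 2: escaped ',' worked
    }
}
```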



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10608) Fix useless 'if' statement in RetryingMetaStoreClient (135)

2017-01-04 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-10608:
---
Summary: Fix useless 'if' statement in RetryingMetaStoreClient (135)  (was: 
Fix useless 'if' stamement in RetryingMetaStoreClient (135))

> Fix useless 'if' statement in RetryingMetaStoreClient (135)
> ---
>
> Key: HIVE-10608
> URL: https://issues.apache.org/jira/browse/HIVE-10608
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-10608.1.patch, rb33861.patch
>
>
> "if" statement below is useless because it ends with ;
> {code}
>   } catch (MetaException e) {
> if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*"));
> caughtException = e;
>   }
> {code}
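The bug is easy to demonstrate in isolation: the stray semicolon is an empty statement that becomes the if's body, so the assignment below it runs unconditionally. A standalone sketch (class and method names invented for illustration):

```java
public class EmptyIfDemo {
    // Buggy form: the trailing ';' ends the if, so the "body" on the next
    // line executes regardless of whether the regex matched.
    static boolean buggyMatch(String msg) {
        boolean caught = false;
        if (msg.matches("(?s).*(IO|TTransport)Exception.*")); // empty statement!
        caught = true; // always runs
        return caught;
    }

    // Fixed form: braces bind the assignment to the condition.
    static boolean fixedMatch(String msg) {
        boolean caught = false;
        if (msg.matches("(?s).*(IO|TTransport)Exception.*")) {
            caught = true;
        }
        return caught;
    }

    public static void main(String[] args) {
        System.out.println(buggyMatch("unrelated") + " " + fixedMatch("unrelated"));
        // prints "true false": the buggy version treats every message as a match
    }
}
```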



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9642) Hive metastore client retries don't happen consistently for all api calls

2017-01-04 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799892#comment-15799892
 ] 

Anthony Hsu commented on HIVE-9642:
---

Has this already been fixed by HIVE-10384?

> Hive metastore client retries don't happen consistently for all api calls
> -
>
> Key: HIVE-9642
> URL: https://issues.apache.org/jira/browse/HIVE-9642
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Xiaobing Zhou
>Assignee: Daniel Dai
> Attachments: HIVE-9642.1.patch, HIVE-9642.2.patch, HIVE-9642.3.patch, 
> HIVE-9642.4.patch, HIVE-9642.5.patch, HIVE-9642.5.patch, HIVE-9642.6.patch, 
> HIVE-9642.7.patch
>
>
> When org.apache.thrift.transport.TTransportException is thrown for issues 
> like socket timeout, the retry via RetryingMetaStoreClient happens only in 
> certain cases.
> Retry happens for the getDatabase call but not for getAllDatabases().
> The reason is RetryingMetaStoreClient checks for TTransportException being 
> the cause for InvocationTargetException. But in case of some calls such as 
> getAllDatabases in HiveMetastoreClient, all exceptions get wrapped in a 
> MetaException. We should remove this unnecessary wrapping of exceptions for 
> certain functions in HMC.
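A minimal sketch of the retry decision described above (invented classes, not the real RetryingMetaStoreClient): the retry only fires when a transport exception is visible as the cause, so wrapping it in a message-only exception hides it and defeats the retry.

```java
// Toy model: shouldRetry() inspects the cause chain. Wrapping that keeps
// the cause allows a retry; lossy message-only wrapping does not.
public class RetryCauseDemo {
    static class TransportException extends RuntimeException {
        TransportException(String m) { super(m); }
    }

    static class WrappedMetaException extends RuntimeException {
        WrappedMetaException(String m) { super(m); } // message only, cause lost
    }

    static boolean shouldRetry(Throwable t) {
        return t.getCause() instanceof TransportException;
    }

    public static void main(String[] args) {
        TransportException te = new TransportException("socket timeout");
        RuntimeException preserved = new RuntimeException("call failed", te);
        RuntimeException wrapped = new WrappedMetaException(te.getMessage());
        System.out.println(shouldRetry(preserved) + " " + shouldRetry(wrapped));
        // prints "true false": lossy wrapping hides the transport failure
    }
}
```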



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753231#comment-15753231
 ] 

Anthony Hsu commented on HIVE-15438:


Thanks, Ashutosh!

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.
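What the SORT_QUERY_RESULTS directive accomplishes can be sketched as follows: sort both the actual and expected output lines before diffing, so runs that return the same rows in a different order still compare equal. Toy code, not the qtest framework itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Order-insensitive comparison of query output lines.
public class SortedDiffDemo {
    static boolean sameResults(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = List.of("100", "200");
        List<String> actual = List.of("200", "100"); // same rows, different order
        System.out.println(expected.equals(actual) + " " + sameResults(expected, actual));
        // prints "false true": a plain diff fails, a sorted diff passes
    }
}
```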



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-15 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752727#comment-15752727
 ] 

Anthony Hsu commented on HIVE-15411:


Test failures look unrelated. Same qtests have failed in other precommit builds 
around the same time.

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.
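The interleaving hazard described above can be illustrated with a toy two-step update (invented code, not Hive's metadata handling): a reader between the two ALTER statements observes a partition whose LOCATION is set but whose FILEFORMAT is still the table default.

```java
// Toy model of the non-atomic two-step ALTER: a snapshot taken between the
// steps sees inconsistent partition metadata.
public class NonAtomicAlterDemo {
    static String location;
    static String fileformat = "TEXTFILE"; // inherited table default

    // What a query running between the two ALTER statements would observe.
    static String observedBetweenSteps() {
        location = "/data/part";                   // step 1: ADD PARTITION ... LOCATION
        String seen = location + ":" + fileformat; // concurrent reader snapshots here
        fileformat = "ORC";                        // step 2: SET FILEFORMAT, too late
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observedBetweenSteps()); // prints "/data/part:TEXTFILE"
    }
}
```

Folding both settings into the ADD PARTITION statement removes the window entirely.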



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15438:
---
Status: Patch Available  (was: Open)

Uploaded patch.

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15438:
---
Attachment: HIVE-15438.1.patch

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15411:
---
Status: Patch Available  (was: Open)

Uploaded patch. Also created RB here: https://reviews.apache.org/r/54765/

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15411:
---
Attachment: HIVE-15411.1.patch

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746685#comment-15746685
 ] 

Anthony Hsu commented on HIVE-15353:


Test failures look unrelated.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
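The defensive handling the ticket asks for amounts to normalizing a null column list before it is stored or iterated. A minimal sketch of that idea (invented types, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: treat a null cols list from a raw Thrift client as empty
// instead of letting it NPE later in the metastore.
public class NullColsDemo {
    static class StorageDescriptor {
        List<String> cols; // may be null when set directly by a client
    }

    static int columnCount(StorageDescriptor sd) {
        List<String> cols = sd.cols == null ? new ArrayList<>() : sd.cols;
        return cols.size(); // safe even when the client sent null
    }

    public static void main(String[] args) {
        StorageDescriptor sd = new StorageDescriptor();
        System.out.println(columnCount(sd)); // prints 0 instead of throwing NPE
    }
}
```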



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-6365) Alter a partition to be of a different fileformat than the Table's fileformat. Use insert overwrite to write data to this partition. The partition fileformat is converted

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-6365:
-

Assignee: Anthony Hsu

> Alter a partition to be of a different fileformat than the Table's 
> fileformat. Use insert overwrite to write data to this partition. The 
> partition fileformat is converted back to table's fileformat after the insert 
> operation. 
> --
>
> Key: HIVE-6365
> URL: https://issues.apache.org/jira/browse/HIVE-6365
> Project: Hive
>  Issue Type: Bug
> Environment: emr
>Reporter: Pavan Srinivas
>Assignee: Anthony Hsu
>
> Let's say there is a partitioned table like 
> Step1:
> >> CREATE TABLE srcpart (key STRING, value STRING)
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> Step2:
> Alter the fileformat for a specific available partition. 
> >> alter table srcpart partition(ds="2008-04-08", hr="12") set fileformat  
> >> orc;
> Step3:
> Describe the partition.
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  -1
> Bucket Columns:   []
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Step4:
> Write the data to this partition using insert overwrite. 
> >>insert overwrite  table srcpart partition(ds="2008-04-08",hr="12") select 
> >>key, value from ... 
> Step5:
> Describe the partition again. 
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:   No
> Num Buckets:  -1
> Bucket Columns:   []
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> The fileformat of the partition is converted back to the table's original 
> fileformat. It should have retained and written the data in the modified 
> fileformat. 
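The regression in Step 5 is consistent with INSERT OVERWRITE rebuilding the partition's storage descriptor from the table's descriptor. This is assumed mechanics for illustration, not the actual Hive code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: the overwrite path copies the TABLE descriptor instead of
// preserving the per-partition override, clobbering the ORC setting.
public class InsertOverwriteClobberDemo {
    static final Map<String, String> TABLE_SD = Map.of("format", "TEXTFILE");

    static Map<String, String> insertOverwrite(Map<String, String> partitionSd) {
        // Bug: the partition's own descriptor is ignored entirely.
        return new HashMap<>(TABLE_SD);
    }

    public static void main(String[] args) {
        Map<String, String> partSd = new HashMap<>(Map.of("format", "ORC")); // Step 2
        partSd = insertOverwrite(partSd);                                    // Step 4
        System.out.println(partSd.get("format")); // back to TEXTFILE, as in Step 5
    }
}
```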



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

Reuploaded the same patch as HIVE-15353.4.patch to try to trigger the PreCommit 
tests again.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Open  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Attachment: HIVE-15353.4.patch

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-4095) Add exchange partition in Hive

2016-12-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743019#comment-15743019
 ] 

Anthony Hsu edited comment on HIVE-4095 at 12/12/16 8:16 PM:
-

[~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 
(trunk), the destination table should come first. I clarified the example in 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExchangePartition.


was (Author: erwaman):
[~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 
(trunk), the destination table should come first.

> Add exchange partition in Hive
> --
>
> Key: HIVE-4095
> URL: https://issues.apache.org/jira/browse/HIVE-4095
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Dheeraj Kumar Singh
> Fix For: 0.12.0
>
> Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, 
> HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, 
> HIVE-4095.part12.patch.txt, hive.4095.1.patch, hive.4095.refresh.patch, 
> hive.4095.svn.thrift.patch, hive.4095.svn.thrift.patch.refresh
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4095) Add exchange partition in Hive

2016-12-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743019#comment-15743019
 ] 

Anthony Hsu commented on HIVE-4095:
---

[~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 
(trunk), the destination table should come first.

> Add exchange partition in Hive
> --
>
> Key: HIVE-4095
> URL: https://issues.apache.org/jira/browse/HIVE-4095
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Dheeraj Kumar Singh
> Fix For: 0.12.0
>
> Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, 
> HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, 
> HIVE-4095.part12.patch.txt, hive.4095.1.patch, hive.4095.refresh.patch, 
> hive.4095.svn.thrift.patch, hive.4095.svn.thrift.patch.refresh
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743001#comment-15743001
 ] 

Anthony Hsu commented on HIVE-15353:


Canceled and resubmitted patch. Will see if PreCommit tests run.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Open  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6365) Alter a partition to be of a different fileformat than the Table's fileformat. Use insert overwrite to write data to this partition. The partition fileformat is converted

2016-12-09 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-6365:
--
Summary: Alter a partition to be of a different fileformat than the Table's 
fileformat. Use insert overwrite to write data to this partition. The partition 
fileformat is converted back to table's fileformat after the insert operation.  
 (was: Alter a partition to be of a different fileformat than the Table's 
fileformat. Use insert overwrite to write data to this partition. The partition 
fileformat is coverted back to table's fileformat after the insert operation. )

> Alter a partition to be of a different fileformat than the Table's 
> fileformat. Use insert overwrite to write data to this partition. The 
> partition fileformat is converted back to table's fileformat after the insert 
> operation. 
> --
>
> Key: HIVE-6365
> URL: https://issues.apache.org/jira/browse/HIVE-6365
> Project: Hive
>  Issue Type: Bug
> Environment: emr
>Reporter: Pavan Srinivas
>
> Lets say, there is partitioned table like 
> Step1:
> >> CREATE TABLE srcpart (key STRING, value STRING)
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> Step2:
> Alter the fileformat for a specific available partition. 
> >> alter table srcpart partition(ds="2008-04-08", hr="12") set fileformat  
> >> orc;
> Step3:
> Describe the partition.
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  -1
> Bucket Columns:   []
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Step4:
> Write the data to this partition using insert overwrite. 
> >>insert overwrite  table srcpart partition(ds="2008-04-08",hr="12") select 
> >>key, value from ... 
> Step5:
> Describe the partition again. 
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:   No
> Num Buckets:  -1
> Bucket Columns:   []
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> The fileformat of the partition is converted back to the table's original 
> fileformat. It should have retained and written the data in the modified 
> fileformat. 





[jira] [Commented] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-09 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736385#comment-15736385
 ] 

Anthony Hsu commented on HIVE-15411:


Proposal is to extend the ADD PARTITION grammar to support the following:
{noformat}
ALTER TABLE table_name ADD [IF NOT EXISTS]
PARTITION (part_col='part_value', ...)
  [FILEFORMAT <file_format>]  -- new
  [SERDEPROPERTIES ('key1'='val', ...)]  -- new
  [LOCATION 'location1']
PARTITION (part_col='part_value', ...)
  [FILEFORMAT <file_format>]  -- new
  [SERDEPROPERTIES ('key1'='val', ...)]  -- new
  [LOCATION 'location2']
...;
{noformat}
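For illustration, a single statement under the proposed grammar could add two partitions with per-partition storage settings in one atomic call (the table, partition values, and locations below are made up, and the syntax itself is only proposed, not yet valid HiveQL):

```sql
-- Hypothetical usage of the proposed ADD PARTITION extensions:
ALTER TABLE page_views ADD IF NOT EXISTS
  PARTITION (ds='2016-12-09')
    FILEFORMAT ORC
    SERDEPROPERTIES ('orc.compress'='SNAPPY')
    LOCATION '/data/page_views/2016-12-09'
  PARTITION (ds='2016-12-10')
    LOCATION '/data/page_views/2016-12-10';
```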

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.





[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-09 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735806#comment-15735806
 ] 

Anthony Hsu commented on HIVE-15353:


HIVE-15353.3.patch seems to have been tested; the results just weren't 
auto-posted to this JIRA: 
https://builds.apache.org/job/PreCommit-HIVE-Build/2502/console. Looks like the 
PreCommit build is currently failing due to:
{noformat}
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] No compiler is provided in this environment. Perhaps you are running on 
a JRE rather than a JDK?
{noformat}
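That failure is environmental rather than code-level: Maven raises it when the build runs under a JRE, which ships no system compiler. The same condition can be probed programmatically (an illustrative check, not part of any Hive patch):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class JdkCheck {
  public static void main(String[] args) {
    // On a plain JRE this returns null, which is exactly the condition
    // behind Maven's "No compiler is provided in this environment" error.
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    System.out.println(compiler == null
        ? "JRE only: point JAVA_HOME at a JDK"
        : "JDK detected");
  }
}
```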

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Comment Edited] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-08 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733427#comment-15733427
 ] 

Anthony Hsu edited comment on HIVE-15353 at 12/8/16 9:32 PM:
-

Uploaded new patch.

After further offline discussion with [~cwsteinbach], we decided updating the 
Thrift API was not the right approach, given that {{cols}} should always be 
set. {{add_partition}} and {{alter_partition}} should not accept partitions 
with null {{cols}} fields. This can be fixed in a follow-up ticket: HIVE-15394. 
For now, this patch simply eliminates the NPEs on the metastore side.


was (Author: erwaman):
Uploaded new patch.

After further offline discussion with [~cwsteinbach], we decided updating the 
Thrift API was not the right approach, given that {{cols}} should always be 
set. {{add_partition}} and {{alter_partition}} should not accept partitions 
with null {{cols}} fields. This can be fixed in a follow-up ticket. For now, 
this patch simply eliminates the NPEs on the metastore side.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-08 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-08 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Open  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Comment Edited] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-08 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733427#comment-15733427
 ] 

Anthony Hsu edited comment on HIVE-15353 at 12/8/16 9:25 PM:
-

Uploaded new patch.

After further offline discussion with [~cwsteinbach], we decided updating the 
Thrift API was not the right approach, given that {{cols}} should always be 
set. {{add_partition}} and {{alter_partition}} should not accept partitions 
with null {{cols}} fields. This can be fixed in a follow-up ticket. For now, 
this patch simply eliminates the NPEs on the metastore side.


was (Author: erwaman):
Uploaded new patch.

After further offline discussion with [~cwsteinbach], we decided updating the 
Thrift API was not the right approach, given that {{cols}} should always be 
set. {{add_partition}} should not accept partitions with null {{cols}} fields. 
This can be fixed in a follow-up ticket. For now, this patch simply eliminates 
the NPEs on the metastore side.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-08 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Attachment: HIVE-15353.3.patch

Uploaded new patch.

After further offline discussion with [~cwsteinbach], we decided updating the 
Thrift API was not the right approach, given that {{cols}} should always be 
set. {{add_partition}} should not accept partitions with null {{cols}} fields. 
This can be fixed in a follow-up ticket. For now, this patch simply eliminates 
the NPEs on the metastore side.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-08 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Description: 
When using the HiveMetaStoreClient API directly to talk to the metastore, you 
get NullPointerExceptions when StorageDescriptor.cols is null in the 
Table/Partition object in the following calls:

* create_table
* alter_table
* alter_partition

Calling add_partition with StorageDescriptor.cols set to null causes null to be 
stored in the metastore database and subsequent calls to alter_partition for 
that partition to fail with an NPE.

Null checks should be added to eliminate the NPEs in the metastore.

  was:
When using the HiveMetaStoreClient API directly to talk to the metastore, you 
get NullPointerExceptions when StorageDescriptor.cols is null in the 
Table/Partition object in the following calls:

* create_table
* alter_table
* alter_partition

Calling add_partition with StorageDescriptor.cols set to null causes null to be 
stored in the metastore database and subsequent calls to alter_partition for 
that partition to fail with an NPE.

The simplest way to fix these NPEs seems to be to update the 
StorageDescriptor.cols Thrift definition and set a default value of empty list. 
Some null checks will also have to be added to handle existing nulls in the 
metastore database.


> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-06 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Open  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> The simplest way to fix these NPEs seems to be to update the 
> StorageDescriptor.cols Thrift definition and set a default value of empty 
> list. Some null checks will also have to be added to handle existing nulls in 
> the metastore database.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-06 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> The simplest way to fix these NPEs seems to be to update the 
> StorageDescriptor.cols Thrift definition and set a default value of empty 
> list. Some null checks will also have to be added to handle existing nulls in 
> the metastore database.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-05 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Attachment: HIVE-15353.2.patch

Uploaded new patch to fix HiveMetaStore unit tests.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> The simplest way to fix these NPEs seems to be to update the 
> StorageDescriptor.cols Thrift definition and set a default value of empty 
> list. Some null checks will also have to be added to handle existing nulls in 
> the metastore database.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-03 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> The simplest way to fix these NPEs seems to be to update the 
> StorageDescriptor.cols Thrift definition and set a default value of empty 
> list. Some null checks will also have to be added to handle existing nulls in 
> the metastore database.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-03 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Attachment: HIVE-15353.1.patch

Uploaded patch. Also posted to RB: https://reviews.apache.org/r/54341/

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> The simplest way to fix these NPEs seems to be to update the 
> StorageDescriptor.cols Thrift definition and set a default value of empty 
> list. Some null checks will also have to be added to handle existing nulls in 
> the metastore database.





[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-03 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15718515#comment-15718515
 ] 

Anthony Hsu commented on HIVE-15353:


Example stack trace:
{noformat}
2016-12-02T14:25:36,994 ERROR [pool-6-thread-6] metastore.RetryingHMSHandler: 
MetaException(message:java.lang.NullPointerException)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6152)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:3828)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:3765)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partition(HiveMetaStore.java:3748)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy21.alter_partition(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partition.getResult(ThriftHiveMetastore.java:12394)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partition.getResult(ThriftHiveMetastore.java:12378)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:103)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.areSameColumns(MetaStoreUtils.java:629)
at 
org.apache.hadoop.hive.metastore.HiveAlterHandler.updatePartColumnStats(HiveAlterHandler.java:770)
at 
org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:405)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:3799)
... 17 more
{noformat}
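The metastore-side fix amounts to null-guarding the column comparison at the bottom of this trace. A minimal sketch of that defensive pattern, with illustrative class and method names rather than the actual MetaStoreUtils code:

```java
import java.util.Collections;
import java.util.List;

public class NullSafeCols {
  // Substitute an empty list for a null cols field before comparing.
  static <T> List<T> colsOrEmpty(List<T> cols) {
    return cols == null ? Collections.<T>emptyList() : cols;
  }

  // Null-safe version of the comparison that NPE'd in areSameColumns().
  static boolean areSameColumns(List<String> oldCols, List<String> newCols) {
    return colsOrEmpty(oldCols).equals(colsOrEmpty(newCols));
  }

  public static void main(String[] args) {
    // A partition stored with null cols now compares equal to an empty
    // list instead of throwing a NullPointerException.
    System.out.println(areSameColumns(null, Collections.emptyList())); // prints "true"
  }
}
```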

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> The simplest way to fix these NPEs seems to be to update the 
> StorageDescriptor.cols Thrift definition and set a default value of empty 
> list. Some null checks will also have to be added to handle existing nulls in 
> the metastore database.





[jira] [Updated] (HIVE-15084) Flaky test: TestMiniTezCliDriver:explainanalyze_2, 3, 4, 5

2016-11-27 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15084:
---
Description: 
Example diffs:
{noformat:title=explainanalyze_2.q.out}
1881c1881
< Group By Operator [GBY_2] (rows=1/500 
width=8)
---
> Group By Operator [GBY_2] (rows=1/1 
> width=8)
2227c2227
<   Group By Operator [GBY_11] (rows=250/392 width=280)
---
>   Group By Operator [GBY_11] (rows=250/310 width=280)
2237c2237
<   Group By Operator [GBY_17] (rows=501/392 width=464)
---
>   Group By Operator [GBY_17] (rows=501/310 width=464)
2243c2243
<   Group By Operator [GBY_11] (rows=250/392 width=280)
---
>   Group By Operator [GBY_11] (rows=250/310 width=280)
2260c2260
<   Group By Operator [GBY_17] (rows=501/392 width=464)
---
>   Group By Operator [GBY_17] (rows=501/310 width=464)
{noformat}

{noformat:title=explainanalyze_2.q.out}
367c367
<   Group By Operator 
[GBY_78] (rows=262/334 width=178)
---
>   Group By Operator 
> [GBY_78] (rows=262/331 width=178)
378c378
<   Group By Operator 
[GBY_78] (rows=262/334 width=178)
---
>   Group By Operator 
> [GBY_78] (rows=262/331 width=178)
2135c2135
< Group By Operator [GBY_2] (rows=1/241 width=8)
---
> Group By Operator [GBY_2] (rows=1/1 width=8)
{noformat}

From https://builds.apache.org/job/PreCommit-HIVE-Build/2295/testReport/:
{noformat:title=explainanalyze_4.q.out}
248c248
< Group By Operator [GBY_10] (rows=615/10 width=12)
---
> Group By Operator [GBY_10] (rows=615/5 width=12)
{noformat}

{noformat:title=explainanalyze_5.q.out}
143c143
<   Group By Operator [GBY_9] (rows=262/522 
width=178)
---
>   Group By Operator [GBY_9] (rows=262/331 
> width=178)
154c154
<   Group By Operator [GBY_9] (rows=262/522 
width=178)
---
>   Group By Operator [GBY_9] (rows=262/331 
> width=178)
{noformat}

  was:
Example diffs:
{noformat:title=explainanalyze_2.q.out}
1881c1881
< Group By Operator [GBY_2] (rows=1/500 
width=8)
---
> Group By Operator [GBY_2] (rows=1/1 
> width=8)
2227c2227
<   Group By Operator [GBY_11] (rows=250/392 width=280)
---
>   Group By Operator [GBY_11] (rows=250/310 width=280)
2237c2237
<   Group By Operator [GBY_17] (rows=501/392 width=464)
---
>   Group By Operator [GBY_17] (rows=501/310 width=464)
2243c2243
<   Group By Operator [GBY_11] (rows=250/392 width=280)
---
>   Group By Operator [GBY_11] (rows=250/310 width=280)
2260c2260
<   Group By Operator [GBY_17] (rows=501/392 width=464)
---
>   Group By Operator [GBY_17] (rows=501/310 width=464)
{noformat}

{noformat:title=explainanalyze_2.q.out}
367c367
<   Group By Operator 
[GBY_78] (rows=262/334 width=178)
---
>   Group By Operator 
> [GBY_78] (rows=262/331 width=178)
378c378
<   Group By Operator 
[GBY_78] (rows=262/334 width=178)
---
>   Group By Operator 
> [GBY_78] (rows=262/331 width=178)
2135c2135
< Group By Operator [GBY_2] (rows=1/241 width=8)
---
> Group By Operator [GBY_2] (rows=1/1 width=8)
{noformat}

{noformat:title=explainanalyze_5.q.out}
143c143
<   Group By Operator [GBY_9] (rows=262/522 
width=178)
---
>   Group By Operator [GBY_9] (rows=262/331 
> width=178)
154c154
<   Group By Operator [GBY_9] (rows=262/522 
width=178)
---
>   Group By Operator [GBY_9] (rows=262/331 
> width=178)
{noformat}


> Flaky test: TestMiniTezCliDriver:explainanalyze_2, 3, 4, 5
> --
>
> Key: HIVE-15084
> URL: https://issues.apache.org/jira/browse/HIVE-15084
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> Example diffs:
> {noformat:title=explainanalyze_2.q.out}
> 

[jira] [Updated] (HIVE-15190) Field names are not preserved in ORC files written with ACID

2016-11-27 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15190:
---
Status: Patch Available  (was: Open)

> Field names are not preserved in ORC files written with ACID
> 
>
> Key: HIVE-15190
> URL: https://issues.apache.org/jira/browse/HIVE-15190
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15190.1.patch, HIVE-15190.2.patch
>
>
> To repro:
> {noformat}
> drop table if exists orc_nonacid;
> drop table if exists orc_acid;
> create table orc_nonacid (a int) clustered by (a) into 2 buckets stored as 
> orc;
> create table orc_acid (a int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='true');
> insert into table orc_nonacid values(1), (2);
> insert into table orc_acid values(1), (2);
> {noformat}
> Running {{hive --service orcfiledump }} on the files created by the 
> {{insert}} statements above, you'll see that for {{orc_nonacid}}, the files 
> have schema {{struct<a:int>}} whereas for {{orc_acid}}, the files have schema 
> {{struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<_col0:int>>}}.
>  The last field {{row}} should have schema {{struct<a:int>}}.





[jira] [Updated] (HIVE-15190) Field names are not preserved in ORC files written with ACID

2016-11-27 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15190:
---
Attachment: HIVE-15190.2.patch

Canceling and resubmitting the original patch didn't seem to trigger PreCommit 
again. I've reuploaded the same patch as a different file (HIVE-15190.2.patch) 
to try to trigger the PreCommit build again.

> Field names are not preserved in ORC files written with ACID
> 
>
> Key: HIVE-15190
> URL: https://issues.apache.org/jira/browse/HIVE-15190
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15190.1.patch, HIVE-15190.2.patch
>
>
> To repro:
> {noformat}
> drop table if exists orc_nonacid;
> drop table if exists orc_acid;
> create table orc_nonacid (a int) clustered by (a) into 2 buckets stored as 
> orc;
> create table orc_acid (a int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='true');
> insert into table orc_nonacid values(1), (2);
> insert into table orc_acid values(1), (2);
> {noformat}
> Running {{hive --service orcfiledump <file>}} on the files created by the 
> {{insert}} statements above, you'll see that for {{orc_nonacid}} the files 
> have schema {{struct<a:int>}}, whereas for {{orc_acid}} the files have schema 
> {{struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<_col0:int>>}}.
> The last field {{row}} should have schema {{struct<a:int>}}.





[jira] [Updated] (HIVE-15190) Field names are not preserved in ORC files written with ACID

2016-11-27 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15190:
---
Status: Open  (was: Patch Available)

> Field names are not preserved in ORC files written with ACID
> 
>
> Key: HIVE-15190
> URL: https://issues.apache.org/jira/browse/HIVE-15190
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15190.1.patch
>
>
> To repro:
> {noformat}
> drop table if exists orc_nonacid;
> drop table if exists orc_acid;
> create table orc_nonacid (a int) clustered by (a) into 2 buckets stored as 
> orc;
> create table orc_acid (a int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='true');
> insert into table orc_nonacid values(1), (2);
> insert into table orc_acid values(1), (2);
> {noformat}
> Running {{hive --service orcfiledump <file>}} on the files created by the 
> {{insert}} statements above, you'll see that for {{orc_nonacid}} the files 
> have schema {{struct<a:int>}}, whereas for {{orc_acid}} the files have schema 
> {{struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<_col0:int>>}}.
> The last field {{row}} should have schema {{struct<a:int>}}.





[jira] [Updated] (HIVE-14936) Flaky test: TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]

2016-11-27 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-14936:
---
Description: 
https://builds.apache.org/job/PreCommit-HIVE-Build/1489/testReport/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_orc_ppd_schema_evol_3a_/
{code}
224c224
---
> HDFS_BYTES_READ: 17046
226c226
---
> HDFS_READ_OPS: 6
{code}

Have seen this diff fairly often.

Some other diffs that have been observed:

From https://builds.apache.org/job/PreCommit-HIVE-Build/2292/testReport/:
{noformat}
741c741
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
781c781
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
1256c1256
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
1294c1294
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
{noformat}

From https://builds.apache.org/job/PreCommit-HIVE-Build/2291/testReport/:
{noformat}
717c717
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
821c821
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
861c861
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
1294c1294
---
> RECORDS_OUT_INTERMEDIATE_Map_1: 1
{noformat}

  was:
https://builds.apache.org/job/PreCommit-HIVE-Build/1489/testReport/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_orc_ppd_schema_evol_3a_/
{code}
224c224
---
> HDFS_BYTES_READ: 17046
226c226
---
> HDFS_READ_OPS: 6
{code}

Have seen this diff fairly often.


> Flaky test: TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
> ---
>
> Key: HIVE-14936
> URL: https://issues.apache.org/jira/browse/HIVE-14936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> https://builds.apache.org/job/PreCommit-HIVE-Build/1489/testReport/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_orc_ppd_schema_evol_3a_/
> {code}
> 224c224
>  ---
> >HDFS_BYTES_READ: 17046
> 226c226
>  ---
> >HDFS_READ_OPS: 6
> {code}
> Have seen this diff fairly often.
> Some other diffs that have been observed:
> From https://builds.apache.org/job/PreCommit-HIVE-Build/2292/testReport/:
> {noformat}
> 741c741
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> 781c781
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> 1256c1256
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> 1294c1294
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> {noformat}
> From https://builds.apache.org/job/PreCommit-HIVE-Build/2291/testReport/:
> {noformat}
> 717c717
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> 821c821
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> 861c861
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> 1294c1294
>  ---
> >RECORDS_OUT_INTERMEDIATE_Map_1: 1
> {noformat}
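Golden-file diffs like these are usually stabilized by masking nondeterministic counter values before comparison (Hive's qtest harness supports a similar mask-pattern mechanism). A minimal Python sketch of the idea; the counter list is an assumption for illustration:

```python
import re

# Counters whose values vary run-to-run; hypothetical list for illustration.
NONDETERMINISTIC = re.compile(
    r'^(\s*)(HDFS_BYTES_READ|HDFS_READ_OPS|RECORDS_OUT_INTERMEDIATE_\w+):\s*\d+')

def mask(line):
    """Replace a nondeterministic counter value with a stable placeholder."""
    return NONDETERMINISTIC.sub(r'\1\2: #Masked#', line)

print(mask("   HDFS_BYTES_READ: 17046"))  # "   HDFS_BYTES_READ: #Masked#"
```

Lines that don't match a listed counter pass through unchanged, so the rest of the q.out file still diffs normally.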





[jira] [Updated] (HIVE-15084) Flaky test: TestMiniTezCliDriver:explainanalyze_2, 3, 4, 5

2016-11-27 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15084:
---
Description: 
Example diffs:
{noformat:title=explainanalyze_2.q.out}
1881c1881
< Group By Operator [GBY_2] (rows=1/500 
width=8)
---
> Group By Operator [GBY_2] (rows=1/1 
> width=8)
2227c2227
<   Group By Operator [GBY_11] (rows=250/392 width=280)
---
>   Group By Operator [GBY_11] (rows=250/310 width=280)
2237c2237
<   Group By Operator [GBY_17] (rows=501/392 width=464)
---
>   Group By Operator [GBY_17] (rows=501/310 width=464)
2243c2243
<   Group By Operator [GBY_11] (rows=250/392 width=280)
---
>   Group By Operator [GBY_11] (rows=250/310 width=280)
2260c2260
<   Group By Operator [GBY_17] (rows=501/392 width=464)
---
>   Group By Operator [GBY_17] (rows=501/310 width=464)
{noformat}

{noformat:title=explainanalyze_2.q.out}
367c367
<   Group By Operator 
[GBY_78] (rows=262/334 width=178)
---
>   Group By Operator 
> [GBY_78] (rows=262/331 width=178)
378c378
<   Group By Operator 
[GBY_78] (rows=262/334 width=178)
---
>   Group By Operator 
> [GBY_78] (rows=262/331 width=178)
2135c2135
< Group By Operator [GBY_2] (rows=1/241 width=8)
---
> Group By Operator [GBY_2] (rows=1/1 width=8)
{noformat}

{noformat:title=explainanalyze_5.q.out}
143c143
<   Group By Operator [GBY_9] (rows=262/522 
width=178)
---
>   Group By Operator [GBY_9] (rows=262/331 
> width=178)
154c154
<   Group By Operator [GBY_9] (rows=262/522 
width=178)
---
>   Group By Operator [GBY_9] (rows=262/331 
> width=178)
{noformat}

> Flaky test: TestMiniTezCliDriver:explainanalyze_2, 3, 4, 5
> --
>
> Key: HIVE-15084
> URL: https://issues.apache.org/jira/browse/HIVE-15084
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> Example diffs:
> {noformat:title=explainanalyze_2.q.out}
> 1881c1881
> < Group By Operator [GBY_2] (rows=1/500 
> width=8)
> ---
> > Group By Operator [GBY_2] (rows=1/1 
> > width=8)
> 2227c2227
> <   Group By Operator [GBY_11] (rows=250/392 width=280)
> ---
> >   Group By Operator [GBY_11] (rows=250/310 width=280)
> 2237c2237
> <   Group By Operator [GBY_17] (rows=501/392 width=464)
> ---
> >   Group By Operator [GBY_17] (rows=501/310 width=464)
> 2243c2243
> <   Group By Operator [GBY_11] (rows=250/392 width=280)
> ---
> >   Group By Operator [GBY_11] (rows=250/310 width=280)
> 2260c2260
> <   Group By Operator [GBY_17] (rows=501/392 width=464)
> ---
> >   Group By Operator [GBY_17] (rows=501/310 width=464)
> {noformat}
> {noformat:title=explainanalyze_2.q.out}
> 367c367
> <   Group By Operator 
> [GBY_78] (rows=262/334 width=178)
> ---
> >   Group By Operator 
> > [GBY_78] (rows=262/331 width=178)
> 378c378
> <   Group By Operator 
> [GBY_78] (rows=262/334 width=178)
> ---
> >   Group By Operator 
> > [GBY_78] (rows=262/331 width=178)
> 2135c2135
> < Group By Operator [GBY_2] (rows=1/241 
> width=8)
> ---
> > Group By Operator [GBY_2] (rows=1/1 width=8)
> {noformat}
> {noformat:title=explainanalyze_5.q.out}
> 143c143
> <   Group By Operator [GBY_9] 
> (rows=262/522 width=178)
> ---
> >   Group By Operator [GBY_9] 
> > (rows=262/331 width=178)
> 154c154
> <   Group By Operator [GBY_9] 
> (rows=262/522 width=178)
> ---
> >   Group By Operator [GBY_9] 
> > (rows=262/331 width=178)
> {noformat}
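In every hunk above, only the actual (post-slash) count in {{rows=estimate/actual}} differs between runs. One way to stabilize such tests is to normalize that component before diffing; a hypothetical sketch, not the actual fix:

```python
import re

# Match "rows=<estimate>/<actual>" and keep only the estimate.
ROWS = re.compile(r'(rows=\d+)/\d+')

def normalize(line):
    """Keep the estimated row count, mask the runtime-actual count."""
    return ROWS.sub(r'\1/#', line)

print(normalize("Group By Operator [GBY_2] (rows=1/500 width=8)"))
```

Applied to both the golden file and the test output, this makes hunks like {{rows=262/334}} vs. {{rows=262/331}} compare equal while still catching changes in the estimate.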





  1   2   >