[jira] [Comment Edited] (HIVE-22472) Unable to create hive view

2019-11-19 Thread Eric Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978035#comment-16978035
 ] 

Eric Lin edited comment on HIVE-22472 at 11/20/19 3:26 AM:
---

Seems to be related to HIVE-14719, and may be a duplicate of it.


was (Author: ericlin):
Seems to relate to HIVE-14719

> Unable to create hive view 
> ---
>
> Key: HIVE-22472
> URL: https://issues.apache.org/jira/browse/HIVE-22472
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5
>Reporter: Ashok
>Priority: Major
>
> Unable to create a Hive view with an EXISTS clause:
> Error:
> FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 
> 'sq_1': (possible column names are: _table_or_col lf) file_date) sq_corr_, (. 
> (tok_table_or_col sq_1) sq_corr_1))
>  
> Reproduction steps below:
> -- Setup Tables
> create table bug_part_1 (table_name string, partition_date date, file_date 
> timestamp);
> create table bug_part_2 (id string, file_date timestamp) partitioned by 
> (partition_date date);
> -- Example 1 - Works if just query.
> select vlf.id
>  from bug_part_2 vlf
>  where 1=1
>  and exists (
>  select null
>  from (
>  select max(file_date) file_date, max(partition_date) as partition_date
>  from bug_part_1
>  ) lf
>  where lf.partition_date = vlf.partition_date and lf.file_date = vlf.file_date
>  );
> -- Example 2 - Fails in view.
> create or replace view bug_view
> as
> select vlf.id
>  from bug_part_2 vlf
>  where 1=1
>  and exists (
>  select null
>  from (
>  select max(file_date) file_date, max(partition_date) as partition_date
>  from bug_part_1
>  ) lf
>  where lf.partition_date = vlf.partition_date and lf.file_date = vlf.file_date
>  );
>  





[jira] [Commented] (HIVE-22472) Unable to create hive view

2019-11-19 Thread Eric Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978035#comment-16978035
 ] 

Eric Lin commented on HIVE-22472:
-

Seems to relate to HIVE-14719



[jira] [Commented] (HIVE-22472) Unable to create hive view

2019-11-19 Thread Eric Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978034#comment-16978034
 ] 

Eric Lin commented on HIVE-22472:
-

Setting hive.cbo.enable=true helps to avoid the issue, though I am not sure why yet.
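
A minimal sketch of the workaround against the reproduction tables from the 
description (assuming an affected 2.3.x build):

{code}
-- Workaround sketch: enable CBO before creating the view.
set hive.cbo.enable=true;

create or replace view bug_view
as
select vlf.id
 from bug_part_2 vlf
 where 1=1
 and exists (
 select null
 from (
 select max(file_date) file_date, max(partition_date) as partition_date
 from bug_part_1
 ) lf
 where lf.partition_date = vlf.partition_date and lf.file_date = vlf.file_date
 );
{code}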



[jira] [Assigned] (HIVE-22091) NPE on HS2 start up due to bad data in FUNCS table

2019-08-08 Thread Eric Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin reassigned HIVE-22091:
---

Assignee: Eric Lin

> NPE on HS2 start up due to bad data in FUNCS table
> --
>
> Key: HIVE-22091
> URL: https://issues.apache.org/jira/browse/HIVE-22091
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Major
>
> If the FUNCS table contains a stale DB_ID with no matching entry in the DBS 
> table, HS2 will fail to start up with an NPE:
> {code:bash}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:220)
> at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:338)
> at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:299)
> at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:274)
> at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:256)
> at 
> org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.init(DefaultHiveAuthorizationProvider.java:29)
> at 
> org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase.setConf(HiveAuthorizationProviderBase.java:112)
> ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.NullPointerException)
> at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3646)
> at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:231)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:215)
> ... 27 more
> {code}
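> For reference, a sketch of a check against the metastore backing database 
> (assuming a standard metastore schema, e.g. on MySQL) that could surface 
> such orphaned FUNCS rows before they break HS2 startup:
> {code:sql}
> -- Sketch: FUNCS entries whose DB_ID has no matching row in DBS.
> SELECT f.FUNC_ID, f.FUNC_NAME, f.DB_ID
> FROM FUNCS f
> LEFT JOIN DBS d ON f.DB_ID = d.DB_ID
> WHERE d.DB_ID IS NULL;
> {code}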





[jira] [Resolved] (HIVE-14903) from_utc_time function issue for CET daylight savings

2017-10-11 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin resolved HIVE-14903.
-
Resolution: Fixed

Tested in CDH 5.11 and confirmed fixed. I am not sure which upstream JIRA 
contains the fix, so I will just resolve this for now.

> from_utc_time function issue for CET daylight savings
> -
>
> Key: HIVE-14903
> URL: https://issues.apache.org/jira/browse/HIVE-14903
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Eric Lin
>Priority: Minor
>
> Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, summer 
> time runs from 1:00 UTC on the last Sunday of March until 1:00 UTC on the 
> last Sunday of October. See the test case below:
> Impala:
> {code}
> select from_utc_timestamp('2016-10-30 00:30:00','CET');
> Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
> +---------------------------------------------------+
> | from_utc_timestamp('2016-10-30 00:30:00', 'cet')  |
> +---------------------------------------------------+
> | 2016-10-30 01:30:00                               |
> +---------------------------------------------------+
> {code}
> Hive:
> {code}
> select from_utc_timestamp('2016-10-30 00:30:00','CET');
> INFO  : OK
> +------------------------+
> |          _c0           |
> +------------------------+
> | 2016-10-30 01:30:00.0  |
> +------------------------+
> {code}
> MySQL:
> {code}
> mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
> +----------------------------------------------------+
> | CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' )  |
> +----------------------------------------------------+
> | 2016-10-30 02:30:00                                |
> +----------------------------------------------------+
> {code}
> At 00:30 UTC, daylight saving time has not yet ended, so the offset should 
> still be 2 hours rather than 1. Only MySQL returned the correct result.
> At 01:30 UTC, the results are correct:
> Impala:
> {code}
> Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
> +---------------------------------------------------+
> | from_utc_timestamp('2016-10-30 01:30:00', 'cet')  |
> +---------------------------------------------------+
> | 2016-10-30 02:30:00                               |
> +---------------------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}
> Hive:
> {code}
> +------------------------+
> |          _c0           |
> +------------------------+
> | 2016-10-30 02:30:00.0  |
> +------------------------+
> 1 row selected (0.252 seconds)
> {code}
> MySQL:
> {code}
> mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
> +----------------------------------------------------+
> | CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' )  |
> +----------------------------------------------------+
> | 2016-10-30 02:30:00                                |
> +----------------------------------------------------+
> 1 row in set (0.00 sec)
> {code}
> Seems like a bug.





[jira] [Commented] (HIVE-14903) from_utc_time function issue for CET daylight savings

2017-10-11 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201479#comment-16201479
 ] 

Eric Lin commented on HIVE-14903:
-

Thanks for letting me know [~zsombor.klara], and apologies for the delay.



[jira] [Updated] (HIVE-16794) Default value for hive.spark.client.connect.timeout of 1000ms is too low

2017-06-10 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16794:

Attachment: HIVE-16794.patch

Increasing the timeout to 5 seconds.
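
As a stopgap on affected versions, the timeout can also be raised per session 
(assuming your deployment allows setting it); a minimal sketch using the 5s 
value from this patch:

{code}
-- Raise the Hive-on-Spark client connect timeout from the 1s default to 5s.
set hive.spark.client.connect.timeout=5000ms;
{code}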

> Default value for hive.spark.client.connect.timeout of 1000ms is too low
> 
>
> Key: HIVE-16794
> URL: https://issues.apache.org/jira/browse/HIVE-16794
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Affects Versions: 2.1.1
>Reporter: Eric Lin
>Assignee: Eric Lin
> Attachments: HIVE-16794.patch
>
>
> Currently the default value for hive.spark.client.connect.timeout is set at 
> 1000ms, which is only 1 second. This is not enough when the cluster is busy, 
> and users will constantly get the following timeout errors:
> {code}
> 17/05/03 03:20:08 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915 
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) 
> at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156) 
> at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:606) 
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
>  
> Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915 
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:220)
>  
> at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  
> at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  
> at java.lang.Thread.run(Thread.java:745) 
> 17/05/03 03:20:08 INFO yarn.ApplicationMaster: Final app status: FAILED, 
> exitCode: 15, (reason: User class threw exception: 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915) 
> 17/05/03 03:20:16 ERROR yarn.ApplicationMaster: SparkContext did not 
> initialize after waiting for 100000 ms. Please check earlier log output for 
> errors. Failing the application. 
> 17/05/03 03:20:16 INFO yarn.ApplicationMaster: Unregistering 
> ApplicationMaster with FAILED (diag message: User class threw exception: 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915) 
> 17/05/03 03:20:16 INFO yarn.ApplicationMaster: Deleting staging directory 
> .sparkStaging/application_1492040605432_11445 
> 17/05/03 03:20:16 INFO util.ShutdownHookManager: Shutdown hook called
> {code}





[jira] [Updated] (HIVE-16794) Default value for hive.spark.client.connect.timeout of 1000ms is too low

2017-06-10 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16794:

Status: Patch Available  (was: Open)



[jira] [Assigned] (HIVE-16794) Default value for hive.spark.client.connect.timeout of 1000ms is too low

2017-05-30 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin reassigned HIVE-16794:
---

Assignee: Eric Lin



[jira] [Commented] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-05-23 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020747#comment-16020747
 ] 

Eric Lin commented on HIVE-16029:
-

Hi [~appodictic],

I have modified the test so that it passes. Please help review the code at 
https://reviews.apache.org/r/57009/.

Thanks

> COLLECT_SET and COLLECT_LIST does not return NULL in the result
> ---
>
> Key: HIVE-16029
> URL: https://issues.apache.org/jira/browse/HIVE-16029
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-16029.2.patch, HIVE-16029.3.patch, HIVE-16029.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:10000/default> select * from collect_set_test;
> +---------------------+
> | collect_set_test.a  |
> +---------------------+
> | 1                   |
> | 2                   |
> | NULL                |
> | 4                   |
> | NULL                |
> +---------------------+
> 0: jdbc:hive2://localhost:10000/default> select collect_set(a) from 
> collect_set_test;
> +----------+
> |   _c0    |
> +----------+
> | [1,2,4]  |
> +----------+
> {code}
> The correct result should be:
> {code}
> 0: jdbc:hive2://localhost:10000/default> select collect_set(a) from 
> collect_set_test;
> +---------------+
> |      _c0      |
> +---------------+
> | [1,2,null,4]  |
> +---------------+
> {code}





[jira] [Updated] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-05-22 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16029:

Attachment: HIVE-16029.3.patch

Providing the latest patch to update the *.q.out files.



[jira] [Commented] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-04-19 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974206#comment-15974206
 ] 

Eric Lin commented on HIVE-16029:
-

Hi [~appodictic],

Thanks for the suggestion. I am trying to run TestCliDriver using the command 
below under the itests/qtest directory, following the documentation at 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing:

{code}
mvn test -Dtest=TestCliDriver
{code}

However, it kept failing with the error below:

{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on 
project hive-it-qfile: ExecutionException: java.lang.RuntimeException: The 
forked VM terminated without properly saying goodbye. VM crash or System.exit 
called?
[ERROR] Command was /bin/sh -c cd /hadoop/code/hive/itests/qtest && 
/hadoop/jdk1.8.0_91/jre/bin/java -Xmx1024m -XX:MaxPermSize=256M -jar 
/hadoop/code/hive/itests/qtest/target/surefire/surefirebooter7738443094919274008.jar
 /hadoop/code/hive/itests/qtest/target/surefire/surefire4160478088421683107tmp 
/hadoop/code/hive/itests/qtest/target/surefire/surefire_05453129517537389906tmp
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on 
project hive-it-qfile: ExecutionException
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:414)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:357)
Caused by: org.apache.maven.plugin.MojoFailureException: ExecutionException
at 
org.apache.maven.plugin.surefire.SurefirePlugin.assertNoException(SurefirePlugin.java:262)
at 
org.apache.maven.plugin.surefire.SurefirePlugin.handleSummary(SurefirePlugin.java:252)
at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:854)
at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:722)
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more
Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException
at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:343)
at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:178)
at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:990)
at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:824)
... 22 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
The forked VM terminated without properly saying goodbye. VM crash or 
System.exit called?
Command was /bin/sh -c cd /hadoop/code/hive/itests/qtest && 
/hadoop/jdk1.8.0_91/jre/bin/java -Xmx1024m -XX:MaxPermSize=256M -jar 
/hadoop/code/hive/itests/qtest/target/surefire/surefirebooter7738443094919274008.jar
/hadoop/code/hive/itests/qtest/target/surefire/surefire4160478088421683107tmp 
{code}

[jira] [Commented] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-04-15 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969931#comment-15969931
 ] 

Eric Lin commented on HIVE-16029:
-

Review is also updated: https://reviews.apache.org/r/57009/.

Please help review it and let me know if any other changes are required.



[jira] [Updated] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-04-15 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16029:

Attachment: HIVE-16029.2.patch

Attaching a new patch so that COLLECT_SET takes two arguments: the first is 
the same as before, and the second is a boolean value of true or false, as 
suggested by Edward.
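
A usage sketch of the proposed two-argument form (hypothetical until the patch 
lands; I am assuming here that true means NULLs are kept in the result):

{code}
-- Hypothetical, with HIVE-16029.2.patch applied: the second boolean argument
-- is assumed to mean "keep NULLs in the result" when true.
select collect_set(a, true) from collect_set_test;
-- expected: [1,2,null,4]
{code}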



[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2017-03-08 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-15166:

Attachment: HIVE-15166.3.patch

New patch based on the latest Hive master code base.

> Provide beeline option to set the jline history max size
> 
>
> Key: HIVE-15166
> URL: https://issues.apache.org/jira/browse/HIVE-15166
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-15166.2.patch, HIVE-15166.3.patch, HIVE-15166.patch
>
>
> Currently Beeline does not provide an option to limit the max size of the 
> Beeline history file. When each query is very big, it will flood the history 
> file and slow down Beeline on startup and shutdown.





[jira] [Commented] (HIVE-15166) Provide beeline option to set the jline history max size

2017-02-24 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883444#comment-15883444
 ] 

Eric Lin commented on HIVE-15166:
-

Looks like I need to rebase my code, as a lot has changed. I will update the 
patch again on Monday.



[jira] [Commented] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-02-23 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881671#comment-15881671
 ] 

Eric Lin commented on HIVE-16029:
-

Review request sent: https://reviews.apache.org/r/57009/



[jira] [Updated] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-02-23 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16029:

Attachment: HIVE-16029.patch

Adding the first patch.



[jira] [Updated] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-02-23 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16029:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-02-23 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-16029:

Description: 
See the test case below:

{code}
0: jdbc:hive2://localhost:10000/default> select * from collect_set_test;
+---------------------+
| collect_set_test.a  |
+---------------------+
| 1                   |
| 2                   |
| NULL                |
| 4                   |
| NULL                |
+---------------------+

0: jdbc:hive2://localhost:10000/default> select collect_set(a) from 
collect_set_test;
+----------+
|   _c0    |
+----------+
| [1,2,4]  |
+----------+
{code}

The correct result should be:

{code}
0: jdbc:hive2://localhost:10000/default> select collect_set(a) from 
collect_set_test;
+---------------+
|      _c0      |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

  was:
See the test case below:

0: jdbc:hive2://localhost:10000/default> select * from collect_set_test;
+---------------------+
| collect_set_test.a  |
+---------------------+
| 1                   |
| 2                   |
| NULL                |
| 4                   |
| NULL                |
+---------------------+

0: jdbc:hive2://localhost:10000/default> select collect_set(a) from 
collect_set_test;
+----------+
|   _c0    |
+----------+
| [1,2,4]  |
+----------+

The correct result should be:

0: jdbc:hive2://localhost:10000/default> select collect_set(a) from 
collect_set_test;
+---------------+
|      _c0      |
+---------------+
| [1,2,null,4]  |
+---------------+




[jira] [Assigned] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-02-23 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin reassigned HIVE-16029:
---




[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2017-02-22 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-15166:

Attachment: HIVE-15166.2.patch

Re-attaching the patch.



[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2017-02-22 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-15166:

Attachment: (was: HIVE-15166.2.patch)



[jira] [Commented] (HIVE-15166) Provide beeline option to set the jline history max size

2017-01-16 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825584#comment-15825584
 ] 

Eric Lin commented on HIVE-15166:
-

Hi [~aihuaxu],

I have created the review request: https://reviews.apache.org/r/55605/

Thanks



[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2017-01-16 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-15166:

Attachment: HIVE-15166.2.patch

Hi [~aihuaxu],

I have updated the patch against the latest master branch. However, I am not 
able to build Hive due to the following error:

{code}
[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[ERROR] 'dependencies.dependency.version' for 
org.powermock:powermock-module-junit4:jar must be a valid version but is 
'${powermock.version}'. @ line 782, column 16
[ERROR] 'dependencies.dependency.version' for 
org.powermock:powermock-api-mockito:jar must be a valid version but is 
'${powermock.version}'. @ line 788, column 16
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project org.apache.hive:hive:2.2.0-SNAPSHOT 
(/Users/ericlin/hadoop/hive/pom.xml) has 2 errors
[ERROR] 'dependencies.dependency.version' for 
org.powermock:powermock-module-junit4:jar must be a valid version but is 
'${powermock.version}'. @ line 782, column 16
[ERROR] 'dependencies.dependency.version' for 
org.powermock:powermock-api-mockito:jar must be a valid version but is 
'${powermock.version}'. @ line 788, column 16
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
{code}

This happens on the master code, so I am not able to test the change. I will 
see if I can fix it, but in the meantime, could you please review the change 
and provide further feedback?

Thanks



[jira] [Commented] (HIVE-15166) Provide beeline option to set the jline history max size

2017-01-15 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823492#comment-15823492
 ] 

Eric Lin commented on HIVE-15166:
-

Hi [~aihuaxu],

Should I create a review request for you?



[jira] [Commented] (HIVE-15166) Provide beeline option to set the jline history max size

2017-01-15 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823486#comment-15823486
 ] 

Eric Lin commented on HIVE-15166:
-

[~aihuaxu],

Thanks for the comment. Please give me some time to review it; it has been a 
while since I submitted the patch. I will provide a new patch soon.

Thanks



[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2016-11-08 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-15166:

Priority: Minor  (was: Major)



[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2016-11-08 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-15166:

Assignee: Eric Lin



[jira] [Comment Edited] (HIVE-14482) Add and drop table partition is not audit logged in HMS

2016-10-20 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593695#comment-15593695
 ] 

Eric Lin edited comment on HIVE-14482 at 10/21/16 2:12 AM:
---

The error seems to be unrelated (please see the attached screenshot). There is 
no existing test coverage for HMS audit logs, and it is pretty hard to test.

Please advise if the patch can be accepted.

Thanks


was (Author: ericlin):
The error seems to be unrelated and there is no existing test coverage for HMS 
audit logs, also it is pretty hard to test it.

Please advise if the patch can be accepted.

Thanks

> Add and drop table partition is not audit logged in HMS
> ---
>
> Key: HIVE-14482
> URL: https://issues.apache.org/jira/browse/HIVE-14482
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-14482.2.patch, HIVE-14482.patch, Screen Shot 
> 2016-10-21 at 1.06.33 pm.png
>
>
> When running:
> {code}
> ALTER TABLE test DROP PARTITION (b=140);
> {code}
> I only see the following in the HMS log:
> {code}
> 2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: <PERFLOG method=get_table 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,082 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: </PERFLOG method=get_table start=1470723154081 
> end=1470723154094 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: <PERFLOG method=get_partitions_by_expr 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : 
> db=default tbl=test
> 2016-08-08 23:12:34,096 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_partitions_by_expr 
> : db=default tbl=test
> 2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: </PERFLOG method=get_partitions_by_expr 
> start=1470723154095 end=1470723154112 duration=17 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: <PERFLOG method=get_table 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,173 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: </PERFLOG method=get_table start=1470723154172 
> end=1470723154186 duration=14 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: <PERFLOG method=get_table 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,187 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]: </PERFLOG method=get_table start=1470723154186 
> end=1470723154199 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
> only on partition keys of type string
> 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
> [pool-4-thread-2]: Could not find uri with key 
> [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> 
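> For comparison, a fix would ideally emit an audit entry of the same shape as 
> the get_table lines above when a partition is added or dropped; a sketch 
> (the exact HMS method name is an assumption):
> {code}
> ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx drop_partitions_req : db=default tbl=test
> {code}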

[jira] [Updated] (HIVE-14482) Add and drop table partition is not audit logged in HMS

2016-10-20 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Attachment: Screen Shot 2016-10-21 at 1.06.33 pm.png

The error seems to be unrelated and there is no existing test coverage for HMS 
audit logs, also it is pretty hard to test it.

Please advise if the patch can be accepted.

Thanks

> 2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
> only on partition keys of type string
> 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
> [pool-4-thread-2]: Could not find uri with key 
> [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: dropPartition() will move partition-directories to 
> trash-directory.
> 2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
> [pool-4-thread-2]: deleting  
> hdfs://:8020/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  

[jira] [Updated] (HIVE-14482) Add and drop table partition is not audit logged in HMS

2016-10-20 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Attachment: HIVE-14482.2.patch

> Add and drop table partition is not audit logged in HMS
> ---
>
> Key: HIVE-14482
> URL: https://issues.apache.org/jira/browse/HIVE-14482
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-14482.2.patch, HIVE-14482.patch
>
>
> When running:
> {code}
> ALTER TABLE test DROP PARTITION (b=140);
> {code}
> I only see the following in the HMS log:
> {code}
> 2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,082 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154094 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : 
> db=default tbl=test
> 2016-08-08 23:12:34,096 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_partitions_by_expr 
> : db=default tbl=test
> 2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  start=1470723154095 end=1470723154112 duration=17 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,173 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154186 duration=14 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,187 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154199 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
> only on partition keys of type string
> 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
> [pool-4-thread-2]: Could not find uri with key 
> [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: dropPartition() will move partition-directories to 
> trash-directory.
> 2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
> [pool-4-thread-2]: deleting  
> hdfs://:8020/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
> [pool-4-thread-2]: Moved: 
> 'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: 
> hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  

[jira] [Commented] (HIVE-14903) from_utc_time function issue for CET daylight savings

2016-10-06 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551569#comment-15551569
 ] 

Eric Lin commented on HIVE-14903:
-

Both Hive and Impala seem to have the issue, so I have also created 
https://issues.cloudera.org/browse/IMPALA-4250
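For reference, the 2016 switch-over happened at 01:00 UTC on October 30, so a 
query probing one second either side of that instant shows the window where 
results diverge. This is a sketch for reproduction only; the expected values 
come from the tzdata rules, not from any particular Hive build:

{code}
-- Probe both sides of the CET/CEST fall-back (01:00 UTC on 2016-10-30).
-- Expected per tzdata: 02:59:59 (still UTC+2), then 02:00:00 (back to UTC+1).
select from_utc_timestamp('2016-10-30 00:59:59', 'CET'),
       from_utc_timestamp('2016-10-30 01:00:00', 'CET');
{code}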

> from_utc_time function issue for CET daylight savings
> -
>
> Key: HIVE-14903
> URL: https://issues.apache.org/jira/browse/HIVE-14903
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Eric Lin
>Priority: Minor
>
> Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, summer 
> time runs from 1:00 UTC on the last Sunday of March to 1:00 UTC on the last 
> Sunday of October; see the test case below:
> Impala:
> {code}
> select from_utc_timestamp('2016-10-30 00:30:00','CET');
> Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
> +--+
> | from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
> +--+
> | 2016-10-30 01:30:00  |
> +--+
> {code}
> Hive:
> {code}
> select from_utc_timestamp('2016-10-30 00:30:00','CET');
> INFO  : OK
> ++--+
> |  _c0   |
> ++--+
> | 2016-10-30 01:30:00.0  |
> ++--+
> {code}
> MySQL:
> {code}
> mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
> +---+
> | CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
> +---+
> | 2016-10-30 02:30:00   |
> +---+
> {code}
> At 00:30 AM UTC, daylight saving has not finished, so the time difference 
> should still be 2 hours rather than 1. MySQL returned the correct result.
> At 1:30, the results are correct:
> Impala:
> {code}
> Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
> +--+
> | from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
> +--+
> | 2016-10-30 02:30:00  |
> +--+
> Fetched 1 row(s) in 0.01s
> {code}
> Hive:
> {code}
> ++--+
> |  _c0   |
> ++--+
> | 2016-10-30 02:30:00.0  |
> ++--+
> 1 row selected (0.252 seconds)
> {code}
> MySQL:
> {code}
> mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
> +---+
> | CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
> +---+
> | 2016-10-30 02:30:00   |
> +---+
> 1 row in set (0.00 sec)
> {code}
> Seems like a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14903) from_utc_time function issue for CET daylight savings

2016-10-06 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14903:

Description: 
Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, summer 
time runs from 1:00 UTC on the last Sunday of March to 1:00 UTC on the last 
Sunday of October; see the test case below:

Impala:

{code}
select from_utc_timestamp('2016-10-30 00:30:00','CET');
Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
+--+
| from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
+--+
| 2016-10-30 01:30:00  |
+--+
{code}

Hive:

{code}
select from_utc_timestamp('2016-10-30 00:30:00','CET');
INFO  : OK
++--+
|  _c0   |
++--+
| 2016-10-30 01:30:00.0  |
++--+
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
+---+
| CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
+---+
| 2016-10-30 02:30:00   |
+---+
{code}

At 00:30 AM UTC, daylight saving has not finished, so the time difference 
should still be 2 hours rather than 1. MySQL returned the correct result.

At 1:30, the results are correct:

Impala:

{code}
Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
+--+
| from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
+--+
| 2016-10-30 02:30:00  |
+--+
Fetched 1 row(s) in 0.01s
{code}

Hive:

{code}
++--+
|  _c0   |
++--+
| 2016-10-30 02:30:00.0  |
++--+
1 row selected (0.252 seconds)
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
+---+
| CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
+---+
| 2016-10-30 02:30:00   |
+---+
1 row in set (0.00 sec)
{code}

Seems like a bug.

  was:
Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, summer 
time runs from 1:00 UTC on the last Sunday of March to 1:00 UTC on the last 
Sunday of October; see the test case below:

Impala:

{code}
[host-10-17-101-195.coe.cloudera.com:25003] > select 
from_utc_timestamp('2016-10-30 00:30:00','CET');
Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
+--+
| from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
+--+
| 2016-10-30 01:30:00  |
+--+
{code}

Hive:

{code}
0: jdbc:hive2://host-10-17-101-195.coe.cloude> select 
from_utc_timestamp('2016-10-30 00:30:00','CET');
INFO  : OK
++--+
|  _c0   |
++--+
| 2016-10-30 01:30:00.0  |
++--+
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
+---+
| CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
+---+
| 2016-10-30 02:30:00   |
+---+
{code}

At 00:30 AM UTC, daylight saving has not finished, so the time difference 
should still be 2 hours rather than 1. MySQL returned the correct result.

At 1:30, the results are correct:

Impala:

{code}
Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
+--+
| from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
+--+
| 2016-10-30 02:30:00  |
+--+
Fetched 1 row(s) in 0.01s
{code}

Hive:

{code}
++--+
|  _c0   |
++--+
| 2016-10-30 02:30:00.0  |
++--+
1 row selected (0.252 seconds)
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
+---+
| CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
+---+
| 2016-10-30 02:30:00   |
+---+
1 row in set (0.00 sec)
{code}

Seems like a bug.

[jira] [Updated] (HIVE-14482) Add and drop table partition is not audit logged in HMS

2016-09-07 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Status: Patch Available  (was: Open)

Added audit log for add, drop and rename partitions

> Add and drop table partition is not audit logged in HMS
> ---
>
> Key: HIVE-14482
> URL: https://issues.apache.org/jira/browse/HIVE-14482
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-14482.patch
>
>
> When running:
> {code}
> ALTER TABLE test DROP PARTITION (b=140);
> {code}
> I only see the following in the HMS log:
> {code}
> 2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,082 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154094 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : 
> db=default tbl=test
> 2016-08-08 23:12:34,096 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_partitions_by_expr 
> : db=default tbl=test
> 2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  start=1470723154095 end=1470723154112 duration=17 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,173 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154186 duration=14 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,187 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154199 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
> only on partition keys of type string
> 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
> [pool-4-thread-2]: Could not find uri with key 
> [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: dropPartition() will move partition-directories to 
> trash-directory.
> 2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
> [pool-4-thread-2]: deleting  
> hdfs://:8020/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
> [pool-4-thread-2]: Moved: 
> 'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: 
> hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140
> 

[jira] [Updated] (HIVE-14482) Add and drop table partition is not audit logged in HMS

2016-09-07 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Attachment: HIVE-14482.patch

> Add and drop table partition is not audit logged in HMS
> ---
>
> Key: HIVE-14482
> URL: https://issues.apache.org/jira/browse/HIVE-14482
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-14482.patch
>
>
> When running:
> {code}
> ALTER TABLE test DROP PARTITION (b=140);
> {code}
> I only see the following in the HMS log:
> {code}
> 2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,082 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154094 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : 
> db=default tbl=test
> 2016-08-08 23:12:34,096 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_partitions_by_expr 
> : db=default tbl=test
> 2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  start=1470723154095 end=1470723154112 duration=17 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,173 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154186 duration=14 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,187 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154199 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
> only on partition keys of type string
> 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
> [pool-4-thread-2]: Could not find uri with key 
> [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: dropPartition() will move partition-directories to 
> trash-directory.
> 2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
> [pool-4-thread-2]: deleting  
> hdfs://:8020/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
> [pool-4-thread-2]: Moved: 
> 'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: 
> hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  

[jira] [Updated] (HIVE-14482) Add and drop table partition is not audit logged in HMS

2016-08-14 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Summary: Add and drop table partition is not audit logged in HMS  (was: 
Drop table partition is not audit logged in HMS)

> Add and drop table partition is not audit logged in HMS
> ---
>
> Key: HIVE-14482
> URL: https://issues.apache.org/jira/browse/HIVE-14482
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
>
> When running:
> {code}
> ALTER TABLE test DROP PARTITION (b=140);
> {code}
> I only see the following in the HMS log:
> {code}
> 2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,082 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154094 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : 
> db=default tbl=test
> 2016-08-08 23:12:34,096 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_partitions_by_expr 
> : db=default tbl=test
> 2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  start=1470723154095 end=1470723154112 duration=17 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,173 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154186 duration=14 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
> 2016-08-08 23:12:34,187 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
> ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
> tbl=test
> 2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  end=1470723154199 duration=13 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=2 
> retryCount=0 error=false>
> 2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
> [pool-4-thread-2]:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
> only on partition keys of type string
> 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
> [pool-4-thread-2]: Could not find uri with key 
> [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [pool-4-thread-2]: dropPartition() will move partition-directories to 
> trash-directory.
> 2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
> [pool-4-thread-2]: deleting  
> hdfs://:8020/user/hive/warehouse/default/test/b=140
> 2016-08-08 23:12:34,247 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
> [pool-4-thread-2]: Moved: 
> 'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: 
> hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140
> 2016-08-08 

[jira] [Commented] (HIVE-14537) Please add HMS audit logs for ADD and CHANGE COLUMN operations

2016-08-14 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420528#comment-15420528
 ] 

Eric Lin commented on HIVE-14537:
-

This one seems a bit more complicated than HIVE-14482, so I created this 
separately.

> Please add HMS audit logs for ADD and CHANGE COLUMN operations
> --
>
> Key: HIVE-14537
> URL: https://issues.apache.org/jira/browse/HIVE-14537
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Lin
>Priority: Minor
>
> Currently if you ALTER TABLE test ADD COLUMNS (c int), the only audit log we 
> can see is:
> {code}
> 2016-08-09T13:29:56,411 INFO  [pool-6-thread-2]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(754)) - 2: source:127.0.0.1 alter_table: 
> db=default tbl=test newtbl=test
> {code}
> This is not enough to tell which columns are added or changed. It would be 
> useful to add such information.
> Ideally we could see:
> {code}
> 2016-08-09T13:29:56,411 INFO  [pool-6-thread-2]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(754)) - 2: source:127.0.0.1 alter_table: 
> db=default tbl=test newtbl=test newCol=c[int]
> {code}
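To make the gap concrete: the two statements below (the BIGINT change is just 
an illustrative example) currently produce the same generic alter_table audit 
line, so the audit trail alone cannot distinguish an added column from a 
changed one:

{code}
-- Both statements emit the identical "alter_table: db=default tbl=test
-- newtbl=test" audit entry today, with no column-level detail:
ALTER TABLE test ADD COLUMNS (c int);
ALTER TABLE test CHANGE COLUMN c c bigint;
{code}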



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14482) Drop table partition is not audit logged in HMS

2016-08-09 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Description: 
When running:

{code}
ALTER TABLE test DROP PARTITION (b=140);
{code}

I only see the following in the HMS log:

{code}
2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
2016-08-08 23:12:34,082 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
tbl=test
2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : db=default 
tbl=test
2016-08-08 23:12:34,096 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_partitions_by_expr : 
db=default tbl=test
2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
2016-08-08 23:12:34,173 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
tbl=test
2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
2016-08-08 23:12:34,187 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=default 
tbl=test
2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
only on partition keys of type string
2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
[pool-4-thread-2]: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: dropPartition() will move partition-directories to 
trash-directory.
2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
[pool-4-thread-2]: deleting  
hdfs://:8020/user/hive/warehouse/default/test/b=140
2016-08-08 23:12:34,247 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
[pool-4-thread-2]: Moved: 
'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: 
hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140
2016-08-08 23:12:34,247 INFO  hive.metastore.hivemetastoressimpl: 
[pool-4-thread-2]: Moved to trash: 
hdfs://:8020/user/hive/warehouse/default/test/b=140
2016-08-08 23:12:34,247 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
{code}

There is no entry in the "HiveMetaStore.audit" log to show that partition 
b=140 was dropped.

When we add a new partition, we can see the following:

{code}
2016-08-08 23:04:48,534 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx append_partition : 
db=default tbl=test[130]
{code}

Ideally we should see a similar message when dropping partitions, as sketched below.
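For example, a drop could emit an entry of the same shape as the 
append_partition line above; the exact drop_partition wording here is a 
suggestion, not what current releases print:

{code}
-- Repro statement from above; a matching audit entry might look like:
--   ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx drop_partition :
--   db=default tbl=test[140]
ALTER TABLE test DROP PARTITION (b=140);
{code}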

  was:
When running:

{code}
ALTER TABLE test DROP PARTITION (b=140);
{code}

I only see the following in the HMS log:

{code}
2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test
2016-08-08 23:12:34,082 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : 
db=case_104408 tbl=test
2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  

[jira] [Updated] (HIVE-13160) HS2 unable to load UDFs on startup when HMS is not ready

2016-02-25 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-13160:

Description: 
The error looks like this:

{code}
2016-02-18 14:43:54,251 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-18 14:48:54,692 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-18 14:48:54,692 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-18 14:48:55,692 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-18 14:53:55,800 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-18 14:53:55,800 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-18 14:53:56,801 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-18 14:58:56,967 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-18 14:58:56,967 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-18 14:58:57,994 WARN  hive.ql.metadata.Hive: [main]: Failed to register 
all functions.
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1492)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:64)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2915)
...
2016-02-18 14:58:57,997 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-18 15:03:58,094 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-18 15:03:58,095 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-18 15:03:59,095 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-18 15:08:59,203 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-18 15:08:59,203 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-18 15:09:00,203 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-18 15:14:00,304 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-18 15:14:00,304 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-18 15:14:01,306 INFO  org.apache.hive.service.server.HiveServer2: 
[main]: Shutting down HiveServer2
2016-02-18 15:14:01,308 INFO  org.apache.hive.service.server.HiveServer2: 
[main]: Exception caught when calling stop of HiveServer2 before retrying start
java.lang.NullPointerException
at org.apache.hive.service.server.HiveServer2.stop(HiveServer2.java:283)
at 
org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:351)
at 
org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:69)
at 
org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:545)
{code}

After that, none of the functions will be available for use, as HS2 does not 
re-register them after HMS is up and ready.

This is not desired behaviour; we shouldn't allow HS2 to reach a servicing 
state if the function list is not ready. Alternatively, instead of initializing 
the function list when HS2 starts, we could load it when each Hive session is 
created. Of course, we could cache the function list somewhere for better 
performance, but it would be better to decouple it from the Hive class.
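As a possible stopgap (an assumption on my side, not a fix for this ticket): on 
versions that support the RELOAD FUNCTION statement, permanent UDFs can be 
re-pulled from HMS into a running HS2 once the metastore is back, avoiding a 
full restart:

{code}
-- Assumes a Hive version with RELOAD FUNCTION support; re-reads permanent
-- functions from the metastore without restarting HS2.
RELOAD FUNCTION;
{code}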

  was:
The error looks like this:

{code}
2016-02-24 21:16:09,901 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-24 21:16:09,971 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-24 21:16:09,971 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-24 21:16:10,971 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083
2016-02-24 21:16:10,975 WARN  hive.metastore: [main]: Failed to connect to the 
MetaStore Server...
2016-02-24 21:16:10,976 INFO  hive.metastore: [main]: Waiting 1 seconds before 
next connection attempt.
2016-02-24 21:16:11,976 INFO  hive.metastore: [main]: Trying to connect to 
metastore with URI 

[jira] [Updated] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-06 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-12788:

Description: 
See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> create table test (a int);

0: jdbc:hive2://localhost:1/default> insert overwrite table test values (1);

0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
No rows affected (0.01 seconds)

0: jdbc:hive2://localhost:1/default> set 
hive.mapred.supports.subdirectories=true;
No rows affected (0.007 seconds)

0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
+--+--+
{code}

Running the same query without setting hive.mapred.supports.subdirectories and 
hive.optimize.union.remove to true gives the correct result:

{code}
0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
| 1|
| 1|
+--+--+
{code}

UNION ALL without the COUNT function works as expected:

{code}
0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT * 
FROM test;
++--+
| _u1.a  |
++--+
| 1  |
| 1  |
++--+
{code}

  was:
See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> create table test (a int);

0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
No rows affected (0.01 seconds)

0: jdbc:hive2://localhost:1/default> set 
hive.mapred.supports.subdirectories=true;
No rows affected (0.007 seconds)

0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
+--+--+
{code}

Running the same query without setting hive.mapred.supports.subdirectories and 
hive.optimize.union.remove to true gives the correct result:

{code}
0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
| 1|
| 1|
+--+--+
{code}

UNION ALL without the COUNT function works as expected:

{code}
0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT * 
FROM test;
++--+
| _u1.a  |
++--+
| 1  |
| 1  |
++--+
{code}


> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> Running the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true gives the correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}
> UNION ALL without the COUNT function works as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-06 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-12788:

Description: 
See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> create table test (a int);

0: jdbc:hive2://localhost:1/default> insert overwrite table test values (1);

0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
No rows affected (0.01 seconds)

0: jdbc:hive2://localhost:1/default> set 
hive.mapred.supports.subdirectories=true;
No rows affected (0.007 seconds)

0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
+--+--+
{code}

UNION ALL without the COUNT function works as expected:

{code}
0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT * 
FROM test;
++--+
| _u1.a  |
++--+
| 1  |
| 1  |
++--+
{code}

Running the same query without setting hive.mapred.supports.subdirectories and 
hive.optimize.union.remove to true gives the correct result:

{code}
0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
+---+--+
|set|
+---+--+
| hive.optimize.union.remove=false  |
+---+--+

0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
| 1|
| 1|
+--+--+
{code}
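Until the optimizer issue is fixed, explicitly turning the rewrite back off in 
the session (which just re-applies the default shown above) restores the 
aggregate results:

{code}
-- Session-level workaround: disable the union-remove rewrite before running
-- UNION ALL over aggregates.
set hive.optimize.union.remove=false;
SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
{code}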



  was:
See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> create table test (a int);

0: jdbc:hive2://localhost:1/default> insert overwrite table test values (1);

0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
No rows affected (0.01 seconds)

0: jdbc:hive2://localhost:1/default> set 
hive.mapred.supports.subdirectories=true;
No rows affected (0.007 seconds)

0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
+--+--+
{code}

Running the same query without setting hive.mapred.supports.subdirectories and 
hive.optimize.union.remove to true gives the correct result:

{code}
0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0  |
+--+--+
| 1|
| 1|
+--+--+
{code}

UNION ALL without the COUNT function works as expected:

{code}
0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT * 
FROM test;
++--+
| _u1.a  |
++--+
| 1  |
| 1  |
++--+
{code}


> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without the COUNT function works as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Running the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true gives the correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12506) SHOW CREATE TABLE command creates a table that does not work for RCFile format

2015-11-23 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-12506:

Description: 
See the following test case:

1) Create a table with RCFile format:

{code}
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int) PARTITIONED BY (p int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
STORED AS RCFILE;
{code}

2) run "DESC FORMATTED test"

{code}
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
InputFormat:org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.RCFileOutputFormat
{code}

This shows that the SerDe used is "ColumnarSerDe".

3) run "SHOW CREATE TABLE" and get the output:

{code}
CREATE TABLE `test`(
  `a` int)
PARTITIONED BY (
  `p` int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://node5.lab.cloudera.com:8020/user/hive/warehouse/case_78732.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1448343875')
{code}

Note that there is no mention of "ColumnarSerDe"

4) Drop the table and then create the table again using the output from 3)

5) Check the output of "DESC FORMATTED test"

{code}
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.RCFileOutputFormat
{code}

The SerDe falls back to "LazySimpleSerDe", which is not correct.

Any further query that tries to INSERT into or SELECT from this table will fail 
with errors.

I suspect that we can't specify ROW FORMAT DELIMITED and ROW FORMAT SERDE at 
the same time at table creation; this confuses end users, as copying a table 
structure using "SHOW CREATE TABLE" will not work.
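A DDL that round-trips correctly would have to name the SerDe explicitly 
instead of ROW FORMAT DELIMITED. Below is a sketch of what SHOW CREATE TABLE 
would ideally emit; the field.delim serde property is my assumption for how 
the '|' delimiter would be carried:

{code}
CREATE TABLE `test`(
  `a` int)
PARTITIONED BY (
  `p` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='|')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';
{code}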


  was:
See the following test case:

1) Create a table with RCFile format:

{code}
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int) PARTITIONED BY (p int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
STORED AS RCFILE;
{code}

2) run "DESC FORMATTED test"

{code}
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
InputFormat:org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.RCFileOutputFormat
{code}

This shows that the SerDe used is "ColumnarSerDe".

3) run "SHOW CREATE TABLE" and get the output:

{code}
CREATE TABLE `test`(
  `a` int)
PARTITIONED BY (
  `p` int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://node5.lab.cloudera.com:8020/user/hive/warehouse/case_78732.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1448343875')
{code}

Note that there is no mention of "ColumnarSerDe"

4) Drop the table and then create the table again using the output from 3)

5) Check the output of "DESC FORMATTED test"

{code}

# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.RCFileOutputFormat
{code}

The SerDe falls back to "LazySimpleSerDe", which is not correct.

Any further query that tries to INSERT into or SELECT from this table will fail 
with errors.



> SHOW CREATE TABLE command creates a table that does not work for RCFile format
> --
>
> Key: HIVE-12506
> URL: https://issues.apache.org/jira/browse/HIVE-12506
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>
> See the following test case:
> 1) Create a table with RCFile format:
> {code}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (a int) PARTITIONED BY (p int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
> STORED AS RCFILE;
> {code}
> 2) run "DESC FORMATTED test"
> {code}
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
> InputFormat:  org.apache.hadoop.hive.ql.io.RCFileInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
> {code}
> This shows that the SerDe used is "ColumnarSerDe".
> 3) run "SHOW CREATE TABLE" and get the output:
> {code}
> CREATE TABLE `test`(
>   `a` int)
> PARTITIONED BY (
>   `p` int)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '|'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
> LOCATION
>   

[jira] [Updated] (HIVE-12368) Provide support for different versions of same JAR files for loading UDFs

2015-11-08 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-12368:

Priority: Minor  (was: Major)

> Provide support for different versions of same JAR files for loading UDFs
> -
>
> Key: HIVE-12368
> URL: https://issues.apache.org/jira/browse/HIVE-12368
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Eric Lin
>Assignee: Vaibhav Gumashta
>Priority: Minor
>
> If we want to set up one cluster to support multiple environments (namely DEV, 
> QA, PRE-PROD, etc.), this is done in such a way that data from the different 
> environments will be generated into different locations in HDFS and in Hive 
> databases.
> This works fine; however, when we need to deploy UDF classes for different 
> environments, it becomes tricky, as each class has the same namespace, even 
> though we have created udf-dev.jar, udf-qa.jar, etc.
> Creating one HS2 per environment is another option; however, with an LB setup, 
> it becomes harder.
> The request is to have HS2 support loading UDFs in such an environment; the 
> implementation is open to discussion.
> I know that this setup is not ideal, as the better approach is to have one 
> cluster per environment; however, when you have a limited number of nodes in 
> the setup, this might be the only option, and I believe many people can 
> benefit from it.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12368) Provide support for different versions of same JAR files for loading UDFs

2015-11-08 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-12368:

Description: 
If we want to set up one cluster to support multiple environments (namely DEV, 
QA, PRE-PROD, etc.), this is done in such a way that data from the different 
environments can be generated into different locations in HDFS and in Hive 
databases.

This works fine; however, when we need to deploy UDF classes for different 
environments, it becomes tricky, as each class has the same namespace, even 
though we have created udf-dev.jar, udf-qa.jar, etc.

Creating one HS2 per environment is another option; however, with an LB setup, 
it becomes harder.

The request is to have HS2 support loading UDFs in such an environment; the 
implementation is open to discussion.

I know that this setup is not ideal, as the better approach is to have one 
cluster per environment; however, when you have a limited number of nodes in 
the setup, this might be the only option, and I believe many people can 
benefit from it.
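One partial workaround that exists today is registering the same class under 
per-environment databases with CREATE FUNCTION ... USING JAR, so each 
environment resolves its own jar. The database names, class name, and paths 
below are made up for illustration, and whether the per-session classloader 
fully isolates the two jar versions is version-dependent:

{code}
-- Hypothetical layout: one database per environment, each binding the same
-- class name to its own jar on HDFS.
CREATE FUNCTION dev_db.my_udf AS 'com.example.MyUdf'
  USING JAR 'hdfs:///udfs/dev/udf-dev.jar';
CREATE FUNCTION qa_db.my_udf AS 'com.example.MyUdf'
  USING JAR 'hdfs:///udfs/qa/udf-qa.jar';
{code}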

Thanks

  was:
If we want to set up one cluster to support multiple environments (namely DEV, 
QA, PRE-PROD, etc.), this is done in such a way that data from the different 
environments will be generated into different locations in HDFS and in Hive 
databases.

This works fine; however, when we need to deploy UDF classes for different 
environments, it becomes tricky, as each class has the same namespace, even 
though we have created udf-dev.jar, udf-qa.jar, etc.

Creating one HS2 per environment is another option; however, with an LB setup, 
it becomes harder.

The request is to have HS2 support loading UDFs in such an environment; the 
implementation is open to discussion.

I know that this setup is not ideal, as the better approach is to have one 
cluster per environment; however, when you have a limited number of nodes in 
the setup, this might be the only option, and I believe many people can 
benefit from it.

Thanks


> Provide support for different versions of same JAR files for loading UDFs
> -
>
> Key: HIVE-12368
> URL: https://issues.apache.org/jira/browse/HIVE-12368
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Eric Lin
>Assignee: Vaibhav Gumashta
>Priority: Minor
>
> If we want to set up one cluster to support multiple environments (namely DEV, 
> QA, PRE-PROD, etc.), this is done in such a way that data from the different 
> environments can be generated into different locations in HDFS and in Hive 
> databases.
> This works fine; however, when we need to deploy UDF classes for different 
> environments, it becomes tricky, as each class has the same namespace, even 
> though we have created udf-dev.jar, udf-qa.jar, etc.
> Creating one HS2 per environment is another option; however, with an LB setup, 
> it becomes harder.
> The request is to have HS2 support loading UDFs in such an environment; the 
> implementation is open to discussion.
> I know that this setup is not ideal, as the better approach is to have one 
> cluster per environment; however, when you have a limited number of nodes in 
> the setup, this might be the only option, and I believe many people can 
> benefit from it.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)