[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166157#comment-17166157
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~gopalv] [~prasanth_j] Gentle reminder:

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are added advantages to using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features such 
> as vertex and failed-task-attempt shuffle data cleanup; refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-27 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~ashutoshc] [~rajesh.balamohan] Could you please review?)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are added advantages to using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features such 
> as vertex and failed-task-attempt shuffle data cleanup; refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166155#comment-17166155
 ] 

Syed Shameerur Rahman commented on HIVE-23873:
--

[~chiran54321] To run Hive pre-commit tests you need to raise a pull request on 
GitHub (hive repo). Could you please do that? Also, please add a test around 
this case.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch, 
> HIVE-23873.3.patch
>
>
> The scenario: a Hive table has the same schema as a table in Oracle, but 
> querying the table with data fails with an NPE; below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: Hive 
> forces table and column names to lower case during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of lower-case column names, 
> which fail to look up the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", 
> "hive

[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166042#comment-17166042
 ] 

Chiran Ravani edited comment on HIVE-23873 at 7/28/20, 12:51 AM:
-

[~srahman] I was on leave and could not work over the weekend. I tested with 
the existing test cases, which pass with the lower-case patch. I got your 
point, though: I modified the code in JdbcRecordIterator.java to fetch column 
names using ResultSetMetaData, and attached the updated patch.
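
For reference, a minimal sketch of that approach (hypothetical helper, not the 
actual HIVE-23873 patch): read the column names the driver itself reports via 
ResultSetMetaData and normalize them, so lookups from Hive's lower-cased schema 
match even when Oracle reports upper-case names.

{code}
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Sketch: key each row by the column names the database itself reports,
// normalized to lower case so Hive's lower-cased schema can find them.
final class RowReaderSketch {
  static Map<String, Object> readRow(ResultSet rs) throws SQLException {
    ResultSetMetaData meta = rs.getMetaData();
    Map<String, Object> row = new HashMap<>();
    for (int i = 1; i <= meta.getColumnCount(); i++) {
      // getColumnLabel honors SQL aliases; Oracle reports unquoted names upper-cased
      String name = meta.getColumnLabel(i).toLowerCase();
      row.put(name, rs.getObject(i));
    }
    return row;
  }
}
{code}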


was (Author: chiran54321):
[~srahman] I was on leave and could not work over the weekend. I got your 
point: I tested with the existing test cases, which pass with the lower case, 
and modified the code in JdbcRecordIterator.java to fetch column names using 
ResultSetMetaData; the updated patch is attached.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch, 
> HIVE-23873.3.patch
>
>
> The scenario: a Hive table has the same schema as a table in Oracle, but 
> querying the table with data fails with an NPE; below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: Hive 
> forces table and column names to lower case during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of lower-case column names, 
> which fail to look up the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME 

[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166042#comment-17166042
 ] 

Chiran Ravani edited comment on HIVE-23873 at 7/28/20, 12:51 AM:
-

[~srahman] I was on leave and could not work over the weekend. I tested with 
the existing test cases, which pass with the lower-case patch. I got your 
point, though: I modified the code in JdbcRecordIterator.java to fetch column 
names using ResultSetMetaData, and attached the updated patch.

CC: [~jcamachorodriguez]


was (Author: chiran54321):
[~srahman] I was on leave and could not work over the weekend. I tested with 
the existing test cases, which pass with the lower-case patch. I got your 
point, though: I modified the code in JdbcRecordIterator.java to fetch column 
names using ResultSetMetaData, and attached the updated patch.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch, 
> HIVE-23873.3.patch
>
>
> The scenario: a Hive table has the same schema as a table in Oracle, but 
> querying the table with data fails with an NPE; below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: Hive 
> forces table and column names to lower case during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of lower-case column names, 
> which fail to look up the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(

[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-23873:
-
Attachment: HIVE-23873.3.patch

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch, 
> HIVE-23873.3.patch
>
>
> The scenario: a Hive table has the same schema as a table in Oracle, but 
> querying the table with data fails with an NPE; below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: Hive 
> forces table and column names to lower case during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of lower-case column names, 
> which fail to look up the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", 
> "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE", 
> "hive.sql.dbcp.username" = "chiran", 
> "hive.sql.dbcp.password" = "supersecurepassword", 
> "hive.sql.table" = "TESTHIVEJDBCSTORAGE", 
> "hive.sql.d

[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166042#comment-17166042
 ] 

Chiran Ravani commented on HIVE-23873:
--

[~srahman] I was on leave and could not work over the weekend. I got your 
point: I tested with the existing test cases, which pass with the lower case, 
and modified the code in JdbcRecordIterator.java to fetch column names using 
ResultSetMetaData; the updated patch is attached.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch, 
> HIVE-23873.3.patch
>
>
> The scenario: a Hive table has the same schema as a table in Oracle, but 
> querying the table with data fails with an NPE; below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: Hive 
> forces table and column names to lower case during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of lower-case column names, 
> which fail to look up the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "h

[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-27 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Attachment: HIVE-23916.02.patch

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23916.01.patch, HIVE-23916.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-27 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Attachment: (was: HIVE-23916.02.patch)

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23916.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23940) Add TPCH tables (scale factor 0.001) as qt datasets

2020-07-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-23940:
--


> Add TPCH tables (scale factor 0.001) as qt datasets
> ---
>
> Key: HIVE-23940
> URL: https://issues.apache.org/jira/browse/HIVE-23940
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Currently there are only two TPCH tables (lineitem, part) in the qt datasets, 
> and the data do not reflect an actual scale factor. 
> The TPC-H schema is quite popular, and having all the tables makes it easier 
> to create meaningful and understandable queries. 
> Moreover, keeping the standard proportions yields query plans that remain 
> meaningful when the scale factor changes, and makes it easier to compare the 
> correctness of results against other databases.  
> The goal of this issue is to add all TPCH tables with their data at scale 
> factor 0.001 as qt datasets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23939) SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-23939:
--
Summary: SharedWorkOptimizer: take the union of columns in mergeable 
TableScans  (was: SharedWorkOptimizer: taking the union of columns in mergeable 
TableScans)

> SharedWorkOptimizer: take the union of columns in mergeable TableScans
> --
>
> Key: HIVE-23939
> URL: https://issues.apache.org/jira/browse/HIVE-23939
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> POSTHOOK: query: explain
> select case when (select count(*) 
>   from store_sales 
>   where ss_quantity between 1 and 20) > 409437
> then (select avg(ss_ext_list_price) 
>   from store_sales 
>   where ss_quantity between 1 and 20) 
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 1 and 20) end bucket1 ,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 21 and 40) > 4595804
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 21 and 40) 
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 21 and 40) end bucket2,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 41 and 60) > 7887297
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 41 and 60)
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 41 and 60) end bucket3,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 61 and 80) > 10872978
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 61 and 80)
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 61 and 80) end bucket4,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 81 and 100) > 43571537
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 81 and 100)
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 81 and 100) end bucket5
> from reason
> where r_reason_sk = 1
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@reason
> POSTHOOK: Input: default@store_sales
> POSTHOOK: Output: hdfs://### HDFS PATH ###
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Reducer 34 (CUSTOM_SIMPLE_EDGE), Reducer 9 (CUSTOM_SIMPLE_EDGE)
> Reducer 11 <- Reducer 10 (CUSTOM_SIMPLE_EDGE), Reducer 18 (CUSTOM_SIMPLE_EDGE)
> Reducer 12 <- Reducer 11 (CUSTOM_SIMPLE_EDGE), Reducer 24 (CUSTOM_SIMPLE_EDGE)
> Reducer 13 <- Reducer 12 (CUSTOM_SIMPLE_EDGE), Reducer 30 (CUSTOM_SIMPLE_EDGE)
> Reducer 14 <- Reducer 13 (CUSTOM_SIMPLE_EDGE), Reducer 19 (CUSTOM_SIMPLE_EDGE)
> Reducer 15 <- Reducer 14 (CUSTOM_SIMPLE_EDGE), Reducer 25 (CUSTOM_SIMPLE_EDGE)
> Reducer 16 <- Reducer 15 (CUSTOM_SIMPLE_EDGE), Reducer 31 (CUSTOM_SIMPLE_EDGE)
> Reducer 18 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 19 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Reducer 20 (CUSTOM_SIMPLE_EDGE)
> Reducer 20 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 21 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 22 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 24 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 25 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 26 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 27 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 28 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE), Reducer 26 (CUSTOM_SIMPLE_EDGE)
> Reducer 30 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 31 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 32 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 33 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 34 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE), Reducer 32 (CUSTOM_SIMPLE_EDGE)
> Reducer 5 <- Reducer 21 (CUSTOM_SIMPLE_EDGE), Reducer 4 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Reducer 27 (CUSTOM_SIMPLE_EDGE), Reducer 5 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Reducer 33 (CUSTOM_SIMPLE_EDGE), Redu
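
As a minimal illustration of the summary (hypothetical types, not the actual 
SharedWorkOptimizer code): when two TableScans over store_sales are merged, the 
surviving scan must project the union of the columns each branch needs, so e.g. 
the count(*) branch and the avg(ss_ext_list_price) branch can share one scan.

{code}
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: two mergeable scans of the same table collapse into one
// TableScan that projects the union of the columns each branch needs.
final class ScanMergeSketch {
  static Set<String> mergeProjections(Set<String> a, Set<String> b) {
    Set<String> union = new LinkedHashSet<>(a); // stable column order
    union.addAll(b);
    return union;
  }

  public static void main(String[] args) {
    Set<String> countBranch = new LinkedHashSet<>(List.of("ss_quantity"));
    Set<String> avgBranch =
        new LinkedHashSet<>(List.of("ss_quantity", "ss_ext_list_price"));
    // Prints [ss_quantity, ss_ext_list_price]: one scan of store_sales
    // serves both subqueries instead of two separate scans.
    System.out.println(mergeProjections(countBranch, avgBranch));
  }
}
{code}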

[jira] [Assigned] (HIVE-23939) SharedWorkOptimizer: taking the union of columns in mergeable TableScans

2020-07-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-23939:
-


> SharedWorkOptimizer: taking the union of columns in mergeable TableScans
> 
>
> Key: HIVE-23939
> URL: https://issues.apache.org/jira/browse/HIVE-23939
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> POSTHOOK: query: explain
> select case when (select count(*) 
>   from store_sales 
>   where ss_quantity between 1 and 20) > 409437
> then (select avg(ss_ext_list_price) 
>   from store_sales 
>   where ss_quantity between 1 and 20) 
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 1 and 20) end bucket1 ,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 21 and 40) > 4595804
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 21 and 40) 
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 21 and 40) end bucket2,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 41 and 60) > 7887297
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 41 and 60)
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 41 and 60) end bucket3,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 61 and 80) > 10872978
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 61 and 80)
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 61 and 80) end bucket4,
>case when (select count(*)
>   from store_sales
>   where ss_quantity between 81 and 100) > 43571537
> then (select avg(ss_ext_list_price)
>   from store_sales
>   where ss_quantity between 81 and 100)
> else (select avg(ss_net_paid_inc_tax)
>   from store_sales
>   where ss_quantity between 81 and 100) end bucket5
> from reason
> where r_reason_sk = 1
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@reason
> POSTHOOK: Input: default@store_sales
> POSTHOOK: Output: hdfs://### HDFS PATH ###
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Reducer 34 (CUSTOM_SIMPLE_EDGE), Reducer 9 (CUSTOM_SIMPLE_EDGE)
> Reducer 11 <- Reducer 10 (CUSTOM_SIMPLE_EDGE), Reducer 18 (CUSTOM_SIMPLE_EDGE)
> Reducer 12 <- Reducer 11 (CUSTOM_SIMPLE_EDGE), Reducer 24 (CUSTOM_SIMPLE_EDGE)
> Reducer 13 <- Reducer 12 (CUSTOM_SIMPLE_EDGE), Reducer 30 (CUSTOM_SIMPLE_EDGE)
> Reducer 14 <- Reducer 13 (CUSTOM_SIMPLE_EDGE), Reducer 19 (CUSTOM_SIMPLE_EDGE)
> Reducer 15 <- Reducer 14 (CUSTOM_SIMPLE_EDGE), Reducer 25 (CUSTOM_SIMPLE_EDGE)
> Reducer 16 <- Reducer 15 (CUSTOM_SIMPLE_EDGE), Reducer 31 (CUSTOM_SIMPLE_EDGE)
> Reducer 18 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 19 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Reducer 20 (CUSTOM_SIMPLE_EDGE)
> Reducer 20 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 21 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 22 <- Map 17 (CUSTOM_SIMPLE_EDGE)
> Reducer 24 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 25 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 26 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 27 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 28 <- Map 23 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE), Reducer 26 (CUSTOM_SIMPLE_EDGE)
> Reducer 30 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 31 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 32 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 33 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 34 <- Map 29 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE), Reducer 32 (CUSTOM_SIMPLE_EDGE)
> Reducer 5 <- Reducer 21 (CUSTOM_SIMPLE_EDGE), Reducer 4 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Reducer 27 (CUSTOM_SIMPLE_EDGE), Reducer 5 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Reducer 33 (CUSTOM_SIMPLE_EDGE), Reducer 6 (CUSTOM_SIMPLE_EDGE)
> Reducer 8 <- Reducer 22 (CUSTOM_SIMPLE_EDGE), Reducer 7 (CUSTOM_SIMPLE_EDGE)
> Reducer 9 <- Reducer 28 (CUSTOM_SIMPLE_EDGE), Re

[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Description: 
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
... 
org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon

OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

These are not valid in JDK11:
{code}
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles
-XX:GCLogFileSize
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
{code}

Instead something like:
{code}
-Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
{code}

  was:
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

These are not valid in JDK11:
{code}
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles
-XX:GCLogFileSize
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
{code}

Instead something like:
{code}
-Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
{code}


> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  

[jira] [Commented] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165806#comment-17165806
 ] 

László Bodor commented on HIVE-23938:
-

Just for reference: with the settings below I got the attached file on JDK11: 
[^gc_2020-07-27-13.log] 
{code}
-Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
{code}
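
For reference, the JAVA_OPTS_BASE line from runLlapDaemon.sh could be mapped 
onto JDK11's unified logging roughly as follows (a sketch of the flag 
translation, not the committed fix): -Xlog:gc* subsumes -verbose:gc and 
-XX:+PrintGCDetails, the time/uptime decorators replace -XX:+PrintGCDateStamps, 
and filecount/filesize replace the old rotation flags.

{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA \
  -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M"
{code}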

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: gc_2020-07-27-13.log
>
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> -Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
>  -Dlog4j.configurationFile= 
> -Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
>  
> -Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
>  -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
> -classpath 
> '/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
>  org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Attachment: gc_2020-07-27-13.log

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: gc_2020-07-27-13.log
>
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> -Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
>  -Dlog4j.configurationFile= 
> -Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
>  
> -Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
>  -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
> -classpath 
> '/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
>  org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ

2020-07-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23723:
-
Attachment: (was: HIVE-23723.1.patch)

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> A Limit operator (without an order by) can be pushed through SELECTs and LEFT 
> OUTER JOINs, since every left-side row yields at least one output row; see 
> the sketch below.
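
To make that reasoning concrete, here is a toy model in Java (a hypothetical 
one-to-at-most-one join, not Hive's operator code): because each left row 
survives a left outer join, joining only the first N left rows still produces 
at least N output rows, so a LIMIT N on top returns a valid answer set.

{code}
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Sketch: LIMIT N can be pushed below a LEFT OUTER JOIN onto the left
// input, because every left row yields at least one output row.
final class LimitPushdownSketch {
  // Toy one-to-at-most-one left outer join: unmatched left rows survive as NULL.
  static Stream<String> leftOuterJoin(List<String> left, Map<String, String> right) {
    return left.stream().map(k -> k + " -> " + right.getOrDefault(k, "NULL"));
  }

  public static void main(String[] args) {
    List<String> left = List.of("a", "b", "c", "d");
    Map<String, String> right = Map.of("a", "1", "c", "3");
    int n = 2;
    // Joining only the first N left rows still yields at least N output
    // rows, so the LIMIT N on top returns a valid answer set.
    leftOuterJoin(left.subList(0, n), right).limit(n).forEach(System.out::println);
  }
}
{code}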



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Description: 
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

These are not valid in JDK11:
{code}
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles
-XX:GCLogFileSize
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
{code}

Instead something like:
{code}
-Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
{code}

  was:
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM 

[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Description: 
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

These are not valid in JDK11:
{code}
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles
-XX:GCLogFileSize
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
{code}

Instead something like:
{code}
-Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
{code}

  was:
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM 

[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Description: 
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

These are not valid in JDK11:
{code}
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles
-XX:GCLogFileSize 
{code}

Instead something like:
{code}
-Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
{code}

  was:
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 

[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Description: 
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

These are not valid in JDK11:
{code}
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles
-XX:GCLogFileSize 
{code}

Instead something like:
{code}
java -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M ...
{code}

  was:
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was depre

[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Description: 
https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
{code}
JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
{code}

on JDK11 I got something like:
{code}
+ exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
-Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
-XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 -Dhttp.maxConnections=10 
-Dasync.profiler.home=/grid/0/async-profiler -server 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
-XX:+PrintGCDateStamps 
-Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
 
-Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
 -Dlog4j.configurationFile= 
-Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
 
-Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
 -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
-classpath 
'/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/tez/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/udfs/*:.:'
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
version 11.0 and will likely be removed in a future release.
Unrecognized VM option 'UseGCLogFileRotation'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> -Djava.io.tmpdir=/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/tmp/
>  -Dlog4j.configurationFile= 
> -Dllap.daemon.log.dir=/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09
>  
> -Dllap.daemon.log.file=llap-daemon-lbodor-ctr-e141-1563959304486-69251-01-11.hwx.site.log
>  -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO 
> -classpath 
> '/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib/conf/:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib//lib/*:/grid/ssd/yarn/nm/usercache/lbodor/appcache/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/lib

[jira] [Assigned] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-23938:
---

Assignee: László Bodor

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23893?focusedWorklogId=463601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463601
 ]

ASF GitHub Bot logged work on HIVE-23893:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 11:47
Start Date: 27/Jul/20 11:47
Worklog Time Spent: 10m 
  Work Description: letsflyinthesky commented on a change in pull request 
#1322:
URL: https://github.com/apache/hive/pull/1322#discussion_r460832421



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, 
OpWalkerInfo owi,
 }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+  ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+RowSchema inputRS = op.getSchema();
+
+Map> pushDownPreds = ewi.getFinalCandidates();
+Map> unPushDownPreds = 
ewi.getNonFinalCandidates();
+
+// combine all deterministic predicates into a single expression
+List deterministicPreds = new ArrayList();
+Iterator> iterator1 = pushDownPreds.values().iterator();
+while (iterator1.hasNext()) {
+  for (ExprNodeDesc pred : iterator1.next()) {
+deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);

Review comment:
   yes, maybe we could remove the iterator here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463601)
Time Spent: 40m  (was: 0.5h)

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Take the following query for example, assuming unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
> The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165647#comment-17165647
 ] 

Zhihua Deng edited comment on HIVE-23893 at 7/27/20, 11:39 AM:
---

[~zhishui] Thanks for your concern, I left some comments in your PR. In fact, 
I suspect we cannot push the deterministic conditions down when some 
nondeterministic conditions come along: the test case _rand_partitionpruner3.q_ 
shows that if I push *not(key > 50 or key < 10)* down, the results differ from 
the previous ones. The earlier evaluation of *not(key > 50 or key < 10)* may 
affect the evaluation of *RAND(1) < 0.1*, which makes the result unpredictable. 
But if we enable CBO, the deterministic conditions do get pushed down, as this 
case shows: _explain extended select a.*, b.* from a join b on a.k = b.k where 
a.hs = 11 and b.hs <= 10 and rand(1) < 0.1_. Maybe it's better for Hive's RBO 
to work consistently with the CBO.


was (Author: dengzh):
[~zhishui] Thanks for your concern, I left some comments in your PR. In fact, 
I'm wondering if we can push the deterministic conditions down when some 
nondeterministic conditions come along: the test case _rand_partitionpruner3.q_ 
shows that if I push *not(key > 50 or key < 10)* down, the results differ from 
the previous ones. The earlier evaluation of *not(key > 50 or key < 10)* may 
affect the evaluation of *RAND(1) < 0.1*, which makes the result unpredictable. 
But if we enable CBO, the deterministic conditions do get pushed down, as this 
case shows: _explain extended select a.*, b.* from a join b on a.k = b.k where 
a.hs = 11 and b.hs <= 10 and rand(1) < 0.1_. Maybe it's better for Hive's RBO 
to work consistently with the CBO.

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Take the following query for example, assuming unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
> The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165647#comment-17165647
 ] 

Zhihua Deng commented on HIVE-23893:


[~zhishui] Thanks for your concern, I left some comments in your PR. In fact, 
I'm wondering if we can push the deterministic conditions down when some 
nondeterministic conditions come along: the test case _rand_partitionpruner3.q_ 
shows that if I push *not(key > 50 or key < 10)* down, the results differ from 
the previous ones. The earlier evaluation of *not(key > 50 or key < 10)* may 
affect the evaluation of *RAND(1) < 0.1*, which makes the result unpredictable. 
But if we enable CBO, the deterministic conditions do get pushed down, as this 
case shows: _explain extended select a.*, b.* from a join b on a.k = b.k where 
a.hs = 11 and b.hs <= 10 and rand(1) < 0.1_. Maybe it's better for Hive's RBO 
to work consistently with the CBO.
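
To make the splitting under discussion concrete, here is a minimal sketch (hypothetical, not the patch under review; Pred stands in for Hive's ExprNodeDesc, and the deterministic flag would really come from the UDF) of partitioning WHERE-clause conjuncts by determinism:
{code}
import java.util.ArrayList;
import java.util.List;

/** Sketch: partition WHERE-clause conjuncts by determinism; only the
 *  deterministic part is a candidate for pushdown past the join. */
public class PredicateSplit {

  // stand-in for Hive's ExprNodeDesc
  static class Pred {
    final String text;
    final boolean deterministic;
    Pred(String text, boolean deterministic) {
      this.text = text;
      this.deterministic = deterministic;
    }
  }

  public static void main(String[] args) {
    List<Pred> conjuncts = List.of(
        new Pred("a.date = 20200704", true),
        new Pred("b.date = 20200704", true),
        new Pred("rand(1) < 0.1", false));
    List<Pred> pushable = new ArrayList<>();
    List<Pred> retained = new ArrayList<>();
    for (Pred p : conjuncts) {
      (p.deterministic ? pushable : retained).add(p);
    }
    // pushable -> can go below the join and enable partition pruning
    // retained -> stays in a residual filter above the join
    pushable.forEach(p -> System.out.println("push: " + p.text));
    retained.forEach(p -> System.out.println("keep: " + p.text));
  }
}
{code}
Whether pushing the deterministic part is always safe next to rand() is exactly the open question above; the sketch only illustrates the mechanical split.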

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Take the following query for example, assuming unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
> The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23893?focusedWorklogId=463597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463597
 ]

ASF GitHub Bot logged work on HIVE-23893:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 11:32
Start Date: 27/Jul/20 11:32
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1322:
URL: https://github.com/apache/hive/pull/1322#discussion_r460826125



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, 
OpWalkerInfo owi,
 }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+  ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+RowSchema inputRS = op.getSchema();
+
+Map> pushDownPreds = ewi.getFinalCandidates();
+Map> unPushDownPreds = 
ewi.getNonFinalCandidates();
+
+// combine all deterministic predicates into a single expression
+List deterministicPreds = new ArrayList();
+Iterator> iterator1 = pushDownPreds.values().iterator();
+while (iterator1.hasNext()) {
+  for (ExprNodeDesc pred : iterator1.next()) {
+deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);
+  }
+}
+
+if (deterministicPreds.isEmpty()) {
+  return null;
+}
+
+List nondeterministicPreds = new ArrayList();
+Iterator> iterator2 = 
unPushDownPreds.values().iterator();
+while (iterator2.hasNext()) {
+  for (ExprNodeDesc pred : iterator2.next()) {
+nondeterministicPreds = ExprNodeDescUtils.split(pred, 
nondeterministicPreds);
+  }
+}
+
+assert !nondeterministicPreds.isEmpty();
+
+ExprNodeDesc deterministicCondn = 
ExprNodeDescUtils.mergePredicates(deterministicPreds);
+ExprNodeDesc nondeterministicCondn = 
ExprNodeDescUtils.mergePredicates(nondeterministicPreds);
+
+Operator deterministicFilter =
+OperatorFactory.get(new FilterDesc(deterministicCondn, false), new 
RowSchema(inputRS.getSignature()));
+
+deterministicFilter.setChildOperators(new ArrayList>());
+deterministicFilter.getChildOperators().add(op);
+
+List> originalParents = op
+.getParentOperators();
+for (Operator parent : originalParents) {
+  List> childOperators = parent
+  .getChildOperators();
+  int pos = childOperators.indexOf(op);
+  childOperators.remove(pos);
+  childOperators.add(pos, deterministicFilter);
+
+  int pPos = op.getParentOperators().indexOf(parent);
+  deterministicFilter.getParentOperators().add(pPos, parent);
+}
+
+op.getParentOperators().clear();
+op.getParentOperators().add(deterministicFilter);
+op.getConf().setPredicate(nondeterministicCondn);
+
+if (HiveConf.getBoolVar(owi.getParseContext().getConf(),
+HiveConf.ConfVars.HIVEPPDREMOVEDUPLICATEFILTERS)) {
+  // remove the candidate filter ops
+  for (FilterOperator fop : owi.getCandidateFilterOps()) {
+List> children = 
fop.getChildOperators();
+List> parents = 
fop.getParentOperators();
+for (Operator parent : parents) {
+  parent.getChildOperators().addAll(children);
+  parent.removeChild(fop);
+}
+for (Operator child : children) {
+  child.getParentOperators().addAll(parents);
+  child.removeParent(fop);
+}
+  }
+  owi.getCandidateFilterOps().clear();
+}
+
+ewi = ExprWalkerProcFactory.extractPushdownPreds(owi, op,
+deterministicFilter.getConf().getPredicate());
+
+owi.putPrunedPreds(deterministicFilter, ewi);

Review comment:
   the deterministicFilter should be added to OpWalkerInfo.candidateFilterOps, 
and there is no need to call extractPushdownPreds again

##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, 
OpWalkerInfo owi,
 }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+  ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+RowSchema inputRS = op.getSchema();
+
+Map> pushDownPreds = ewi.getFinalCandidates();
+Map> unPushDownPreds = 
ewi.getNonFinalCandidates();
+
+// combine all deterministic predicates into a single expression
+List deterministicPreds = new ArrayList();
+Iterator> iterator1 = pushDownPreds.values().iterator();
+while (iterator1.hasNext()) {
+  for (ExprNodeDesc pred : iterator1.next()) {
+deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);
+  }
+}
+
+if (deterministicPreds.isEmpty()) {
+  return null;
+}
+
+List nondeterministicPreds = new ArrayList();
+Iterator> iterator2 = 
unPushDownPreds.values().iterator();
+while (iterator2.hasNext()) {
+  for (ExprNodeDesc

[jira] [Comment Edited] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165494#comment-17165494
 ] 

zhishui edited comment on HIVE-23893 at 7/27/20, 9:59 AM:
--

[~dengzh][~pgaref] The cause is that Hive does not split deterministic 
predicates and nondeterministic predicates; you can look at my branch: 
https://github.com/letsflyinthesky/hive/commit/ee24112887f523d0cbba4a6f91d958f3d48cd984
 and I have tested it with the case you provided.


was (Author: zhishui):
[~dengzh][~pgaref] The cause is that Hive does not split deterministic 
predicates and nondeterministic predicates; you can look at my branch: 
https://github.com/letsflyinthesky/hive/commit/ee24112887f523d0cbba4a6f91d958f3d48cd984

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Take the following query for example, assuming unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
> The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165584#comment-17165584
 ] 

Syed Shameerur Rahman edited comment on HIVE-23873 at 7/27/20, 9:59 AM:


[~chiran54321] Any update on this? I can pick this up if you are busy with 
other stuff.


was (Author: srahman):
[~chiran54321] Any update on this? I can pick this if you don't have the 
bandwidth.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table with the same schema as a table in Oracle; 
> however, querying the table with data fails with an NPE. Below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: in Hive, 
> table and column names are forced to lowercase during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, 
> which fail to match the upper-cased keys and get the value
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
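> A minimal sketch of one possible fix direction (a hypothetical helper, not 
> the actual patch; the class and method names are made up): normalize the map 
> keys to lower case before the lookup, so Hive's lower-cased column names 
> always match:
> {code}
> import java.util.HashMap;
> import java.util.Locale;
> import java.util.Map;
> 
> /** Sketch: lower-case result-set keys so Hive's lower-cased column names
>  *  (e.g. "id") match Oracle's upper-cased ones (e.g. "ID"). */
> public class KeyNormalizer {
>   static Map<String, Object> lowerCaseKeys(Map<String, Object> row) {
>     Map<String, Object> normalized = new HashMap<>();
>     row.forEach((k, v) -> normalized.put(k.toLowerCase(Locale.ROOT), v));
>     return normalized;
>   }
> 
>   public static void main(String[] args) {
>     Map<String, Object> row = new HashMap<>();
>     row.put("ID", 1);
>     System.out.println(lowerCaseKeys(row).get("id")); // prints 1
>   }
> }
> {code}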
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive

[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165584#comment-17165584
 ] 

Syed Shameerur Rahman commented on HIVE-23873:
--

[~chiran54321] Any update on this? I can pick this if you don't have the 
bandwidth.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table with the same schema as a table in Oracle; 
> however, querying the table with data fails with an NPE. Below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: in Hive, 
> table and column names are forced to lowercase during creation, so the user 
> runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, 
> which fail to match the upper-cased keys and get the value
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", 
> "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE", 
> "hive.sql.dbcp.username" = "chiran", 
> "hive.sq

[jira] [Comment Edited] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165494#comment-17165494
 ] 

zhishui edited comment on HIVE-23893 at 7/27/20, 9:53 AM:
--

[~dengzh][~pgaref] The cause is that Hive does not split deterministic 
predicates and nondeterministic predicates; you can look at my branch: 
https://github.com/letsflyinthesky/hive/commit/ee24112887f523d0cbba4a6f91d958f3d48cd984


was (Author: zhishui):
The cause is that Hive does not split deterministic predicates and 
nondeterministic predicates; you can look at my branch: 
https://github.com/letsflyinthesky/hive/commit/ee24112887f523d0cbba4a6f91d958f3d48cd984

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Take the following query for example, assuming unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
> The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23892) Remove interpretation for character RexLiteral

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23892?focusedWorklogId=463565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463565
 ]

ASF GitHub Bot logged work on HIVE-23892:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 09:44
Start Date: 27/Jul/20 09:44
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1305:
URL: https://github.com/apache/hive/pull/1305#discussion_r460765929



##
File path: ql/src/test/results/clientpositive/llap/vector_const.q.out
##
@@ -40,7 +40,7 @@ STAGE PLANS:
   alias: varchar_const_1
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
   Select Operator
-expressions: 'FF' (type: varchar(3))
+expressions: 'FF' (type: varchar(4))

Review comment:
   interesting change; will this also mean that:
   ```
   CONCAT(CAST('F' AS CHAR(200)),CAST('F' AS CHAR(200)))
   ```
   will not be processable because it would need `CHAR(400)` - which is not 
supported?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ExprNodeConverter.java
##
@@ -341,26 +342,22 @@ public static ExprNodeConstantDesc 
toExprNodeConstantDesc(RexLiteral literal) {
   case DECIMAL:
 return new 
ExprNodeConstantDesc(TypeInfoFactory.getDecimalTypeInfo(lType.getPrecision(),
 lType.getScale()), 
HiveDecimal.create((BigDecimal)literal.getValue3()));
-  case VARCHAR:
   case CHAR: {
-if (literal.getValue() instanceof HiveNlsString) {
-  HiveNlsString mxNlsString = (HiveNlsString) literal.getValue();
-  switch (mxNlsString.interpretation) {
-  case STRING:
-return new ExprNodeConstantDesc(TypeInfoFactory.stringTypeInfo, 
literal.getValue3());
-  case CHAR: {
-int precision = lType.getPrecision();
-HiveChar value = new HiveChar((String) literal.getValue3(), 
precision);
-return new ExprNodeConstantDesc(new CharTypeInfo(precision), 
value);
-  }
-  case VARCHAR: {
-int precision = lType.getPrecision();
-HiveVarchar value = new HiveVarchar((String) literal.getValue3(), 
precision);
-return new ExprNodeConstantDesc(new VarcharTypeInfo(precision), 
value);
-  }
-  }
+Preconditions.checkState(literal.getValue() instanceof NlsString,
+"char values must use NlsString for correctness");
+int precision = lType.getPrecision();
+HiveChar value = new HiveChar((String) literal.getValue3(), precision);
+return new ExprNodeConstantDesc(new CharTypeInfo(precision), value);
+  }
+  case VARCHAR: {
+Preconditions.checkState(literal.getValue() instanceof NlsString,
+"varchar/string values must use NlsString for correctness");
+int precision = lType.getPrecision();
+if (precision == Integer.MAX_VALUE) {
+  return new ExprNodeConstantDesc(TypeInfoFactory.stringTypeInfo, 
literal.getValue3());

Review comment:
   I don't know why I didn't choose this path in HIVE-21316...
   
   maybe I missed that `MAX_VARCHAR_LENGTH` and `MAX_CHAR_LENGTH` are both 
below 64K

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/RexNodeExprFactory.java
##
@@ -374,7 +373,7 @@ protected Object 
interpretConstantAsPrimitive(PrimitiveTypeInfo targetType, Obje
 HiveChar newValue = new HiveChar(constValue, length);
 HiveChar maxCharConst = new HiveChar(constValue, 
HiveChar.MAX_CHAR_LENGTH);
 if (maxCharConst.equals(newValue)) {
-  return makeHiveUnicodeString(Interpretation.CHAR, 
newValue.getValue());
+  return makeHiveUnicodeString(newValue.getValue());

Review comment:
   this will discard the type distinction between char/varchar, but because 
`CHAR` is already padded at this point, it will work correctly!
   awesome! :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463565)
Time Spent: 0.5h  (was: 20m)

> Remove interpretation for character RexLiteral
> --
>
> Key: HIVE-23892
> URL: https://issues.apache.org/jira/browse/HIVE-23892
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  

[jira] [Commented] (HIVE-14661) Hive should extract deterministic conditions from where clause and use them for partition pruning

2020-07-27 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-14661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165571#comment-17165571
 ] 

zhishui commented on HIVE-14661:


I have fixed this issue with commit 
https://github.com/letsflyinthesky/hive/commit/ee24112887f523d0cbba4a6f91d958f3d48cd984;
 you can pull it and have a try.

> Hive should extract deterministic conditions from where clause and use them 
> for partition pruning
> -
>
> Key: HIVE-14661
> URL: https://issues.apache.org/jira/browse/HIVE-14661
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Priority: Major
>
> Currently, if a non-deterministic function is used in the where clause, 
> partition pruning doesn't work. This can be reproduced as below:
> {code:sql}
> create table part1 (id int, content string) partitioned by (p int);
> alter table part1 add partition(p=1);
> alter table part1 add partition(p=2);
> create table part2 (id int, another_content string);
> set hive.mapred.mode=strict;
> set hive.cbo.enable=false;
> explain select p1.id, p1.content, p2.another_content from part1 p1 join part2 
> p2 on p1.id=p2.id where p1.p=1 and rand < 0.5;
> {code}
> The last query would fail with below error:
> {noformat}
> 16/08/23 23:55:52 ERROR ql.Driver: [main]: FAILED: SemanticException [Error 
> 10041]: No partition predicate found for Alias "p1" Table "part1"
> org.apache.hadoop.hive.ql.parse.SemanticException: No partition predicate 
> found for Alias "p1" Table "part1"
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23937) Take null ordering into consideration when pushing TNK through inner joins

2020-07-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23937:



> Take null ordering into consideration when pushing TNK through inner joins
> --
>
> Key: HIVE-23937
> URL: https://issues.apache.org/jira/browse/HIVE-23937
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23930) Upgrade to tez 0.10.x

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23930?focusedWorklogId=463554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463554
 ]

ASF GitHub Bot logged work on HIVE-23930:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 09:14
Start Date: 27/Jul/20 09:14
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1311:
URL: https://github.com/apache/hive/pull/1311#issuecomment-664228175


   @abstractdog you could kill the prechecks in the Jenkinsfile 
temporarily... doesn't this need jdk11 as well?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463554)
Time Spent: 20m  (was: 10m)

> Upgrade to tez 0.10.x
> -
>
> Key: HIVE-23930
> URL: https://issues.apache.org/jira/browse/HIVE-23930
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Tez 0.10.1 is not yet released, but this ticket is for tracking the effort 
> and the needed hive changes.
> Currently, Hive depends on 0.9.1
> TODOs: 
> - check why HIVE-23689 broke some unit tests intermittently (0.9.2 ->0.9.3 
> bump), because a 0.10.x upgrade will also contain those tez changes which 
> could be related
> - maintain the needed hive changes (reflecting tez api changes):
> HIVE-23190: LLAP: modify IndexCache to pass filesystem object to 
> TezSpillRecord



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23719) MetaStoreDirectSql needs to optimize QuerySQL

2020-07-27 Thread YulongZ (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YulongZ resolved HIVE-23719.

Resolution: Fixed

> MetaStoreDirectSql needs to optimize QuerySQL
> --
>
> Key: HIVE-23719
> URL: https://issues.apache.org/jira/browse/HIVE-23719
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.2
>Reporter: YulongZ
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23891) Using UNION sql clause and speculative execution can cause file duplication in Tez

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23891?focusedWorklogId=463546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463546
 ]

ASF GitHub Bot logged work on HIVE-23891:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 08:41
Start Date: 27/Jul/20 08:41
Worklog Time Spent: 10m 
  Work Description: georgepachitariu commented on pull request #1294:
URL: https://github.com/apache/hive/pull/1294#issuecomment-664205898


   Hello @kgyrtkirk,
   Can you please guide me on what I should do next?
   
   I saw that there is a new test which is timing-out in 
`continuous-integration/jenkins/pr-merge`, but I think that is unrelated to my 
change.
   Thank you, George.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463546)
Time Spent: 1h 40m  (was: 1.5h)

> Using UNION sql clause and speculative execution can cause file duplication 
> in Tez
> --
>
> Key: HIVE-23891
> URL: https://issues.apache.org/jira/browse/HIVE-23891
> Project: Hive
>  Issue Type: Bug
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23891.1.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Hello, 
> the specific scenario when this can happen:
>  - the execution engine is Tez;
>  - speculative execution is on;
>  - the query inserts into a table and the last step is a UNION sql clause;
> The problem is that Tez creates an extra layer of subdirectories when there 
> is a UNION. Later, when deduplicating, Hive doesn't take that into account 
> and only deduplicates folders but not the files inside.
> So for a query like this:
> {code:sql}
> insert overwrite table union_all
> select * from union_first_part
> union all
> select * from union_second_part;
> {code}
> The folder structure afterwards will be like this (a possible example):
> {code:java}
> .../union_all/HIVE_UNION_SUBDIR_1/00_0
> .../union_all/HIVE_UNION_SUBDIR_1/00_1
> .../union_all/HIVE_UNION_SUBDIR_2/00_1
> {code}
> The attached patch increases the number of folder levels that Hive will check 
> recursively for duplicates when we have a UNION in Tez.
> Feel free to reach out if you have any questions :).
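
As a hedged illustration of the listing-depth issue (this is not the patch
itself; the FileSystem/Path calls are standard Hadoop API, and the method name
is made up):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnionListingSketch {
  // Collect leaf files up to 'levels' directory levels below 'dir'. With
  // levels = 0 the files inside HIVE_UNION_SUBDIR_* are never visited, so
  // duplicate speculative-attempt outputs (00_0 vs 00_1) go unnoticed;
  // allowing one more level makes them visible to deduplication.
  static List<FileStatus> listLeafFiles(FileSystem fs, Path dir, int levels)
      throws IOException {
    List<FileStatus> result = new ArrayList<>();
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDirectory() && levels > 0) {
        result.addAll(listLeafFiles(fs, st.getPath(), levels - 1));
      } else if (st.isFile()) {
        result.add(st);
      }
    }
    return result;
  }
}
{code}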



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23408) Hive on Tez: Kafka storage handler broken in secure environment

2020-07-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-23408:
---

Assignee: László Bodor

> Hive on Tez: Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>
> hive.server2.authentication.kerberos.principal is set in the form of 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and unfolds the value of _HOST to 
> the FQDN of the host where it is running; this leads to an authentication 
> issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.
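
A minimal sketch of the _HOST unfolding behavior described above, using
Hadoop's SecurityUtil (the realm and hostnames are illustrative; this is not
code from any patch):

{code:java}
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.security.SecurityUtil;

public class PrincipalDemo {
  public static void main(String[] args) throws IOException {
    // Configured principal pattern; EXAMPLE.COM is a placeholder realm.
    String configured = "hive/_HOST@EXAMPLE.COM";
    // The substitution uses the host the code runs on; for a Tez task this is
    // whatever NM host the task landed on, which causes the mismatch.
    String localHost = InetAddress.getLocalHost().getCanonicalHostName();
    String resolved = SecurityUtil.getServerPrincipal(configured, localHost);
    System.out.println(resolved); // e.g. hive/nm-host-17.example.com@EXAMPLE.COM
  }
}
{code}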



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23916) Fix Atlas client dependency version

2020-07-27 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165534#comment-17165534
 ] 

Aasha Medhi commented on HIVE-23916:


+1

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23916.01.patch, HIVE-23916.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22255) Hive doesn't trigger Major Compaction automatically if table contains only base files

2020-07-27 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165533#comment-17165533
 ] 

Karen Coppage commented on HIVE-22255:
--

[~asetty] The point is that old base directories are cleaned up, so it only 
matters that the cleaner runs. Compaction itself will be skipped. Basically, 
with this change
 # we queue (major, doesn't really matter) compaction
 # the compaction Worker thread picks up the table from the queue, sees that 
there is one active base and no deltas, (so compaction will be skipped and) the 
table will immediately be placed in "ready for cleaning" status
 # Cleaner runs and deletes obsolete bases

> Hive doesn't trigger Major Compaction automatically if table contains only 
> base files
> 
>
> Key: HIVE-22255
> URL: https://issues.apache.org/jira/browse/HIVE-22255
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 3.1.2
> Environment: Hive-3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-22255.01.patch, HIVE-22255.02.patch, 
> HIVE-22255.patch
>
>
> A user may run into this issue if the table consists of only base files and 
> no deltas; then the following condition will yield false and automatic major 
> compaction will be skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # create Acid table 
> {code:java}
> //  create table myacid(id int);
> {code}
>  # Run multiple insert table 
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid 
> values(2),(3),(4){code}
>  # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> ++
> |                     DFS Output                     |
> ++
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
> ++{code}
>  
> You will see that major compaction will not be triggered until you run ALTER 
> TABLE myacid COMPACT 'major'.
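
A simplified, hypothetical rendering of the initiator check described above
(the names and the exact condition are illustrative, not the actual Hive code):

{code:java}
public class CompactionCheckSketch {
  // With only base files there are no deltas, so deltaSize is 0 and the
  // ratio test can never pass; automatic major compaction is skipped.
  static boolean needsMajorCompaction(long baseSize, long deltaSize,
      float deltaPctThreshold) {
    if (deltaSize == 0) {
      return false; // the all-base case reported in this issue
    }
    return ((float) deltaSize / baseSize) >= deltaPctThreshold;
  }

  public static void main(String[] args) {
    // Two insert-overwrites produced two bases and no deltas:
    System.out.println(needsMajorCompaction(1243, 0, 0.1f)); // false
  }
}
{code}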



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-27 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23835:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Merged to master, thanks for the patch!

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch, 
> HIVE-23835.03.patch, HIVE-23835.04.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When a Hive function's binaries are on the source HDFS, repl dump should dump 
> them to the staging location in order to break the cross-cluster visibility 
> requirement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=463535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463535
 ]

ASF GitHub Bot logged work on HIVE-23324:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 07:48
Start Date: 27/Jul/20 07:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1275:
URL: https://github.com/apache/hive/pull/1275#discussion_r460706465



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -66,53 +69,62 @@
   private long cleanerCheckInterval = 0;
 
   private ReplChangeManager replChangeManager;
+  private ExecutorService cleanerExecutor;
 
   @Override
   public void init(AtomicBoolean stop) throws Exception {
     super.init(stop);
     replChangeManager = ReplChangeManager.getInstance(conf);
-  }
-
-  @Override
-  public void run() {
     if (cleanerCheckInterval == 0) {

Review comment:
   I think this if check is redundant.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463535)
Time Spent: 10h 10m  (was: 10h)

> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Initiator processes the various compaction candidates in parallel, so we 
> could follow a similar approach in Cleaner where we currently clean the 
> directories sequentially.
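
A minimal, self-contained sketch of the parallelisation pattern discussed in
this ticket (illustrative only; the real change lives in Cleaner.java and is
shown in the review diffs in this thread):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelCleanSketch {
  // Stand-in for cleaning one ready-to-clean compaction directory.
  static void clean(int compactionId) {
    System.out.println("cleaning " + compactionId);
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (int id : new int[] {1, 2, 3, 4, 5}) {
      final int c = id;
      futures.add(CompletableFuture.runAsync(() -> clean(c), pool));
    }
    // Wait for every clean task, mirroring the allOf(...).join() in the patch.
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    pool.shutdown();
  }
}
{code}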



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23916) Fix Atlas client dependency version

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?focusedWorklogId=463537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463537
 ]

ASF GitHub Bot logged work on HIVE-23916:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 07:49
Start Date: 27/Jul/20 07:49
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1318:
URL: https://github.com/apache/hive/pull/1318#discussion_r460706988



##
File path: pom.xml
##
@@ -112,7 +112,7 @@
 1.5.7
 
 0.10.0
-2.0.0
+2.1.0

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463537)
Time Spent: 0.5h  (was: 20m)

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23916.01.patch, HIVE-23916.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23894) SubmitDag should not be retried in case of query cancel

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23894?focusedWorklogId=463534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463534
 ]

ASF GitHub Bot logged work on HIVE-23894:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 07:45
Start Date: 27/Jul/20 07:45
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1293:
URL: https://github.com/apache/hive/pull/1293#issuecomment-664178334


   > Thanks for the review @pgaref
   > I am afraid we may not be able to simulate the kill query scenario at the 
specific place using UT. Please let me know your thoughts.
   
   Hey @nareshpr  -- np, changes are pretty straightforward so I believe we 
should push :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463534)
Time Spent: 0.5h  (was: 20m)

> SubmitDag should not be retried in case of query cancel
> --
>
> Key: HIVE-23894
> URL: https://issues.apache.org/jira/browse/HIVE-23894
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In case of query cancel, running tasks will be interrupted and the TezTask 
> shutdown flag will be set.
> The code below does not need to be retried in case of task shutdown:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L572-L586]
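
A hedged sketch of the idea (class and method names here are made up, not
Hive's): once the shutdown flag is set by a query cancel, the submit loop
should give up instead of retrying.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class SubmitRetrySketch {
  private final AtomicBoolean shutdown = new AtomicBoolean(false);

  void cancel() {
    shutdown.set(true);
  }

  void submitWithRetry(Runnable submit, int maxRetries) throws InterruptedException {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (shutdown.get()) {
        // Query was cancelled; retrying would only resubmit dead work.
        throw new InterruptedException("task shut down, aborting DAG submit");
      }
      try {
        submit.run();
        return;
      } catch (RuntimeException e) {
        if (attempt == maxRetries) {
          throw e;
        }
        // otherwise fall through and retry
      }
    }
  }
}
{code}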



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=463533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463533
 ]

ASF GitHub Bot logged work on HIVE-23324:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 07:45
Start Date: 27/Jul/20 07:45
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1275:
URL: https://github.com/apache/hive/pull/1275#discussion_r460704756



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -89,23 +93,28 @@ public void run() {
         handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
         startedAt = System.currentTimeMillis();
         long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
+        List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
         for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
-          clean(compactionInfo, minOpenTxnId);
+          cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
+              clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
         }
+        CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
       } catch (Throwable t) {
         LOG.error("Caught an exception in the main loop of compactor cleaner, " +
-            StringUtils.stringifyException(t));
-      }
-      finally {
+            StringUtils.stringifyException(t));
+        if (cleanerExecutor != null) {

Review comment:
   @adesh-rao, exception from (see how it's done in Initiator):
   
       try {
         Thread.sleep(cleanerCheckInterval - elapsedTime);
       } catch (InterruptedException ie) {
         // What can I do about it?
       }
   
   rest looks good to me





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463533)
Time Spent: 10h  (was: 9h 50m)

> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Initiator processes the various compaction candidates in parallel, so we 
> could follow a similar approach in Cleaner where we currently clean the 
> directories sequentially.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-27 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Attachment: HIVE-23916.02.patch

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23916.01.patch, HIVE-23916.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165494#comment-17165494
 ] 

zhishui commented on HIVE-23893:


The cause is that Hive does not split deterministic and nondeterministic 
predicates; you can take a look at my branch: 
https://github.com/letsflyinthesky/hive/commit/ee24112887f523d0cbba4a6f91d958f3d48cd984
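
A toy sketch of the splitting idea in the comment above (types and names are
illustrative, not the actual optimizer code): walk the conjuncts once, push
the deterministic ones down, and keep the non-deterministic remainder in
place.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PredicateSplitSketch {
  static class Conjunct {
    final String text;
    final boolean deterministic;
    Conjunct(String text, boolean deterministic) {
      this.text = text;
      this.deterministic = deterministic;
    }
  }

  public static void main(String[] args) {
    List<Conjunct> where = List.of(
        new Conjunct("a.date = 20200704", true),
        new Conjunct("from_unixtime(unix_timestamp(a.first_dt)) = ...", false),
        new Conjunct("b.date = 20200704", true));
    List<Conjunct> pushable = new ArrayList<>();  // safe below the join
    List<Conjunct> residual = new ArrayList<>();  // stays at the filter
    for (Conjunct c : where) {
      (c.deterministic ? pushable : residual).add(c);
    }
    System.out.println("push down: " + pushable.size()); // the partition filters
    System.out.println("keep:      " + residual.size());
  }
}
{code}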

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Take the following query for example, and assume unix_timestamp is 
> non-deterministic (as it was before version 1.3.0):
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
>  The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23936) Provide approximate number of input records to be processed in broadcast reader

2020-07-27 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165491#comment-17165491
 ] 

Rajesh Balamohan commented on HIVE-23936:
-

E.g. in Hive, where the approximate input records counter could be used: 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTableLoader.java#L128]

> Provide approximate number of input records to be processed in broadcast 
> reader
> ---
>
> Key: HIVE-23936
> URL: https://issues.apache.org/jira/browse/HIVE-23936
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>
> There are cases where broadcast data is loaded into a hashtable in upstream 
> applications (e.g. Hive). Apps tend to predict the number of entries in the 
> hashtable diligently, but there are cases where these estimates can be very 
> complicated to make at compile time.
>  
> Tez can help in such cases by providing an "approximate number of input 
> records" counter for the data to be processed in UnorderedKVInput. This 
> avoids an expensive rehash when hashtable sizes are not estimated correctly. 
> It would be good to start with broadcast first and then move on to the 
> unordered partitioned case later.
>  
> This would help in predicting the number of entries at runtime and can give 
> better estimates for the hashtable.
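
A hedged sketch of why such a counter helps (the numbers and the 0.75 load
factor are illustrative): pre-sizing the hashtable near its final entry count
avoids repeated rehashing while the broadcast input is loaded.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class HashTableSizingSketch {
  public static void main(String[] args) {
    // Would come from the proposed "approximate number of input records"
    // counter instead of a compile-time guess.
    long approxInputRecords = 1_000_000L;
    // Size the map so approxInputRecords entries fit under the load factor.
    int initialCapacity =
        (int) Math.min(1L << 30, (long) (approxInputRecords / 0.75f) + 1);
    Map<Long, Long> hashTable = new HashMap<>(initialCapacity);
    hashTable.put(42L, 1L); // loading proceeds without intermediate rehashes
    System.out.println(hashTable.size());
  }
}
{code}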



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23893?focusedWorklogId=463524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463524
 ]

ASF GitHub Bot logged work on HIVE-23893:
-

Author: ASF GitHub Bot
Created on: 27/Jul/20 07:16
Start Date: 27/Jul/20 07:16
Worklog Time Spent: 10m 
  Work Description: letsflyinthesky opened a new pull request #1322:
URL: https://github.com/apache/hive/pull/1322


   …ll be push down and nondeterministic filter which keeps current position
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 463524)
Time Spent: 20m  (was: 10m)

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Take the following query for example, and assume unix_timestamp is 
> non-deterministic (as it was before version 1.3.0):
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
>  The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> to the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)