[jira] [Work logged] (HIVE-24791) Backward compatibility issue in _dumpmetadata

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24791?focusedWorklogId=554088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554088
 ]

ASF GitHub Bot logged work on HIVE-24791:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 06:27
Start Date: 18/Feb/21 06:27
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1988:
URL: https://github.com/apache/hive/pull/1988#discussion_r578161861



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/DumpMetaData.java
##
@@ -131,7 +131,8 @@ private void loadDumpFromFile() throws SemanticException {
   lineContents[2].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[2]),
   lineContents[3].equals(Utilities.nullStringOutput) ? null : new Path(lineContents[3]),
   lineContents[4].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[4]),
-  Boolean.valueOf(lineContents[6]));
+  (lineContents.length < 8 || lineContents[6].equals(Utilities.nullStringOutput)) ?

Review comment:
   Add a test





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554088)
Time Spent: 0.5h  (was: 20m)

> Backward compatibility issue in _dumpmetadata
> -
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24791.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24791) Backward compatibility issue in _dumpmetadata

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24791?focusedWorklogId=554087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554087
 ]

ASF GitHub Bot logged work on HIVE-24791:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 06:27
Start Date: 18/Feb/21 06:27
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1988:
URL: https://github.com/apache/hive/pull/1988#discussion_r578161679



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/DumpMetaData.java
##
@@ -131,7 +131,8 @@ private void loadDumpFromFile() throws SemanticException {
   lineContents[2].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[2]),
   lineContents[3].equals(Utilities.nullStringOutput) ? null : new Path(lineContents[3]),
   lineContents[4].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[4]),
-  Boolean.valueOf(lineContents[6]));
+  (lineContents.length < 8 || lineContents[6].equals(Utilities.nullStringOutput)) ?

Review comment:
   check for length < 7
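   The concern behind this comment can be sketched as follows (a minimal, hypothetical illustration — method and constant names are invented and do not come from Hive's DumpMetaData): an older dump line has fewer tab-separated fields, so reading index 6 without a length guard throws ArrayIndexOutOfBoundsException, and the guard that makes index 6 safe is `length < 7`:

```java
public class DumpLineSketch {
    // Stand-in for Utilities.nullStringOutput (an assumption for this sketch).
    static final String NULL_MARKER = "null";

    // Returns the optional boolean stored in column 6, or null when the
    // dump was written by an older version that has no such column.
    static Boolean parseOptionalFlag(String[] lineContents) {
        // length < 7 means index 6 does not exist: old dump format.
        if (lineContents.length < 7 || NULL_MARKER.equals(lineContents[6])) {
            return null;
        }
        return Boolean.valueOf(lineContents[6]);
    }

    public static void main(String[] args) {
        String[] oldFormat = "a\tb\tc\td\te\tf".split("\t");        // 6 columns
        String[] newFormat = "a\tb\tc\td\te\tf\ttrue".split("\t");  // 7 columns
        System.out.println(parseOptionalFlag(oldFormat)); // null
        System.out.println(parseOptionalFlag(newFormat)); // true
    }
}
```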





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554087)
Time Spent: 20m  (was: 10m)

> Backward compatibility issue in _dumpmetadata
> -
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24791.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24751) Workload Manager sees `No privilege` exception even when authorization is not enabled

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24751?focusedWorklogId=554058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554058
 ]

ASF GitHub Bot logged work on HIVE-24751:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 04:44
Start Date: 18/Feb/21 04:44
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on pull request #1964:
URL: https://github.com/apache/hive/pull/1964#issuecomment-781045969


   @pvargacl, @szlta, @sankarh  Can you please review this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554058)
Time Spent: 0.5h  (was: 20m)

> Workload Manager sees `No privilege` exception even when authorization is not 
> enabled
> -
>
> Key: HIVE-24751
> URL: https://issues.apache.org/jira/browse/HIVE-24751
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At present, Kill Query access does not check whether authorization is 
> enabled. 
>  This causes the Workload Manager thread to end up with a No privilege 
> exception when trying to kill a query in an environment where authorization 
> is disabled.
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: No privilege
>  at 
> org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:188)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.WorkloadManager.lambda$scheduleWork$3(WorkloadManager.java:454)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.hive.service.cli.HiveSQLException: No privilege
>  at 
> org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:167)
>  ... 6 more{code}
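> A schematic sketch of the shape of the fix (hypothetical names, not the actual KillQueryImpl code): the privilege check becomes a no-op when authorization is disabled, so internal callers such as the Workload Manager are never rejected.

```java
public class KillQueryAuthSketch {
    // Models the HiveException("No privilege") from the stack trace above.
    static class NoPrivilegeException extends Exception {
        NoPrivilegeException() { super("No privilege"); }
    }

    // Only enforce the kill-query privilege when authorization is enabled.
    static void checkKillQueryAccess(boolean authorizationEnabled, boolean callerHasPrivilege)
            throws NoPrivilegeException {
        if (!authorizationEnabled) {
            return; // nothing to enforce: internal callers proceed
        }
        if (!callerHasPrivilege) {
            throw new NoPrivilegeException();
        }
    }

    public static void main(String[] args) throws Exception {
        // Workload Manager killing a query with authorization disabled: allowed.
        checkKillQueryAccess(false, false);
        System.out.println("kill allowed with authorization disabled");
    }
}
```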



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions

2021-02-17 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-24639.

Resolution: Duplicate

> Raises SemanticException other than ClassCastException when filter has 
> non-boolean expressions
> --
>
> Key: HIVE-24639
> URL: https://issues.apache.org/jira/browse/HIVE-24639
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Sometimes we see a ClassCastException in filters when fetching rows of a 
> table or executing a query. GenericUDFOPOr/GenericUDFOPAnd/FilterOperator 
> assume that the output of their conditions is a boolean, but this is not 
> guaranteed. For example: 
> _select * from ccn_table where src + 1;_ 
> will throw a ClassCastException:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Boolean
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:125)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:553)
> ...{code}
> It would be better to validate the filter during analysis instead of at 
> runtime, and to produce more meaningful messages.
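> The proposed behavior can be illustrated with a small sketch (a hypothetical API, not Hive's actual analyzer): check the resolved type of the predicate at analysis time and fail with a descriptive SemanticException rather than a runtime ClassCastException.

```java
public class FilterTypeCheckSketch {
    static class SemanticException extends Exception {
        SemanticException(String msg) { super(msg); }
    }

    // Reject any WHERE predicate whose resolved type is not boolean.
    static void validateFilterType(String exprText, Class<?> resolvedType) throws SemanticException {
        if (resolvedType != Boolean.class) {
            throw new SemanticException("Filter expression '" + exprText
                    + "' evaluates to " + resolvedType.getSimpleName() + ", expected a boolean");
        }
    }

    public static void main(String[] args) {
        try {
            validateFilterType("src + 1", Integer.class); // like: where src + 1
        } catch (SemanticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```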



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions

2021-02-17 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24639:
---
Fix Version/s: 4.0.0

> Raises SemanticException other than ClassCastException when filter has 
> non-boolean expressions
> --
>
> Key: HIVE-24639
> URL: https://issues.apache.org/jira/browse/HIVE-24639
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Sometimes we see a ClassCastException in filters when fetching rows of a 
> table or executing a query. GenericUDFOPOr/GenericUDFOPAnd/FilterOperator 
> assume that the output of their conditions is a boolean, but this is not 
> guaranteed. For example: 
> _select * from ccn_table where src + 1;_ 
> will throw a ClassCastException:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Boolean
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:125)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:553)
> ...{code}
> It would be better to validate the filter during analysis instead of at 
> runtime, and to produce more meaningful messages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24516) Txnhandler onrename might ignore exceptions

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24516?focusedWorklogId=554007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554007
 ]

ASF GitHub Bot logged work on HIVE-24516:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 00:52
Start Date: 18/Feb/21 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1762:
URL: https://github.com/apache/hive/pull/1762


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554007)
Time Spent: 0.5h  (was: 20m)

> Txnhandler onrename might ignore exceptions
> ---
>
> Key: HIVE-24516
> URL: https://issues.apache.org/jira/browse/HIVE-24516
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a follow-up on HIVE-24193. Table-does-not-exist errors shouldn't be 
> ignored in the first place.
> {code}
> } catch (SQLException e) {
> LOG.debug("Going to rollback: " + callSig);
> rollbackDBConn(dbConn);
> checkRetryable(dbConn, e, callSig);
> if (e.getMessage().contains("does not exist")) {
>   LOG.warn("Cannot perform " + callSig + " since metastore table does 
> not exist");
> } else {
>   throw new MetaException("Unable to " + callSig + ":" + 
> StringUtils.stringifyException(e));
> }
>   }
> {code}
> This error handling might have been put there for backward compatibility 
> with missing ACID metadata tables, but it is not needed anymore.
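> The direction the issue suggests can be sketched like this (a hypothetical simplification, not the merged patch): drop the "does not exist" special case so every SQLException surfaces as a MetaException.

```java
import java.sql.SQLException;

public class OnRenameErrorSketch {
    static class MetaException extends Exception {
        MetaException(String msg) { super(msg); }
    }

    // After the fix: no special-casing of "does not exist" messages;
    // every SQL failure during onRename propagates to the caller.
    static void handleSqlFailure(String callSig, SQLException e) throws MetaException {
        throw new MetaException("Unable to " + callSig + ":" + e.getMessage());
    }

    public static void main(String[] args) {
        try {
            handleSqlFailure("onRename", new SQLException("table TXNS does not exist"));
        } catch (MetaException e) {
            System.out.println(e.getMessage());
        }
    }
}
```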



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24710?focusedWorklogId=554005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554005
 ]

ASF GitHub Bot logged work on HIVE-24710:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 00:51
Start Date: 18/Feb/21 00:51
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #1940:
URL: https://github.com/apache/hive/pull/1940#issuecomment-780953443


   Thanks for the review @ashutoshc 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554005)
Time Spent: 40m  (was: 0.5h)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -
>
> Key: HIVE-24710
> URL: https://issues.apache.org/jira/browse/HIVE-24710
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> E.g. query:
> {noformat}
> select x, y, count(*) over (partition by x order by y range between 86400 
> PRECEDING and CURRENT ROW) r0 from foo
> {noformat}
> 1. In such cases, there is no need to repeatedly iterate over the row 
> containers (internally this does O(n^2) work, which takes forever when the 
> window frame is really large). This can be optimised to reduce CPU burn and IO.
> 2. BasePartitionEvaluator::calcFunctionValue need not materialize the ROW 
> when parameters are empty. This codepath can also be optimised.
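> The O(n) alternative alluded to in point 1 can be sketched as a two-pointer pass over the sorted partition (an illustration of the technique only, not Hive's PTF implementation): instead of re-scanning the row container for every row, the window-start index only ever moves forward.

```java
import java.util.Arrays;

public class WindowCountSketch {
    // For each row i (values sorted ascending), counts rows whose value lies in
    // [sortedValues[i] - precedingRange, sortedValues[i]], i.e. count(*) over a
    // RANGE ... PRECEDING AND CURRENT ROW frame. O(n) total: both indices only advance.
    static long[] rangeCounts(long[] sortedValues, long precedingRange) {
        long[] counts = new long[sortedValues.length];
        int lo = 0;
        for (int i = 0; i < sortedValues.length; i++) {
            while (sortedValues[i] - sortedValues[lo] > precedingRange) {
                lo++; // window start never moves backwards
            }
            counts[i] = i - lo + 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        // y values within one partition, 86400-second frame as in the example query
        System.out.println(Arrays.toString(rangeCounts(new long[]{0, 100, 86400, 90000}, 86400)));
        // prints [1, 2, 3, 2]
    }
}
```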



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost

2021-02-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-24710:
---

Assignee: Rajesh Balamohan

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -
>
> Key: HIVE-24710
> URL: https://issues.apache.org/jira/browse/HIVE-24710
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> E.g. query:
> {noformat}
> select x, y, count(*) over (partition by x order by y range between 86400 
> PRECEDING and CURRENT ROW) r0 from foo
> {noformat}
> 1. In such cases, there is no need to repeatedly iterate over the row 
> containers (internally this does O(n^2) work, which takes forever when the 
> window frame is really large). This can be optimised to reduce CPU burn and IO.
> 2. BasePartitionEvaluator::calcFunctionValue need not materialize the ROW 
> when parameters are empty. This codepath can also be optimised.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost

2021-02-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-24710.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Thanks for the review [~ashutoshc]. Merged the PR.

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -
>
> Key: HIVE-24710
> URL: https://issues.apache.org/jira/browse/HIVE-24710
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> E.g. query:
> {noformat}
> select x, y, count(*) over (partition by x order by y range between 86400 
> PRECEDING and CURRENT ROW) r0 from foo
> {noformat}
> 1. In such cases, there is no need to repeatedly iterate over the row 
> containers (internally this does O(n^2) work, which takes forever when the 
> window frame is really large). This can be optimised to reduce CPU burn and IO.
> 2. BasePartitionEvaluator::calcFunctionValue need not materialize the ROW 
> when parameters are empty. This codepath can also be optimised.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24710?focusedWorklogId=554001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554001
 ]

ASF GitHub Bot logged work on HIVE-24710:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 00:49
Start Date: 18/Feb/21 00:49
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #1940:
URL: https://github.com/apache/hive/pull/1940


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554001)
Time Spent: 0.5h  (was: 20m)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -
>
> Key: HIVE-24710
> URL: https://issues.apache.org/jira/browse/HIVE-24710
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> E.g. query:
> {noformat}
> select x, y, count(*) over (partition by x order by y range between 86400 
> PRECEDING and CURRENT ROW) r0 from foo
> {noformat}
> 1. In such cases, there is no need to repeatedly iterate over the row 
> containers (internally this does O(n^2) work, which takes forever when the 
> window frame is really large). This can be optimised to reduce CPU burn and IO.
> 2. BasePartitionEvaluator::calcFunctionValue need not materialize the ROW 
> when parameters are empty. This codepath can also be optimised.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24710?focusedWorklogId=553960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553960
 ]

ASF GitHub Bot logged work on HIVE-24710:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 23:18
Start Date: 17/Feb/21 23:18
Worklog Time Spent: 10m 
  Work Description: ashutoshc commented on pull request #1940:
URL: https://github.com/apache/hive/pull/1940#issuecomment-780917826


   +1 LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553960)
Time Spent: 20m  (was: 10m)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -
>
> Key: HIVE-24710
> URL: https://issues.apache.org/jira/browse/HIVE-24710
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> E.g. query:
> {noformat}
> select x, y, count(*) over (partition by x order by y range between 86400 
> PRECEDING and CURRENT ROW) r0 from foo
> {noformat}
> 1. In such cases, there is no need to repeatedly iterate over the row 
> containers (internally this does O(n^2) work, which takes forever when the 
> window frame is really large). This can be optimised to reduce CPU burn and IO.
> 2. BasePartitionEvaluator::calcFunctionValue need not materialize the ROW 
> when parameters are empty. This codepath can also be optimised.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24764) insert overwrite on a partition resets row count stats in other partitions

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24764?focusedWorklogId=553957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553957
 ]

ASF GitHub Bot logged work on HIVE-24764:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 23:07
Start Date: 17/Feb/21 23:07
Worklog Time Spent: 10m 
  Work Description: ashutoshc commented on pull request #1967:
URL: https://github.com/apache/hive/pull/1967#issuecomment-780913343


   +1 LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553957)
Time Spent: 40m  (was: 0.5h)

> insert overwrite on a partition resets row count stats in other partitions
> --
>
> Key: HIVE-24764
> URL: https://issues.apache.org/jira/browse/HIVE-24764
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After insert overwrite on a partition, stats on other partitions are messed 
> up. Subsequent queries end up with plans with PARTIAL stats. In certain 
> cases, this leads to suboptimal query plans.
> {noformat}
> drop table if exists test_stats;
> drop table if exists test_stats_2;
> create table test_stats(i int, j bigint);
> create table test_stats_2(i int) partitioned by (j bigint);
> insert into test_stats values (1, 1), (2, 2), (3, 3), (4, 4), (5, NULL);
> -- select * from test_stats;
> 1   1
> 2   2
> 3   3
> 4   4
> 5   
> insert overwrite table test_stats_2 partition(j)  select i, j from test_stats 
> where j is not null;
> -- After executing this statement, stat gets messed up.
> insert overwrite table test_stats_2 partition(j)  select i, j from test_stats 
> where j is null;
> -- select * from test_stats_2;
> 1   1
> 2   2
> 3   3
> 4   4
> 5   
> -- This would return "PARTIAL" stats instead of "COMPLETE"
> explain select i, count(*) as c from test_stats_2 group by i order by c desc 
> limit 10;
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: hive_20210208093110_62ced99e-f068-42d4-9ba8-d45fccd6c0a2:68
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: hive_20210208093110_62ced99e-f068-42d4-9ba8-d45fccd6c0a2:68
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test_stats_2
>   Statistics: Num rows: 125 Data size: 500 Basic stats: 
> PARTIAL Column stats: COMPLETE
>   Select Operator
> expressions: i (type: int)
> outputColumnNames: i
> Statistics: Num rows: 125 Data size: 500 Basic stats: 
> PARTIAL Column stats: COMPLETE
> Group By Operator
>   aggregations: count()
>   keys: i (type: int)
>   minReductionHashAggr: 0.99
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 125 Data size: 1500 Basic stats: 
> PARTIAL Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 125 Data size: 1500 Basic 
> stats: PARTIAL Column stats: COMPLETE
> value expressions: _col1 (type: bigint)
> Execution mode: vectorized, llap
> LLAP IO: may be used (ACID table)
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 62 Data size: 744 Basic stats: PARTIAL 
> Column stats: COMPLETE
> Top N Key Operator
>   sort order: -
>   keys: _col1 (type: bigint)
>   null sort order: a
>   Statistics: Num rows: 62 Data size: 744 Basic stats: 
> PARTIAL Column stats: COMPLETE
> 

[jira] [Work logged] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?focusedWorklogId=553949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553949
 ]

ASF GitHub Bot logged work on HIVE-24786:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 22:54
Start Date: 17/Feb/21 22:54
Worklog Time Spent: 10m 
  Work Description: t3rmin4t0r commented on a change in pull request #1983:
URL: https://github.com/apache/hive/pull/1983#discussion_r577852301



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -581,21 +596,99 @@ public long getRetryInterval() {
 } else {
   httpClientBuilder = HttpClientBuilder.create();
 }
-// In case the server's idletimeout is set to a lower value, it might 
close it's side of
-// connection. However we retry one more time on NoHttpResponseException
+
+// Beeline <--> LB <--> Reverse Proxy <-> Hiveserver2
+// In deployments like the above, the LoadBalancer (LB) can be configured with an Idle Timeout after which the LB
+// will send a TCP RST to the Client (Beeline) and the Backend (Reverse Proxy). If a user is connected to beeline, idle for
+// some time, and resubmits a query after the idle timeout, there is a broken pipe between beeline and the LB. When Beeline
+// tries to submit the query, one of two things happens: it either hangs or times out (if socketTimeout is defined in
+// the jdbc params). The hang is because of the default infinite socket timeout, for which there is no auto-recovery
+// (the user has to manually interrupt the query). If the socketTimeout jdbc param was specified, beeline will receive a
+// SocketTimeoutException (Read Timeout) or a NoHttpResponseException, both of which can be retried if maxRetries is
+// also specified by the user (jdbc param).
+// The following retry handler handles the above cases in addition to retries for idempotent and unsent requests.
 httpClientBuilder.setRetryHandler(new HttpRequestRetryHandler() {
+  // This handler is mostly a copy of DefaultHttpRequestRetryHandler except it also retries some exceptions
+  // which could be thrown in certain cases where an idle timeout from an intermediate proxy triggers a connection reset.
+  private final List<Class<? extends IOException>> nonRetriableClasses = Arrays.asList(
+  InterruptedIOException.class,
+  UnknownHostException.class,
+  ConnectException.class,
+  SSLException.class);
+  // socket exceptions could happen because of a timeout, a broken pipe, or the server not responding, in which
+  // case it is better to reopen the connection and retry if the user specified maxRetries
+  private final List<Class<? extends IOException>> retriableClasses = Arrays.asList(
+  SocketTimeoutException.class,
+  SocketException.class,
+  NoHttpResponseException.class
+  );
+
   @Override
   public boolean retryRequest(IOException exception, int executionCount, 
HttpContext context) {
-if (executionCount > 1) {
-  LOG.info("Retry attempts to connect to server exceeded.");
+Args.notNull(exception, "Exception parameter");
+Args.notNull(context, "HTTP context");
+if (executionCount > maxRetries) {
+  // Do not retry if over max retry count
+  LOG.error("Max retries (" + maxRetries + ") exhausted.", exception);
+  return false;
+}
+if (this.retriableClasses.contains(exception.getClass())) {
+  LOG.info("Retrying " + exception.getClass() + " as it is in the retriable classes list.");
+  return true;
+}
+if (this.nonRetriableClasses.contains(exception.getClass())) {
+  LOG.info("Not retrying as the class (" + exception.getClass() + ") is a non-retriable class.");
+  return false;
+} else {
+  for (final Class<? extends IOException> rejectException : this.nonRetriableClasses) {
+if (rejectException.isInstance(exception)) {
+  LOG.info("Not retrying as the class (" + exception.getClass() + ") is an instance of a non-retriable class.");
+  return false;
+}
+  }
+}
+final HttpClientContext clientContext = 
HttpClientContext.adapt(context);
+final HttpRequest request = clientContext.getRequest();
+
+if(requestIsAborted(request)){
+  LOG.info("Not retrying as request is aborted.");
   return false;
 }
-if (exception instanceof org.apache.http.NoHttpResponseException) {
-  LOG.info("Could not connect to the server. Retrying one more time.");
+
+if (handleAsIdempotent(request)) {
+  LOG.info("Retrying idempotent request. Attempt " + executionCount + 
" of " + maxRetries);
+  // Retry if the request is considered idempotent
+  return true;
+}
+
+if (!clientContext.isRequestSent()) 

[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=553948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553948
 ]

ASF GitHub Bot logged work on HIVE-24778:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 22:52
Start Date: 17/Feb/21 22:52
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1982:
URL: https://github.com/apache/hive/pull/1982#discussion_r578001413



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java
##
@@ -45,7 +45,7 @@
   public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) {
 this.parentResolver = parentResolver;
 SessionState ss = SessionState.get();
-if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {

Review comment:
   This means that for all the other kinds of strict checks we are 
vulnerable to conversions happening during init, right?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553948)
Time Spent: 40m  (was: 0.5h)

> Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety 
> properties
> 
>
> Key: HIVE-24778
> URL: https://issues.apache.org/jira/browse/HIVE-24778
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The majority of strict type checks can be controlled by 
> {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another 
> property, namely  {{hive.strict.timestamp.conversion}}, to control the 
> implicit comparisons between numerics and timestamps.
> The name and description of {{hive.strict.checks.type.safety}} imply that the 
> property covers all strict checks so having others for specific cases appears 
> confusing and can easily lead to unexpected behavior.
> The goal of this issue is to unify those properties to facilitate 
> configuration and improve code reuse.
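> The intended end state can be illustrated with a configuration fragment (the property name comes from this issue; the value shown and the idea that it alone governs the timestamp restriction are assumptions about the final patch):

```xml
<!-- hive-site.xml: after unification, this single switch would govern all
     strict type-safety checks, including the numeric/timestamp comparison
     restriction that hive.strict.timestamp.conversion controlled separately. -->
<property>
  <name>hive.strict.checks.type.safety</name>
  <value>true</value>
</property>
```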



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24693) Convert timestamps to zoned times without string operations

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=553894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553894
 ]

ASF GitHub Bot logged work on HIVE-24693:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 21:34
Start Date: 17/Feb/21 21:34
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1938:
URL: https://github.com/apache/hive/pull/1938


   Replaces #1918



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553894)
Time Spent: 6h  (was: 5h 50m)

> Convert timestamps to zoned times without string operations
> ---
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  To do this, it calls {{toString()}} 
> on the timestamp object and then parses the String.  
> These timestamps do not carry a timezone, so the string is something 
> like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone and, 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the timestamp
> * Parses it (throws an exception, parses again)
> There is no need for this kind of string manipulation/parsing; it should 
> just use the epoch millis/seconds stored internally in the Timestamp 
> object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
> return parse(ts.toString(), defaultTimeZone);
>   }
> {code}
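A minimal sketch of the epoch-based alternative, using only `java.time` types; the `seconds`/`nanos` parameters stand in for the epoch fields stored inside Hive's Timestamp, and this is not Hive's actual NanoTimeUtils API:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hedged sketch: build the zoned value directly from the stored epoch
// fields; no string is ever built or parsed.
class EpochConvert {
  static ZonedDateTime convert(long seconds, int nanos, ZoneId defaultTimeZone) {
    return ZonedDateTime.ofInstant(Instant.ofEpochSecond(seconds, nanos),
        defaultTimeZone);
  }
}
```

The conversion cost then no longer depends on exception-driven fallback parsing per row.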



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24693) Convert timestamps to zoned times without string operations

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=553886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553886
 ]

ASF GitHub Bot logged work on HIVE-24693:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 21:08
Start Date: 17/Feb/21 21:08
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1938:
URL: https://github.com/apache/hive/pull/1938


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553886)
Time Spent: 5h 50m  (was: 5h 40m)

> Convert timestamps to zoned times without string operations
> ---
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  To do this, it calls {{toString()}} 
> on the timestamp object and then parses the String.  
> These timestamps do not carry a timezone, so the string is something 
> like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone and, 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the timestamp
> * Parses it (throws an exception, parses again)
> There is no need for this kind of string manipulation/parsing; it should 
> just use the epoch millis/seconds stored internally in the Timestamp 
> object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
> return parse(ts.toString(), defaultTimeZone);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24693) Convert timestamps to zoned times without string operations

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=553885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553885
 ]

ASF GitHub Bot logged work on HIVE-24693:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 21:08
Start Date: 17/Feb/21 21:08
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1938:
URL: https://github.com/apache/hive/pull/1938#issuecomment-780854241


   ` The build of this commit was aborted`
   
   Le sigh



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553885)
Time Spent: 5h 40m  (was: 5.5h)

> Convert timestamps to zoned times without string operations
> ---
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  To do this, it calls {{toString()}} 
> on the timestamp object and then parses the String.  
> These timestamps do not carry a timezone, so the string is something 
> like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone and, 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the timestamp
> * Parses it (throws an exception, parses again)
> There is no need for this kind of string manipulation/parsing; it should 
> just use the epoch millis/seconds stored internally in the Timestamp 
> object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
> return parse(ts.toString(), defaultTimeZone);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553835
 ]

ASF GitHub Bot logged work on HIVE-24743:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 19:31
Start Date: 17/Feb/21 19:31
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #1956:
URL: https://github.com/apache/hive/pull/1956#discussion_r577886973



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String 
tableName,
   public List<Partition> getPartitionsByNames(String dbName, String tableName,
   List<String> partitionNames) throws HiveException {
 try {
-  return getMSC().getPartitionsByNames(dbName, tableName, partitionNames);
+  GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
+  req.setDb_name(dbName);
+  req.setTbl_name(tableName);
+  req.setNames(partitionNames);
+  return getPartitionsByNames(req, null);
 } catch (Exception e) {
   LOG.error("Failed getPartitionsByNames", e);
   throw new HiveException(e);
 }
   }
 
-public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req)
+public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req,
+  Table table)
 throws HiveException {
 try {
-  Table table = getTable(req.getDb_name(), req.getTbl_name());
+  if( table == null ) {
+table = getTable(req.getDb_name(), req.getTbl_name());

Review comment:
   Ok, removed the extra getTable call now. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553835)
Time Spent: 1.5h  (was: 1h 20m)

> [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
> --
>
> Key: HIVE-24743
> URL: https://issues.apache.org/jira/browse/HIVE-24743
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As part of ( HIVE-23821: Send tableId in request for all the new HMS 
> get_partition APIs ) we added logic to send tableId in the request for 
> several get_partition APIs, but looks like it was missed out for 
> getPartitionsByNames. TableId and validWriteIdList are used to maintain 
> consistency, when HMS API response is being served from a remote cache. 
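As a hedged illustration of the consistency fields being discussed (the real GetPartitionsByNamesRequest is a Thrift-generated Hive class; this toy class and its field names only mirror the idea, and the db/table/partition values are made up):

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for the Thrift-generated request; field names are illustrative.
class GetPartitionsByNamesRequest {
  String dbName, tblName, validWriteIdList;
  long tableId = -1;
  List<String> names;
}

class Demo {
  // Sketch: fill in the consistency fields so a remote cache can validate
  // its entry before serving it.
  static GetPartitionsByNamesRequest buildRequest(long tableId, String writeIds) {
    GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
    req.dbName = "default";
    req.tblName = "t1";
    req.names = Arrays.asList("p=1", "p=2");
    req.tableId = tableId;           // lets the cache detect drop/recreate
    req.validWriteIdList = writeIds; // lets the cache detect stale data
    return req;
  }
}
```

Without these fields, a remote cache cannot tell a recreated table or a newer write from the state it cached.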



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24788) Backport HIVE-23338 to branch-3.1

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24788?focusedWorklogId=553834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553834
 ]

ASF GitHub Bot logged work on HIVE-24788:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 19:29
Start Date: 17/Feb/21 19:29
Worklog Time Spent: 10m 
  Work Description: h-vetinari commented on pull request #1986:
URL: https://github.com/apache/hive/pull/1986#issuecomment-780796536


   This is my first time contributing to hive. I was surprised that another CI 
run appeared 24h later. It fails, but says
   ```
   There are 0 new tests failing, 26 existing failing and 53 skipped.
   ```
   
   That sounds like the failures are pre-existing? Anything I need to do here?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553834)
Time Spent: 0.5h  (was: 20m)

> Backport HIVE-23338 to branch-3.1
> -
>
> Key: HIVE-24788
> URL: https://issues.apache.org/jira/browse/HIVE-24788
> Project: Hive
>  Issue Type: Task
>Reporter: H. Vetinari
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> jackson has a whole bunch of CVEs open against 2.9.x, which makes working 
> with HIVE in security-aware environments quite difficult.
> This has been fixed in HIVE-23338 already, but since 4.0.0 hasn't been 
> released yet (and is not on the horizon, as far as I can tell), this should 
> be backported to {{branch-3.1}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source

2021-02-17 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286095#comment-17286095
 ] 

Aasha Medhi commented on HIVE-24733:


+1

> Handle replication when db location and managed location is set to custom 
> location on source
> 
>
> Key: HIVE-24733
> URL: https://issues.apache.org/jira/browse/HIVE-24733
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {color:#172b4d} {color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553780
 ]

ASF GitHub Bot logged work on HIVE-24743:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 18:23
Start Date: 17/Feb/21 18:23
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #1956:
URL: https://github.com/apache/hive/pull/1956#discussion_r577841886



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String 
tableName,
   public List<Partition> getPartitionsByNames(String dbName, String tableName,
   List<String> partitionNames) throws HiveException {
 try {
-  return getMSC().getPartitionsByNames(dbName, tableName, partitionNames);
+  GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
+  req.setDb_name(dbName);
+  req.setTbl_name(tableName);
+  req.setNames(partitionNames);
+  return getPartitionsByNames(req, null);
 } catch (Exception e) {
   LOG.error("Failed getPartitionsByNames", e);
   throw new HiveException(e);
 }
   }
 
-public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req)
+public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req,
+  Table table)
 throws HiveException {
 try {
-  Table table = getTable(req.getDb_name(), req.getTbl_name());
+  if( table == null ) {
+table = getTable(req.getDb_name(), req.getTbl_name());

Review comment:
   TableId is still required for the remote cache to decide whether it can 
serve the data from cache or has to refresh it from HMS. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553780)
Time Spent: 1h 20m  (was: 1h 10m)

> [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
> --
>
> Key: HIVE-24743
> URL: https://issues.apache.org/jira/browse/HIVE-24743
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As part of ( HIVE-23821: Send tableId in request for all the new HMS 
> get_partition APIs ) we added logic to send tableId in the request for 
> several get_partition APIs, but looks like it was missed out for 
> getPartitionsByNames. TableId and validWriteIdList are used to maintain 
> consistency, when HMS API response is being served from a remote cache. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553779
 ]

ASF GitHub Bot logged work on HIVE-24743:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 18:20
Start Date: 17/Feb/21 18:20
Worklog Time Spent: 10m 
  Work Description: yongzhi commented on a change in pull request #1956:
URL: https://github.com/apache/hive/pull/1956#discussion_r57783



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String 
tableName,
   public List<Partition> getPartitionsByNames(String dbName, String tableName,
   List<String> partitionNames) throws HiveException {
 try {
-  return getMSC().getPartitionsByNames(dbName, tableName, partitionNames);
+  GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
+  req.setDb_name(dbName);
+  req.setTbl_name(tableName);
+  req.setNames(partitionNames);
+  return getPartitionsByNames(req, null);
 } catch (Exception e) {
   LOG.error("Failed getPartitionsByNames", e);
   throw new HiveException(e);
 }
   }
 
-public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req)
+public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req,
+  Table table)
 throws HiveException {
 try {
-  Table table = getTable(req.getDb_name(), req.getTbl_name());
+  if( table == null ) {
+table = getTable(req.getDb_name(), req.getTbl_name());

Review comment:
   If you do not use the cached client, you do not need the table id; it will be 
different for clients that use the HMS APIs directly (no need to call getTable). 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553779)
Time Spent: 1h 10m  (was: 1h)

> [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
> --
>
> Key: HIVE-24743
> URL: https://issues.apache.org/jira/browse/HIVE-24743
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As part of ( HIVE-23821: Send tableId in request for all the new HMS 
> get_partition APIs ) we added logic to send tableId in the request for 
> several get_partition APIs, but looks like it was missed out for 
> getPartitionsByNames. TableId and validWriteIdList are used to maintain 
> consistency, when HMS API response is being served from a remote cache. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2021-02-17 Thread Ashish Doneriya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Doneriya updated HIVE-21915:
---
Description: 
The HQL syntax is like this:

CREATE TEMPORARY TABLE tez_union_all_loss_data AS
 SELECT xxx, yyy, zzz,1 as tag
 FROM ods_1

UNION ALL

SELECT xxx, yyy, zzz, tag
 FROM
 (
 SELECT xxx
 ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
 ,zzz
 ,2 as tag
 FROM ods_2
 LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
 ) tbl 
 ;

 

With the above HQL, we expect rows with both tag = 2 and tag = 1 to 
appear. In our case, however, all the rows with tag = 1 are lost.

Digging deeper, we can find that the two generated maps have identical task tmp 
paths. This results from the fact that, when a UDTF is present, the FileSinkOperator is 
processed twice while generating the tmp path in GenTezUtils.removeUnionOperators();

 

  was:
The HQL syntax is like this:

CREATE TEMPORARY TABLE tez_union_all_loss_data AS
SELECT xxx, yyy, zzz,1 as tag
FROM ods_1

UNION ALL

SELECT xxx, yyy, zzz, tag
FROM
(
SELECT xxx
,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
,zzz
,2 as tag
FROM ods_2
LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
) tbl 
;

 

With above HQL, we are expecting that rows with both tag = 2 and tag = 1 
appear. In our case however, all the rows with tag = 1 are lost.

Dig deeper we can find that the generated two maps have identical task tmp 
paths. And that results from when UDTF is present, the FileSinkOperator will be 
processed twice generating the tmp path in GenTezUtils.removeUnionOperators();

 


> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch, 
> HIVE-21915.03.patch, HIVE-21915.04.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
>  SELECT xxx, yyy, zzz,1 as tag
>  FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
>  FROM
>  (
>  SELECT xxx
>  ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
>  ,zzz
>  ,2 as tag
>  FROM ods_2
>  LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
>  ) tbl 
>  ;
>  
> With the above HQL, we expect rows with both tag = 2 and tag = 1 to 
> appear. In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can find that the two generated maps have identical task tmp 
> paths. This results from the fact that, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators();
>  
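To illustrate the failure mode with hypothetical names (this is not Hive's actual code): if two map branches receive the same tmp path, the second commit clobbers the first branch's output, whereas suffixing a per-visit sequence keeps the paths distinct even when the same operator is processed twice.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: derive a distinct tmp path per FileSinkOperator visit,
// so a second traversal (e.g. when a UDTF is present) cannot reuse the first
// branch's path and overwrite its output.
class TmpPathGen {
  private final AtomicInteger seq = new AtomicInteger();

  String nextTmpPath(String baseDir) {
    return baseDir + "/_tmp." + seq.incrementAndGet();
  }
}
```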



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24775?focusedWorklogId=553740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553740
 ]

ASF GitHub Bot logged work on HIVE-24775:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 17:01
Start Date: 17/Feb/21 17:01
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1981:
URL: https://github.com/apache/hive/pull/1981#discussion_r50512



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
   projExprs.add(rightRef);
   joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,

Review comment:
   Instead of doing the transformation in the rewrite to AST method, let's 
do it here. That will be more consistent and decrease rewriting at the AST 
level.
   
   In particular, this should be `SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
   projExprs.add(rightRef);
   joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
   ImmutableList.of(leftRef, rightRef)));
-  filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,

Review comment:
   In this case, this filter condition should be the same one that is 
introduced for the join operator (with 
`SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553740)
Time Spent: 0.5h  (was: 20m)

> Incorrect null handling when rebuilding Materialized view incrementally
> ---
>
> Key: HIVE-24775
> URL: https://issues.apache.org/jira/browse/HIVE-24775
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO t1 VALUES
>  (NULL, 'null_value', 100.77, 7),
>  (1, 'calvin', 978.76, 3),
>  (1, 'charlie', 9.8, 1);
> CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS
>   SELECT a, b, sum(d)
>   FROM t1
>   WHERE c > 10.0
>   GROUP BY a, b;
> INSERT INTO t1 VALUES
>  (NULL, 'null_value', 100.88, 8),
>  (1, 'charlie', 15.8, 1);
> ALTER MATERIALIZED VIEW mat1 REBUILD;
> SELECT * FROM mat1
> ORDER BY a, b;
> {code}
> View contains:
> {code}
> 1 calvin  3
> 1 charlie 1
> NULL  null_value  8
> NULL  null_value  7
> {code}
> but it should contain:
> {code}
> 1 calvin  3
> 1 charlie 1
> NULL  null_value  15
> {code}
> Rows with aggregate key columns having NULL values are not aggregated because 
> the incremental materialized view rebuild plan is altered by 
> [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]:
> an IS NOT NULL filter is added for each of these columns on top of the view scan 
> when joining with the branch that pulls the rows inserted after the last rebuild:
> {code}
> HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, 
> +($5, $2))])
>   HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), 
> =($1, $4)))])
> HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], 
> algorithm=[none], cost=[not available])
>   HiveProject(a=[$0], b=[$1], _c2=[$2])
> HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))])
>   HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
>   HiveProject(a=[$0], b=[$1], $f2=[$2])
> HiveAggregate(group=[{0, 1}], agg#0=[sum($3)])
>   HiveFilter(condition=[AND(<(1, $6.writeid), >($2, 10))])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
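The lost NULL-keyed rows come down to SQL's three-valued `=`: a NULL group key never satisfies an equi-join condition, whereas null-safe equality (`IS NOT DISTINCT FROM`) treats two NULLs as equal. A small sketch of the two semantics, with Java's Boolean `null` standing in for SQL UNKNOWN:

```java
// sqlEquals models SQL '=': a NULL operand yields UNKNOWN (Java null), so a
// join on '=' drops rows whose group keys are NULL -- the bug described above.
class NullSafeEq {
  static Boolean sqlEquals(Integer a, Integer b) {
    if (a == null || b == null) {
      return null; // UNKNOWN: the row does not satisfy the join condition
    }
    return a.equals(b);
  }

  // isNotDistinctFrom models IS NOT DISTINCT FROM: NULL matches NULL.
  static boolean isNotDistinctFrom(Integer a, Integer b) {
    if (a == null || b == null) {
      return a == b; // true only when both operands are null
    }
    return a.equals(b);
  }
}
```

Joining the view scan and the delta branch with the null-safe form lets the NULL-keyed groups line up and aggregate into one row.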



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24775?focusedWorklogId=553738=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553738
 ]

ASF GitHub Bot logged work on HIVE-24775:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 17:00
Start Date: 17/Feb/21 17:00
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1981:
URL: https://github.com/apache/hive/pull/1981#discussion_r577759809



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
   projExprs.add(rightRef);
   joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
   ImmutableList.of(leftRef, rightRef)));
-  filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,
-  ImmutableList.of(leftRef)));
 }
 // 3) Add the expressions that correspond to the aggregation
 // functions
-RexNode caseFilterCond = RexUtil.composeConjunction(rexBuilder, 
filterConjs);
+List filterConjs = new ArrayList<>();
 for (int i = 0, leftPos = groupCount, rightPos = totalCount + groupCount;
  leftPos < totalCount; i++, leftPos++, rightPos++) {
   // case when mv2.deptno IS NULL AND mv2.deptname IS NULL then s else 
source.s + mv2.s end
   RexNode leftRef = rexBuilder.makeInputRef(
   joinLeftInput.getRowType().getFieldList().get(leftPos).getType(), 
leftPos);
+  filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,

Review comment:
   Should this `IS_NULL` filter be here, given that we have no guarantee 
whether the aggregate results produce nulls?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
   projExprs.add(rightRef);
   joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,

Review comment:
   Instead of doing the transformation in the rewrite to AST method, let's 
do it here. That will be more consistent and decrease rewriting at the AST 
level.
   
   In particular, this should be `SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -158,7 +157,8 @@ public void onMatch(RelOptRuleCall call) {
 + " recognized: " + aggCall);
   }
   projExprs.add(rexBuilder.makeCall(SqlStdOperatorTable.CASE,
-  ImmutableList.of(caseFilterCond, rightRef, elseReturn)));
+  ImmutableList.of(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,
+  ImmutableList.of(leftRef)), rightRef, elseReturn)));
 }
 RexNode joinCond = RexUtil.composeConjunction(rexBuilder, joinConjs);
 RexNode filterCond = RexUtil.composeConjunction(rexBuilder, filterConjs);

Review comment:
   The OR condition below would change too. Each branch of the OR is 
supposed to filter either the insert or the update operation. Can we use `NOT` on 
top of the join condition to create the second disjunct for the OR?
   
   ```
   // (mv2.deptno <=> source.deptno AND mv2.deptname <=> source.deptname)
   //  OR NOT(mv2.deptno <=> source.deptno AND mv2.deptname <=> source.deptname)
   ```

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
   projExprs.add(rightRef);
   joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
   ImmutableList.of(leftRef, rightRef)));
-  filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,

Review comment:
   In this case, this filter condition should be the same one that is 
introduced for the join operator (with 
`SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`)

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -1118,6 +1118,18 @@ Table materializeCTE(String cteName, CTEClause cte) 
throws HiveException {
   }
 
   private void fixUpASTAggregateIncrementalRebuild(ASTNode newAST) throws 
SemanticException {
+// Replace equality operators with null safe equality operators in join 
condition

Review comment:
   This will probably change if the handling is done above. I am hoping the 
method can stay almost as it was, except for the condition in old L1255 to 
infer the insert vs update branch, which could probably be done based on `NOT`?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##
@@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
  

[jira] [Work logged] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24733?focusedWorklogId=553721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553721
 ]

ASF GitHub Bot logged work on HIVE-24733:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 16:38
Start Date: 17/Feb/21 16:38
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r577765531



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -485,6 +485,9 @@ private Task getReplLoadRootTask(String sourceDb, String 
replicadb, boolean isIn
   metricCollector = new BootstrapLoadMetricCollector(replicadb, 
tuple.dumpLocation, 0,
 confTemp);
 }
+/* When 'hive.repl.retain.custom.db.locations.on.target' is enabled, the 
first iteration of repl load would
+   run only database creation task, and only in next iteration of Repl 
Load Task execution, remaining tasks will be
+   executed. Hence disabling this to perform the test on task 
optimization.  */

Review comment:
   Why is this set to false in BaseReplicationScenarios?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553721)
Time Spent: 1h 20m  (was: 1h 10m)

> Handle replication when db location and managed location is set to custom 
> location on source
> 
>
> Key: HIVE-24733
> URL: https://issues.apache.org/jira/browse/HIVE-24733
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {color:#172b4d} {color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24728) Low level reader for llap cache hydration

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24728?focusedWorklogId=553712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553712
 ]

ASF GitHub Bot logged work on HIVE-24728:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 16:30
Start Date: 17/Feb/21 16:30
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #1990:
URL: https://github.com/apache/hive/pull/1990


   
   
   ### What changes were proposed in this pull request?
   This is a subtask of the cache hydration feature; it provides a way to read 
ORC files based on already-calculated positions.
   
   
   
   ### Why are the changes needed?
   
   LLAP cache hydration will enable saving and loading the cache contents. The 
buffer positions have already been calculated, so we need a way to read them 
and load the data into the cache.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manual tests were conducted. q tests will be added when the feature is ready.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553712)
Remaining Estimate: 0h
Time Spent: 10m

> Low level reader for llap cache hydration
> -
>
> Key: HIVE-24728
> URL: https://issues.apache.org/jira/browse/HIVE-24728
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24728) Low level reader for llap cache hydration

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24728:
--
Labels: pull-request-available  (was: )

> Low level reader for llap cache hydration
> -
>
> Key: HIVE-24728
> URL: https://issues.apache.org/jira/browse/HIVE-24728
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24725) Collect top priority items from llap cache policy

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24725?focusedWorklogId=553704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553704
 ]

ASF GitHub Bot logged work on HIVE-24725:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 16:10
Start Date: 17/Feb/21 16:10
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #1947:
URL: https://github.com/apache/hive/pull/1947#discussion_r577741830



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4564,6 +4564,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 "The meaning of this parameter is the inverse of the number of time 
ticks (cache\n" +
 " operations, currently) that cause the combined recency-frequency of 
a block in cache\n" +
 " to be halved."),
+LLAP_LRFU_CUTOFF_PERCENTAGE("hive.llap.io.lrfu.cutoff.percentage", 0.10f,

Review comment:
   Done.
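
For intuition, the lambda semantics quoted in the diff above ("the inverse of the number of time ticks ... that cause the combined recency-frequency of a block in cache to be halved") can be sketched as an exponential decay. The decay function and sample lambda value below are illustrative assumptions, not Hive code:

```java
public class LrfuDecaySketch {
    // Hypothetical LRFU-style decay: a block's recency-frequency
    // contribution is F(x) = 2^(-lambda * x) after x ticks, so the
    // value halves every 1/lambda ticks.
    public static double decayed(double value, double lambda, long ticks) {
        return value * Math.pow(2.0, -lambda * ticks);
    }

    public static void main(String[] args) {
        double lambda = 0.01;                           // halving period = 100 ticks
        System.out.println(decayed(1.0, lambda, 100));  // 0.5
        System.out.println(decayed(1.0, lambda, 200));  // 0.25
    }
}
```

A larger lambda therefore biases the cache toward recency (old accesses fade faster), while a smaller one biases it toward frequency.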





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553704)
Time Spent: 40m  (was: 0.5h)

> Collect top priority items from llap cache policy
> -
>
> Key: HIVE-24725
> URL: https://issues.apache.org/jira/browse/HIVE-24725
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24726) Track required data for cache hydration

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24726?focusedWorklogId=553700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553700
 ]

ASF GitHub Bot logged work on HIVE-24726:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 16:07
Start Date: 17/Feb/21 16:07
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #1961:
URL: https://github.com/apache/hive/pull/1961#discussion_r577739515



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/MemoryLimitedPathCache.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.cache;
+
+import com.google.common.cache.Cache;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.Weigher;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Memory limited version of the path cache.
+ */
+public class MemoryLimitedPathCache implements PathCache {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(MemoryLimitedPathCache.class);
+  private Cache<Object, String> internalCache;
+
+  public MemoryLimitedPathCache(Configuration conf) {
+internalCache = CacheBuilder.newBuilder()
+.maximumWeight(HiveConf.getSizeVar(conf, 
HiveConf.ConfVars.LLAP_IO_PATH_CACHE_SIZE))
+.weigher(new PathWeigher())
+.build();
+  }
+
+  @Override
+  public void touch(Object key, String val) {
+if (key != null) {
+  internalCache.put(key, val);
+}
+  }
+
+  @Override
+  public String resolve(Object key) {
+return key != null ? internalCache.getIfPresent(key) : null;
+  }
+
+  private static class PathWeigher implements Weigher<Object, String> {
+
+@Override
+public int weigh(Object key, String value) {
+  // String memory footprint

Review comment:
   Done.
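
For reference, the "String memory footprint" the code comment above alludes to is commonly approximated as a fixed per-object overhead plus the backing character array. The constants in this sketch are rough assumptions for a 64-bit JVM with compressed oops, not measured values or Hive's actual weigher:

```java
public class StringFootprintSketch {
    // Rough approximation: ~40 bytes for the String object header/fields
    // plus the char[] header, and 2 bytes per char for a UTF-16-backed
    // String. These constants vary by JVM version and settings.
    public static int approxFootprint(String s) {
        return 40 + 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(approxFootprint(""));                      // 40
        System.out.println(approxFootprint("hdfs://nn/warehouse/t")); // 82
    }
}
```

Feeding such a weight into Guava's maximumWeight/weigher pair lets the cache bound itself by approximate bytes rather than entry count.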





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553700)
Time Spent: 40m  (was: 0.5h)

> Track required data for cache hydration
> ---
>
> Key: HIVE-24726
> URL: https://issues.apache.org/jira/browse/HIVE-24726
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24726) Track required data for cache hydration

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24726?focusedWorklogId=553699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553699
 ]

ASF GitHub Bot logged work on HIVE-24726:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 16:07
Start Date: 17/Feb/21 16:07
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #1961:
URL: https://github.com/apache/hive/pull/1961#discussion_r577739161



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -112,6 +115,8 @@
   private final BufferUsageManager bufferManager;
   private final Configuration daemonConf;
   private final LowLevelCacheMemoryManager memoryManager;
+  private LowLevelCachePolicy realCachePolicy;

Review comment:
   Both will be required in HIVE-24727; I've reverted them for now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553699)
Time Spent: 0.5h  (was: 20m)

> Track required data for cache hydration
> ---
>
> Key: HIVE-24726
> URL: https://issues.apache.org/jira/browse/HIVE-24726
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=553691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553691
 ]

ASF GitHub Bot logged work on HIVE-24739:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 15:54
Start Date: 17/Feb/21 15:54
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1946:
URL: https://github.com/apache/hive/pull/1946


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553691)
Time Spent: 5h 10m  (was: 5h)

> Clarify Usage of Thrift TServerEventHandler and Count Number of Messages 
> Processed
> --
>
> Key: HIVE-24739
> URL: https://issues.apache.org/jira/browse/HIVE-24739
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Make the messages emitted from {{TServerEventHandler}} more meaningful.  
> Also, track the number of messages that each client sends to aid in 
> troubleshooting.
> I run into this issue all the time, and this would greatly help clarify 
> the logging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=553688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553688
 ]

ASF GitHub Bot logged work on HIVE-24739:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 15:44
Start Date: 17/Feb/21 15:44
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1946:
URL: https://github.com/apache/hive/pull/1946


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553688)
Time Spent: 5h  (was: 4h 50m)

> Clarify Usage of Thrift TServerEventHandler and Count Number of Messages 
> Processed
> --
>
> Key: HIVE-24739
> URL: https://issues.apache.org/jira/browse/HIVE-24739
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Make the messages emitted from {{TServerEventHandler}} more meaningful.  
> Also, track the number of messages that each client sends to aid in 
> troubleshooting.
> I run into this issue all the time, and this would greatly help clarify 
> the logging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=553677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553677
 ]

ASF GitHub Bot logged work on HIVE-24778:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 15:27
Start Date: 17/Feb/21 15:27
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1982:
URL: https://github.com/apache/hive/pull/1982#discussion_r577705382



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java
##
@@ -45,7 +45,7 @@
   public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) {
 this.parentResolver = parentResolver;
 SessionState ss = SessionState.get();
-if (ss != null && 
ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+if (ss != null && 
ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {

Review comment:
   Hey @zabetak -- had a chat with @kgyrtkirk regarding this.
   It seems UDFs do not communicate their accepted format(s), as they may 
introduce converters/bridges/etc. during initialization.
   This is why HIVE-24157 introduced this restriction as part of the resolver.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553677)
Time Spent: 0.5h  (was: 20m)

> Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety 
> properties
> 
>
> Key: HIVE-24778
> URL: https://issues.apache.org/jira/browse/HIVE-24778
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The majority of strict type checks can be controlled by 
> {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another 
> property, namely  {{hive.strict.timestamp.conversion}}, to control the 
> implicit comparisons between numerics and timestamps.
> The name and description of {{hive.strict.checks.type.safety}} imply that the 
> property covers all strict checks so having others for specific cases appears 
> confusing and can easily lead to unexpected behavior.
> The goal of this issue is to unify those properties to facilitate 
> configuration and improve code reuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24782?focusedWorklogId=553676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553676
 ]

ASF GitHub Bot logged work on HIVE-24782:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 15:26
Start Date: 17/Feb/21 15:26
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1989:
URL: https://github.com/apache/hive/pull/1989#issuecomment-780632859


   Thanks for the patch @jayp12323. Submitted to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553676)
Time Spent: 0.5h  (was: 20m)

> Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
> ---
>
> Key: HIVE-24782
> URL: https://issues.apache.org/jira/browse/HIVE-24782
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Jason Phelps
>Assignee: Jason Phelps
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-22889 introduced the following lines:
> {code:java}
> // remove the leading and trailing quotes. hcatalog can miss on some 
> cases.
> if (execString.length() > 1 && execString.startsWith("\"") && 
> execString.endsWith("\"")) {
>   execString = execString.substring(1, execString.length() - 1);
> }
> {code}
> When calling Sqoop HCat jobs or the HCat CLI, it will throw an NPE because 
> execString is null and is not wrapped in the nearby null check:
> {code:java}
> if (execString != null) {
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
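
The null-guarded quote stripping the issue above calls for can be sketched as follows; the method and class names are hypothetical, not the actual HCatalog code:

```java
public class ExecStringGuardSketch {
    // Strip one pair of surrounding double quotes, but only when the
    // input is non-null, so a null execString no longer triggers an NPE.
    public static String stripQuotes(String execString) {
        if (execString != null && execString.length() > 1
                && execString.startsWith("\"") && execString.endsWith("\"")) {
            return execString.substring(1, execString.length() - 1);
        }
        return execString;  // null stays null; unquoted strings pass through
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"show tables\""));  // show tables
        System.out.println(stripQuotes(null));               // null
    }
}
```

The `length() > 1` check also keeps a lone `"` intact, since stripping it would produce an empty string from a single character.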


[jira] [Work logged] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24782?focusedWorklogId=553675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553675
 ]

ASF GitHub Bot logged work on HIVE-24782:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 15:26
Start Date: 17/Feb/21 15:26
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #1989:
URL: https://github.com/apache/hive/pull/1989


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553675)
Time Spent: 20m  (was: 10m)

> Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
> ---
>
> Key: HIVE-24782
> URL: https://issues.apache.org/jira/browse/HIVE-24782
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Jason Phelps
>Assignee: Jason Phelps
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-22889 introduced the following lines:
> {code:java}
> // remove the leading and trailing quotes. hcatalog can miss on some 
> cases.
> if (execString.length() > 1 && execString.startsWith("\"") && 
> execString.endsWith("\"")) {
>   execString = execString.substring(1, execString.length() - 1);
> }
> {code}
> When calling Sqoop HCat jobs or the HCat CLI, it will throw an NPE because 
> execString is null and is not wrapped in the nearby null check:
> {code:java}
> if (execString != null) {
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components

2021-02-17 Thread Jason Phelps (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285843#comment-17285843
 ] 

Jason Phelps commented on HIVE-24782:
-

Forgot to bring the comment along. Uploaded another patch.

> Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
> ---
>
> Key: HIVE-24782
> URL: https://issues.apache.org/jira/browse/HIVE-24782
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Jason Phelps
>Assignee: Jason Phelps
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-22889 introduced the following lines:
> {code:java}
> // remove the leading and trailing quotes. hcatalog can miss on some 
> cases.
> if (execString.length() > 1 && execString.startsWith("\"") && 
> execString.endsWith("\"")) {
>   execString = execString.substring(1, execString.length() - 1);
> }
> {code}
> When calling Sqoop HCat jobs or the HCat CLI, it will throw an NPE because 
> execString is null and is not wrapped in the nearby null check:
> {code:java}
> if (execString != null) {
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components

2021-02-17 Thread Jason Phelps (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Phelps updated HIVE-24782:

Attachment: HIVE-24782-002.patch

> Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
> ---
>
> Key: HIVE-24782
> URL: https://issues.apache.org/jira/browse/HIVE-24782
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Jason Phelps
>Assignee: Jason Phelps
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-22889 introduced the following lines:
> {code:java}
> // remove the leading and trailing quotes. hcatalog can miss on some 
> cases.
> if (execString.length() > 1 && execString.startsWith("\"") && 
> execString.endsWith("\"")) {
>   execString = execString.substring(1, execString.length() - 1);
> }
> {code}
> When calling Sqoop HCat jobs or the HCat CLI, it will throw an NPE because 
> execString is null and is not wrapped in the nearby null check:
> {code:java}
> if (execString != null) {
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24782?focusedWorklogId=553622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553622
 ]

ASF GitHub Bot logged work on HIVE-24782:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 14:21
Start Date: 17/Feb/21 14:21
Worklog Time Spent: 10m 
  Work Description: jayp12323 opened a new pull request #1989:
URL: https://github.com/apache/hive/pull/1989


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553622)
Remaining Estimate: 0h
Time Spent: 10m

> Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
> ---
>
> Key: HIVE-24782
> URL: https://issues.apache.org/jira/browse/HIVE-24782
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Jason Phelps
>Assignee: Jason Phelps
>Priority: Major
> Attachments: HIVE-24782-001.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-22889 introduced the following lines:
> {code:java}
> // remove the leading and trailing quotes. hcatalog can miss on some 
> cases.
> if (execString.length() > 1 && execString.startsWith("\"") && 
> execString.endsWith("\"")) {
>   execString = execString.substring(1, execString.length() - 1);
> }
> {code}
> When calling Sqoop HCat jobs or the HCat CLI, it will throw an NPE because 
> execString is null and is not wrapped in the nearby null check:
> {code:java}
> if (execString != null) {
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24782:
--
Labels: pull-request-available  (was: )

> Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
> ---
>
> Key: HIVE-24782
> URL: https://issues.apache.org/jira/browse/HIVE-24782
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Jason Phelps
>Assignee: Jason Phelps
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24782-001.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-22889 introduced the following lines:
> {code:java}
> // remove the leading and trailing quotes. hcatalog can miss on some 
> cases.
> if (execString.length() > 1 && execString.startsWith("\"") && 
> execString.endsWith("\"")) {
>   execString = execString.substring(1, execString.length() - 1);
> }
> {code}
> When calling Sqoop HCat jobs or the HCat CLI, it will throw an NPE because 
> execString is null and is not wrapped in the nearby null check:
> {code:java}
> if (execString != null) {
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?focusedWorklogId=553592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553592
 ]

ASF GitHub Bot logged work on HIVE-24786:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 12:58
Start Date: 17/Feb/21 12:58
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on pull request #1983:
URL: https://github.com/apache/hive/pull/1983#issuecomment-780539112


   @szlta Thanks for the review!
   I tested the PR on different environment and ran into different set of 
issues (likely because of OS difference and JDK difference). Encountered a hang 
issue under some scenarios (socket read timeout). The hang is because there was 
no timeout defined on the socket created by httpclient and hence it will 
infinitely wait until server can write something to it (which will never happen 
because LB disconnected the connection to server). HIVE-12371 provided a way to 
specify socketTimeout via JDBC param but that was applied only for binary 
transport and not for http. 
   I made couple of changes to the PR and tested it to make sure the hang never 
happens
   1) For httpmode use the socketTimeout specified by the user via jdbc param
   2) The default retry handler does not retry InterruptedIOException. Since 
SocketTimeoutException (read timeout) is subclass of InterruptedIOException it 
wasn't retried. So added a custom retry handler with list of classes that will 
be retried in addition to the default retry (idempotent and unsent requests). 
   Could you please take another look at the PR as it has changed?
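
The retry gap described in point 2 above follows directly from the JDK exception hierarchy: SocketTimeoutException extends InterruptedIOException, so a handler that declines the latter silently skips read timeouts too. That relationship can be checked with a small stdlib sketch:

```java
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

public class TimeoutHierarchySketch {
    public static void main(String[] args) {
        // A read timeout surfaces as SocketTimeoutException, which IS-A
        // InterruptedIOException; a retry handler that excludes
        // InterruptedIOException therefore also excludes read timeouts.
        boolean readTimeoutIsInterruptedIo =
                InterruptedIOException.class.isAssignableFrom(SocketTimeoutException.class);
        System.out.println(readTimeoutIsInterruptedIo);  // true
    }
}
```

This is why a custom handler has to allow-list SocketTimeoutException explicitly rather than relying on the default exclusion list.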



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553592)
Time Spent: 20m  (was: 10m)

> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout", and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers resets the underlying TCP socket after 
> an idle timeout. When the connection is broken and the query is submitted 
> after the idle timeout, from the Beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were never sent to the server, they are safe 
> for client-side retries. 
>  
> Also, HIVE-12371 seems to apply the socket timeout only to the binary 
> transport. The same can be passed on to the HTTP client as well to avoid 
> retry hangs caused by infinite timeouts. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24751) Workload Manager sees `No privilege` exception even when authorization is not enabled

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24751?focusedWorklogId=553589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553589
 ]

ASF GitHub Bot logged work on HIVE-24751:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 12:52
Start Date: 17/Feb/21 12:52
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1964:
URL: https://github.com/apache/hive/pull/1964#discussion_r577585597



##
File path: service/src/java/org/apache/hive/service/server/KillQueryImpl.java
##
@@ -116,9 +117,21 @@ public static void killChildYarnJobs(Configuration conf, 
String tag, String doAs
 
   private static boolean isAdmin() {
 boolean isAdmin = false;
-if (SessionState.get().getAuthorizerV2() != null) {
+SessionState ss = SessionState.get();
+if(!HiveConf.getBoolVar(ss.getConf(), 
HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {

Review comment:
   Add a space between `if` and `(`.

##
File path: service/src/java/org/apache/hive/service/server/KillQueryImpl.java
##
@@ -116,9 +117,21 @@ public static void killChildYarnJobs(Configuration conf, 
String tag, String doAs
 
   private static boolean isAdmin() {
 boolean isAdmin = false;
-if (SessionState.get().getAuthorizerV2() != null) {
+SessionState ss = SessionState.get();
+if(!HiveConf.getBoolVar(ss.getConf(), 
HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {
+  // If authorization is disabled, hs2 process owner should have kill 
privileges
   try {
-SessionState.get().getAuthorizerV2()
+String currentUser = ss.getUserName();
+String loginUser = 
UserGroupInformation.getCurrentUser().getShortUserName();
+return (currentUser != null) && currentUser.equals(loginUser);

Review comment:
   If loginUser can never be null, then use `return 
StringUtils.equals(currentUser, loginUser)`; this will handle null values as well.
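
One detail worth noting about the suggestion above: commons-lang's StringUtils.equals treats two nulls as equal, the same as java.util.Objects.equals in the stdlib, whereas the original guard `(currentUser != null) && currentUser.equals(loginUser)` returns false when both sides are null. A sketch of the null cases (variable values are illustrative):

```java
import java.util.Objects;

public class NullSafeEqualsSketch {
    public static void main(String[] args) {
        String loginUser = "hive";
        String currentUser = null;

        // Explicit guard: false (and no NPE) when currentUser is null.
        boolean guarded = (currentUser != null) && currentUser.equals(loginUser);
        System.out.println(guarded);                         // false

        // Null-safe equals gives the same result here, but note that it
        // would return true if BOTH sides were null; that semantic
        // difference only matters if loginUser can ever be null.
        System.out.println(Objects.equals(currentUser, loginUser));  // false
        System.out.println(Objects.equals(null, null));              // true
    }
}
```

So the reviewer's caveat ("if loginUser will never be null") is exactly the condition under which the two forms behave identically.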





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553589)
Time Spent: 20m  (was: 10m)

> Workload Manager sees `No privilege` exception even when authorization is not 
> enabled
> -
>
> Key: HIVE-24751
> URL: https://issues.apache.org/jira/browse/HIVE-24751
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At present, it is not checked whether authorization is enabled for kill-query 
> access. 
>  This causes the Workload Manager thread to end up with a No privilege 
> exception when trying to kill a query in an environment where authorization is 
> disabled.
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: No privilege
>  at 
> org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:188)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.WorkloadManager.lambda$scheduleWork$3(WorkloadManager.java:454)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.hive.service.cli.HiveSQLException: No privilege
>  at 
> org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:167)
>  ... 6 more{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24376) SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode

2021-02-17 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24376:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin  
> mode
> --
>
> Key: HIVE-24376
> URL: https://issues.apache.org/jira/browse/HIVE-24376
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Priority: Major
>
> the mode name is also a bit confusing, but here is what happens:
> {code}
> TS[A1] -> ...
> TS[A2] -> JOIN
> TS[B] -> JOIN
> {code}
> we have an SJ edge between TS[B] -> TS[A2] to communicate information about
> the join keys; let's assume the reduction ratio was r.
> RemoveSemijoin right now does the following:
> * removes the semijoin edge (so TS[A2] will become a full scan)
> * merges TS[A1] and TS[A2]
> w.r.t. data read from disk this is great: we accessed A twice, of which one
> was a full scan, and now we only read it once.
> but from a row traffic perspective TS[A2] now emits more rows, because
> we don't have the r-ratio semijoin reduction anymore.
>  
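
The tradeoff described above can be made concrete with a back-of-the-envelope sketch; the row count and reduction ratio below are made-up illustration values, not numbers from the issue:

```java
public class SemijoinTraffic {
    // Rows TS[A2] emits toward the JOIN while the semijoin edge keeps ratio r of them.
    static long emittedWithSemijoin(long rows, double r) {
        return (long) (r * rows);
    }

    // After RemoveSemijoin the edge is gone, so every scanned row flows to the JOIN.
    static long emittedWithoutSemijoin(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        long rowsInA = 1_000_000L;  // hypothetical row count of table A
        double r = 0.1;             // hypothetical semijoin reduction ratio

        // Disk I/O improves: two scans of A collapse into one shared scan...
        System.out.println("rows scanned from disk: " + (2 * rowsInA) + " -> " + rowsInA);
        // ...but TS[A2]'s row traffic toward the JOIN grows by a factor of 1/r.
        System.out.println("rows emitted toward JOIN: "
                + emittedWithSemijoin(rowsInA, r) + " -> " + emittedWithoutSemijoin(rowsInA));
    }
}
```

With these numbers the merge halves the disk reads but multiplies TS[A2]'s join-bound traffic tenfold, which is the regression the issue describes.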





[jira] [Assigned] (HIVE-24376) SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode

2021-02-17 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24376:
---

Assignee: Zoltan Haindrich

> SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin  
> mode
> --
>
> Key: HIVE-24376
> URL: https://issues.apache.org/jira/browse/HIVE-24376
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> the mode name is also a bit confusing, but here is what happens:
> {code}
> TS[A1] -> ...
> TS[A2] -> JOIN
> TS[B] -> JOIN
> {code}
> we have an SJ edge between TS[B] -> TS[A2] to communicate information about
> the join keys; let's assume the reduction ratio was r.
> RemoveSemijoin right now does the following:
> * removes the semijoin edge (so TS[A2] will become a full scan)
> * merges TS[A1] and TS[A2]
> w.r.t. data read from disk this is great: we accessed A twice, of which one
> was a full scan, and now we only read it once.
> but from a row traffic perspective TS[A2] now emits more rows, because
> we don't have the r-ratio semijoin reduction anymore.
>  





[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=553520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553520
 ]

ASF GitHub Bot logged work on HIVE-24778:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 10:37
Start Date: 17/Feb/21 10:37
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1982:
URL: https://github.com/apache/hive/pull/1982#discussion_r577501389



##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java
##
@@ -84,12 +84,12 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
   "The function TIMESTAMP takes only primitive types");
 }
 
-if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {
   PrimitiveCategory category = argumentOI.getPrimitiveCategory();
  PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category);
   if (group == PrimitiveGrouping.NUMERIC_GROUP) {
 throw new UDFArgumentException(
-"Casting NUMERIC types to TIMESTAMP is prohibited (" + ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION + ")");
+"Casting NUMERIC types to TIMESTAMP is prohibited (" + ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY + ")");
   }

Review comment:
   Should this be here or rather in 
`TypeCheckProcFactory.DefaultExprProcessor#validateUDF`?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java
##
@@ -45,7 +45,7 @@
   public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) {
 this.parentResolver = parentResolver;
 SessionState ss = SessionState.get();
-if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {

Review comment:
   If I understand well this class is used to restrict casting 
timestamp/date to boolean, double, byte, float, integer, long, short values. I 
am not sure why we should deal with these checks at this point but I if we 
really need this then I guess it makes sense to extend it so that we apply the 
same checks for all types under `hive.strict.checks.type.safety` property. 
Should we create another JIRA for this?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java
##
@@ -166,10 +166,10 @@ protected void checkConversionAllowed(ObjectInspector argOI, ObjectInspector com
   return;
 }
 SessionState ss = SessionState.get();
-if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {
   if (primitiveGroupOf(compareOI) == PrimitiveGrouping.NUMERIC_GROUP) {
 throw new UDFArgumentException(
-"Casting DATE/TIMESTAMP to NUMERIC is prohibited (" + ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION + ")");
+"Casting DATE/TIMESTAMP to NUMERIC is prohibited (" + ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY + ")");

Review comment:
   Why do we need this `checkConversionAllowed` method? If conversion is 
incompatible/dangerous shouldn't this be caught by 
`TypeCheckProcFactory.DefaultExprProcessor#validateUDF`?
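
The pattern in the hunks above, a session-level boolean flag gating a cast restriction and naming the flag in the error message, can be reduced to a small self-contained sketch. The static flag and the NUMERIC grouping set below are simplified stand-ins for `HiveConf`/`PrimitiveObjectInspectorUtils`, not the real Hive API:

```java
import java.util.Set;

public class StrictTypeSafetyCheck {
    // Simplified stand-in for ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY.
    static boolean strictTypeSafety = true;

    // Simplified stand-in for the NUMERIC primitive grouping.
    static final Set<String> NUMERIC =
            Set.of("tinyint", "smallint", "int", "bigint", "float", "double", "decimal");

    static void checkTimestampCast(String argType) {
        // Mirrors the GenericUDFTimestamp check: reject NUMERIC -> TIMESTAMP when
        // the strict flag is on, and name the governing property in the message
        // so users know which setting to relax.
        if (strictTypeSafety && NUMERIC.contains(argType)) {
            throw new IllegalArgumentException(
                "Casting NUMERIC types to TIMESTAMP is prohibited (hive.strict.checks.type.safety)");
        }
    }

    public static void main(String[] args) {
        checkTimestampCast("string");   // allowed: strings are not in the NUMERIC group
        try {
            checkTimestampCast("bigint");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The reviewer's question is essentially whether this guard belongs in each UDF or centrally in expression validation; either way the check itself stays this shape.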







Issue Time Tracking
---

Worklog Id: (was: 553520)
Time Spent: 20m  (was: 10m)

> Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety 
> properties
> 
>
> Key: HIVE-24778
> URL: https://issues.apache.org/jira/browse/HIVE-24778
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The majority of strict type checks can be controlled by 
> {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another 
> property, namely  {{hive.strict.timestamp.conversion}}, to control the 
> implicit comparisons between numerics and timestamps.
> The name and description of {{hive.strict.checks.type.safety}} imply that the 
> property covers all strict checks so having others for 

[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553479
 ]

ASF GitHub Bot logged work on HIVE-24743:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 08:41
Start Date: 17/Feb/21 08:41
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #1956:
URL: https://github.com/apache/hive/pull/1956#discussion_r577420383



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String tableName,
  public List<Partition> getPartitionsByNames(String dbName, String tableName,
      List<String> partitionNames) throws HiveException {
 try {
-  return getMSC().getPartitionsByNames(dbName, tableName, partitionNames);
+  GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
+  req.setDb_name(dbName);
+  req.setTbl_name(tableName);
+  req.setNames(partitionNames);
+  return getPartitionsByNames(req, null);
 } catch (Exception e) {
   LOG.error("Failed getPartitionsByNames", e);
   throw new HiveException(e);
 }
   }
 
-public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req)
+public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req,
+  Table table)
 throws HiveException {
 try {
-  Table table = getTable(req.getDb_name(), req.getTbl_name());
+  if( table == null ) {
+table = getTable(req.getDb_name(), req.getTbl_name());

Review comment:
   Table Id is not cached right now in the local cache, so it would be the 
same. 







Issue Time Tracking
---

Worklog Id: (was: 553479)
Time Spent: 1h  (was: 50m)

> [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
> --
>
> Key: HIVE-24743
> URL: https://issues.apache.org/jira/browse/HIVE-24743
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As part of ( HIVE-23821: Send tableId in request for all the new HMS 
> get_partition APIs ) we added logic to send tableId in the request for 
> several get_partition APIs, but looks like it was missed out for 
> getPartitionsByNames. TableId and validWriteIdList are used to maintain 
> consistency, when HMS API response is being served from a remote cache. 
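>
> The consistency motivation above (tableId plus validWriteIdList identifying the exact table incarnation and snapshot a cached response was built for) can be sketched with a minimal self-contained mock. The class, field, and method names below only loosely mirror the thrift-style request and are illustrative, not Hive's actual signatures; the write-id list string format is likewise made up:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for the thrift request; NOT Hive's actual API.
class PartitionsByNamesRequest {
    String dbName;
    String tblName;
    List<String> names;
    Long tableId;             // identifies the exact table incarnation
    String validWriteIdList;  // identifies the visible write-id snapshot

    boolean cacheEntryUsable(long cachedTableId, String cachedWriteIds) {
        // A remote cache may only serve this request if both the table id and
        // the write-id snapshot match what the cached response was built from.
        return tableId != null && tableId == cachedTableId
                && validWriteIdList != null && validWriteIdList.equals(cachedWriteIds);
    }
}

public class TableIdConsistency {
    public static void main(String[] args) {
        PartitionsByNamesRequest req = new PartitionsByNamesRequest();
        req.dbName = "default";
        req.tblName = "sales";                  // hypothetical table
        req.names = Arrays.asList("ds=2021-02-17");
        req.tableId = 42L;                      // hypothetical id
        req.validWriteIdList = "default.sales:5";  // format is illustrative only

        System.out.println(req.cacheEntryUsable(42L, "default.sales:5"));  // true
        // A different table id means the table was dropped and recreated,
        // so the cached partitions must not be served.
        System.out.println(req.cacheEntryUsable(43L, "default.sales:5"));  // false
    }
}
```

This is why the patch threads the already-resolved Table into getPartitionsByNames instead of re-fetching it: the id must describe the same object the caller resolved.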





[jira] [Updated] (HIVE-24791) Backward compatibility issue in _dumpmetadata

2021-02-17 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24791:
---
Attachment: HIVE-24791.01.patch

> Backward compatibility issue in _dumpmetadata
> -
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24791.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24791) Backward compatibility issue in _dumpmetadata

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24791?focusedWorklogId=553471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553471
 ]

ASF GitHub Bot logged work on HIVE-24791:
-

Author: ASF GitHub Bot
Created on: 17/Feb/21 08:17
Start Date: 17/Feb/21 08:17
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1988:
URL: https://github.com/apache/hive/pull/1988


   





Issue Time Tracking
---

Worklog Id: (was: 553471)
Remaining Estimate: 0h
Time Spent: 10m

> Backward compatibility issue in _dumpmetadata
> -
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24791) Backward compatibility issue in _dumpmetadata

2021-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24791:
--
Labels: pull-request-available  (was: )

> Backward compatibility issue in _dumpmetadata
> -
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24791) Backward compatibility issue in _dumpmetadata

2021-02-17 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma reassigned HIVE-24791:
--


> Backward compatibility issue in _dumpmetadata
> -
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>



