[jira] [Commented] (HIVE-16490) Hive should not use private HDFS APIs for encryption

2020-07-13 Thread Uma Maheswara Rao G (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157145#comment-17157145
 ] 

Uma Maheswara Rao G commented on HIVE-16490:


Any update on this?

{code}
public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException {
  DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(uri, conf);

  this.conf = conf;
  this.keyProvider = dfs.getClient().getKeyProvider();
  this.hdfsAdmin = new HdfsAdmin(uri, conf);
}
{code}
Looks like Hadoop23Shims.java is still using the DFSClient API directly. HDFS-11687 
added getKeyProvider to the HdfsAdmin class. Please use it, which also avoids the 
DistributedFileSystem cast.
The cast and DFS checks are already present in the HdfsAdmin constructor, so we 
don't need to repeat them here.
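
For reference, a minimal sketch of what the constructor could look like with the 
HdfsAdmin-based API (assuming the HDFS-11687 getKeyProvider() method is available 
on the classpath; this is an illustration, not the committed Hive change):
{code:java}
public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException {
  this.conf = conf;
  // HdfsAdmin already validates that the URI refers to an HDFS filesystem,
  // so no DistributedFileSystem cast is needed here.
  this.hdfsAdmin = new HdfsAdmin(uri, conf);
  this.keyProvider = this.hdfsAdmin.getKeyProvider();
}
{code}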
Thanks

> Hive should not use private HDFS APIs for encryption
> 
>
> Key: HIVE-16490
> URL: https://issues.apache.org/jira/browse/HIVE-16490
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Affects Versions: 2.2.0
>Reporter: Andrew Wang
>Assignee: Naveen Gangam
>Priority: Critical
>
> When compiling against bleeding edge versions of Hive and Hadoop, we 
> discovered that HIVE-16047 references a private HDFS API, DFSClient, to get 
> at various encryption related information. The private API was recently 
> changed by HADOOP-14104, which broke Hive compilation.
> It'd be better to instead use publicly supported APIs. HDFS-11687 has been 
> filed to add whatever encryption APIs are needed by Hive. This JIRA is to 
> move Hive over to these new APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=458453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458453
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 04:42
Start Date: 14/Jul/20 04:42
Worklog Time Spent: 10m 
  Work Description: rbalamohan edited a comment on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-657962495


   Observed good improvement in Q67 and a few other queries.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458453)
Time Spent: 0.5h  (was: 20m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.
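
A generic illustration of the kind of LRU-based eviction the description asks for, 
built on java.util.LinkedHashMap's access-order mode (a self-contained sketch; the 
class and field names are made up here and this is not the actual 
VectorGroupByOperator patch):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Map that evicts the least recently accessed key once a size cap is reached. */
class LruAggregationBuffers<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  LruAggregationBuffers(int maxEntries) {
    super(16, 0.75f, true); // accessOrder=true keeps entries in LRU order
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each put(); returning true drops the least recently used entry.
    return size() > maxEntries;
  }
}
{code}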



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=458452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458452
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 04:42
Start Date: 14/Jul/20 04:42
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-657962495


   Observed good improvement in Q67.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458452)
Time Spent: 20m  (was: 10m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23843:
--
Labels: pull-request-available  (was: )

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=458451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458451
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 04:34
Start Date: 14/Jul/20 04:34
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #1250:
URL: https://github.com/apache/hive/pull/1250


   https://issues.apache.org/jira/browse/HIVE-23843
   
   TestVectorGroupByOperator has been modified to run with and without 
hive.vectorized.groupby.agg.enable.lrucache.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458451)
Remaining Estimate: 0h
Time Spent: 10m

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458449
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 04:28
Start Date: 14/Jul/20 04:28
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454092186



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   In which case would we have an RS with multiple children? I thought this 
would never happen. Can we leave a comment explaining it? Otherwise, we should 
add a Precondition asserting that the number of children is 1.
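
A sketch of the Precondition alternative mentioned above, in the context of the 
quoted method (assuming Guava's Preconditions is imported, as Hive already depends 
on Guava; the message text is illustrative):
{code:java}
Preconditions.checkState(rsToRemove.getChildOperators().size() == 1,
    "Expected exactly one child for the ReduceSink being removed, found %s",
    rsToRemove.getChildOperators().size());
{code}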





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458449)
Time Spent: 0.5h  (was: 20m)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458448
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 04:27
Start Date: 14/Jul/20 04:27
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454092186



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   In which case would we have an RS with multiple children? Can we leave a 
comment explaining it? Otherwise, we should add a Precondition asserting that the 
number of children is 1.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458448)
Time Spent: 20m  (was: 10m)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23807) Wrong results with vectorization enabled

2020-07-13 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157135#comment-17157135
 ] 

Jesus Camacho Rodriguez commented on HIVE-23807:


+1

> Wrong results with vectorization enabled
> 
>
> Key: HIVE-23807
> URL: https://issues.apache.org/jira/browse/HIVE-23807
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.3.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: compatibility, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
> CREATE TABLE `test13`(
>   `portfolio_valuation_date` string,
>   `price_cut_off_datetime` string,
>   `portfolio_id_valuation_source` string,
>   `contributor_full_path` string,
>   `position_market_value` double,
>   `mandate_name` string)
> STORED AS ORC;
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.26,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.33,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.03,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.16,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.08,   "foo");
> set hive.fetch.task.conversion=none;
> set hive.explain.user=false;
> set hive.vectorized.execution.enabled=false;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces NULL
> set hive.vectorized.execution.enabled=true;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces non-null values
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-13 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-23843:
---

Assignee: Rajesh Balamohan

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-13 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23835:

Attachment: HIVE-23835.01.patch

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23835.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl dump 
> should copy them to the staging location in order to remove the cross-cluster 
> visibility requirement.{color}
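
A generic sketch of copying a function's binary from the source filesystem into a 
staging directory using Hadoop's public FileSystem APIs (the paths and staging 
layout below are invented for illustration; this is not Hive's actual repl dump 
code):
{code:java}
// Assumed inputs: conf is a Hadoop Configuration; the paths are examples only.
Path funcBinary = new Path("hdfs://source-nn/apps/udfs/my-udf.jar");
Path stagingDir = new Path("/user/hive/repl/staging/functions");

FileSystem srcFs = funcBinary.getFileSystem(conf);
FileSystem dstFs = stagingDir.getFileSystem(conf);

// Copy the jar into staging so the target cluster reads it from the dump,
// not from the source cluster's HDFS.
FileUtil.copy(srcFs, funcBinary, dstFs, new Path(stagingDir, funcBinary.getName()),
    false /* deleteSource */, conf);
{code}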



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?focusedWorklogId=458440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458440
 ]

ASF GitHub Bot logged work on HIVE-23835:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 03:43
Start Date: 14/Jul/20 03:43
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1249:
URL: https://github.com/apache/hive/pull/1249


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458440)
Remaining Estimate: 0h
Time Spent: 10m

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl dump 
> should copy them to the staging location in order to remove the cross-cluster 
> visibility requirement.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23835:
--
Labels: pull-request-available  (was: )

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl dump 
> should copy them to the staging location in order to remove the cross-cluster 
> visibility requirement.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-13 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23835:

Status: Patch Available  (was: In Progress)

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl dump 
> should copy them to the staging location in order to remove the cross-cluster 
> visibility requirement.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-13 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23835 started by Pravin Sinha.
---
> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl dump 
> should copy them to the staging location in order to remove the cross-cluster 
> visibility requirement.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23432) Add Ranger Replication Metrics

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23432:
--
Labels: pull-request-available  (was: )

> Add Ranger Replication Metrics 
> ---
>
> Key: HIVE-23432
> URL: https://issues.apache.org/jira/browse/HIVE-23432
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23432.01.patch, HIVE-23432.02.patch, 
> HIVE-23432.03.patch, HIVE-23432.04.patch, HIVE-23432.05.patch, 
> HIVE-23432.06.patch, HIVE-23432.07.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?focusedWorklogId=458392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458392
 ]

ASF GitHub Bot logged work on HIVE-23438:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:32
Start Date: 14/Jul/20 00:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1014:
URL: https://github.com/apache/hive/pull/1014#issuecomment-657898549


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458392)
Remaining Estimate: 0h
Time Spent: 10m

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.001.branch-2.3.patch, 
> HIVE-23438.branch-2.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT*
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> In src1 left join src2, src1 is the big table and src2 is the small table. The 
> join result between a big-table row and the corresponding hashtable may be in 
> the NO_MATCH state; however, these NO_MATCH rows are still needed because of the 
> LEFT OUTER JOIN.
> In addition, these big-table rows are not spilled into the matchfile related to 
> this hashtable on disk, because only the SPILL state can use `spillBigTableRow`. 
> Then, these big-table rows are spilled into the matchfile in the hashtables of 
> table `srcpart` (the second small table).
> Finally, when reProcessBigTable runs, big-table rows in the matchfile are only 
> read from `firstSmallTable`, so some rows go missing.
>  
> *WORKAROUND*
>  Configure firstSmallTable in completeInitializationOp and only spill big-table 
> rows into firstSmallTable when spilling the matchfile.
[jira] [Work logged] (HIVE-23432) Add Ranger Replication Metrics

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23432?focusedWorklogId=458393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458393
 ]

ASF GitHub Bot logged work on HIVE-23432:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:32
Start Date: 14/Jul/20 00:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1015:
URL: https://github.com/apache/hive/pull/1015#issuecomment-657898546


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458393)
Remaining Estimate: 0h
Time Spent: 10m

> Add Ranger Replication Metrics 
> ---
>
> Key: HIVE-23432
> URL: https://issues.apache.org/jira/browse/HIVE-23432
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23432.01.patch, HIVE-23432.02.patch, 
> HIVE-23432.03.patch, HIVE-23432.04.patch, HIVE-23432.05.patch, 
> HIVE-23432.06.patch, HIVE-23432.07.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23438:
--
Labels: pull-request-available  (was: )

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23438.001.branch-2.3.patch, 
> HIVE-23438.branch-2.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT*
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> In src1 left join src2, src1 is the big table and src2 is the small table. The 
> join result between a big-table row and the corresponding hashtable may be in 
> the NO_MATCH state; however, these NO_MATCH rows are still needed because of the 
> LEFT OUTER JOIN.
> In addition, these big-table rows are not spilled into the matchfile related to 
> this hashtable on disk, because only the SPILL state can use `spillBigTableRow`. 
> Then, these big-table rows are spilled into the matchfile in the hashtables of 
> table `srcpart` (the second small table).
> Finally, when reProcessBigTable runs, big-table rows in the matchfile are only 
> read from `firstSmallTable`, so some rows go missing.
>  
> *WORKAROUND*
>  Configure firstSmallTable in completeInitializationOp and only spill big-table 
> rows into firstSmallTable when spilling the matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=458388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458388
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:23
Start Date: 14/Jul/20 00:23
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-657896053


   @kgyrtkirk could you please take another look at the changes? thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458388)
Time Spent: 1h 50m  (was: 1h 40m)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Make the OOM hook an interface of HiveServer2, so users can implement the hook 
> to do something before HS2 stops, such as dumping the heap or alerting the 
> devops team.
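
A hypothetical sketch of what such a hook interface could look like (the interface 
and method names below are invented for illustration and are not HiveServer2's 
actual API):
{code:java}
/** Invoked by the server before it stops due to an OutOfMemoryError. */
public interface OutOfMemoryHook {
  void onOutOfMemory(OutOfMemoryError error);
}
{code}
A user-supplied implementation could, for example, trigger a heap dump or send an 
alert before the process exits.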



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=458387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458387
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:19
Start Date: 14/Jul/20 00:19
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1149:
URL: https://github.com/apache/hive/pull/1149


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458387)
Time Spent: 2h 50m  (was: 2h 40m)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling 
> the background task. If that check passes, the state cannot be 
> OperationState.CANCELED, so the logging under state == OperationState.CANCELED 
> can never be reached.
>  
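
An illustrative, simplified sketch of the unreachable branch described above (this 
is a paraphrase for clarity, not the exact SQLOperation source):
{code:java}
if (shouldRunAsync() && state != OperationState.CANCELED
    && state != OperationState.TIMEDOUT) {
  // ... cancel the background task ...
  if (state == OperationState.CANCELED) {
    // Unreachable: the outer guard already excludes the CANCELED state.
    LOG.info("Background task was cancelled");
  }
}
{code}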



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=458386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458386
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:18
Start Date: 14/Jul/20 00:18
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1242:
URL: https://github.com/apache/hive/pull/1242#issuecomment-657894768


   The failed tests seem unrelated...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458386)
Time Spent: 1h  (was: 50m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and HiveServer2 has been 
> started, a function newly created from other clients or HiveServer2 is loaded 
> from the metastore the first time it is used.
> When the UDF is used in a where clause, we get an NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
>  ~[hive-exec-2.
> 

[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=458383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458383
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:16
Start Date: 14/Jul/20 00:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1242:
URL: https://github.com/apache/hive/pull/1242


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458383)
Time Spent: 40m  (was: 0.5h)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and HiveServer2 has been 
> started, a function newly created from other clients or HiveServer2 is loaded 
> from the metastore the first time it is used.
> When the UDF is used in a where clause, we get an NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
>  ~[hive-exec-2.
> 3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=458384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458384
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 00:16
Start Date: 14/Jul/20 00:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1242:
URL: https://github.com/apache/hive/pull/1242


   …et to true
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458384)
Time Spent: 50m  (was: 40m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and HiveServer2 has been 
> started, a function newly created from other clients or HiveServer2 is loaded 
> from the metastore the first time it is used.
> When the UDF is used in a where clause, we get an NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> 

[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23841:
--
Labels: patch-available pull-request-available  (was: patch-available)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>  Labels: patch-available, pull-request-available
> Attachments: HIVE-23841.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I also submitted a pull request on github at:
>  
> [https://github.com/apache/hive/pull/1248]
>  
> (same patch)
> h1. Description
>  
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}} are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion. This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively.
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.
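
A minimal sketch of the proposed one-line change, in the spirit of the description 
(the variable passed to remove() is abbreviated here; see LlapOutputFormatService 
for the real context):
{code:java}
// Before (around line 249): broken mutual exclusion, other accesses lock on `lock`.
synchronized (INSTANCE) {
  writers.remove(id);   // `id` stands in for whatever key the real code removes
}

// After: synchronize on the same `lock` object as the other writers accesses.
synchronized (lock) {
  writers.remove(id);
}
{code}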



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?focusedWorklogId=458379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458379
 ]

ASF GitHub Bot logged work on HIVE-23841:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 23:54
Start Date: 13/Jul/20 23:54
Worklog Time Spent: 10m 
  Work Description: adriannistor commented on pull request #1248:
URL: https://github.com/apache/hive/pull/1248#issuecomment-657887992


   I see this failure:
   
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1248/1/tests
   
   about `testExecuteStatementParallel – 
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService`
   
   But somehow this failure / test does not seem related to the change made.
   
   Let me know if you have any suggestions (e.g., maybe the fix needs 2 locks, 
both `INSTANCE`, which is really `this` and `lock` --- seems improbable).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458379)
Remaining Estimate: 0h
Time Spent: 10m

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>  Labels: patch-available
> Attachments: HIVE-23841.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I also submitted a pull request on github at:
>  
> [https://github.com/apache/hive/pull/1248]
>  
> (same patch)
> h1. Description
>  
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}} are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion. This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively.
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?focusedWorklogId=458380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458380
 ]

ASF GitHub Bot logged work on HIVE-23841:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 23:55
Start Date: 13/Jul/20 23:55
Worklog Time Spent: 10m 
  Work Description: adriannistor edited a comment on pull request #1248:
URL: https://github.com/apache/hive/pull/1248#issuecomment-657887992


   I see this failure:
   
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1248/1/tests
   
   about `testExecuteStatementParallel – 
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService`
   
   But somehow this failure / test does not seem related to the change made.
   
   Let me know if you have any suggestions (e.g., maybe the fix needs 2 locks, 
both `INSTANCE`, which is really `this`, and `lock` --- though that seems improbable).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458380)
Time Spent: 20m  (was: 10m)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>  Labels: patch-available, pull-request-available
> Attachments: HIVE-23841.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I also submitted a pull request on github at:
>  
> [https://github.com/apache/hive/pull/1248]
>  
> (same patch)
> h1. Description
>  
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}} are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion. This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively.
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.[]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=458343=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458343
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 22:03
Start Date: 13/Jul/20 22:03
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1242:
URL: https://github.com/apache/hive/pull/1242


   …et to true
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458343)
Time Spent: 0.5h  (was: 20m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and HiveServer2 has been 
> started, a function newly created by other clients or by HiveServer2 itself will 
> be loaded from the metastore the first time it is used. 
> When the UDF is then used in a where clause, we get an NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> 

[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=458341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458341
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 22:02
Start Date: 13/Jul/20 22:02
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1242:
URL: https://github.com/apache/hive/pull/1242


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458341)
Time Spent: 20m  (was: 10m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and HiveServer2 has been 
> started, a function newly created by other clients or by HiveServer2 itself will 
> be loaded from the metastore the first time it is used. 
> When the UDF is then used in a where clause, we get an NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
>  ~[hive-exec-2.
> 3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Description: 
I also submitted a pull request on github at:

 

[https://github.com/apache/hive/pull/1248]

 

(same patch)
h1. Description

 

Field {{writers}} is a {{HashSet}} ([line 
70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
 i.e., not thread-safe.

Accesses to field {{writers}} are protected by synchronization on {{lock}}, 
e.g., at lines: 
[141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
 
[212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
 and 
[212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].

However, the {{writers.remove()}} at [line 
249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
 is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.

Synchronizing on 2 different objects does not ensure mutual exclusion. This is 
because 2 threads synchronizing on different objects can still execute in 
parallel at the same time.

Note that lines 
[215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
 and 
[249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
 are modifying {{writers}} with {{put()}} and {{remove()}}, respectively.
h1. The Code for This Fix

This fix is very simple: just change {{synchronized (INSTANCE)}} to 
{{synchronized (lock)}}, just like the methods containing the other lines 
listed above.[]

  was:
Field {{writers}} is a {{HashSet}} ([line 
70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
 i.e., not thread-safe.

Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
e.g., at lines: 
[141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
 
[212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
 and 
[212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].

However, the {{writers.remove()}} at  [line 
249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
 is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.

Synchronizing on 2 different objects does not ensure mutual exclusion.  This is 
because 2 threads synchronizing on different objects can still execute in 
parallel at the same time.

Note that lines 
[215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
 and 
[249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
 are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 


h1. The Code for This Fix

This fix is very simple: just change {{synchronized (INSTANCE)}} to 
{{synchronized (lock)}}, just like the methods containing the other lines 
listed above.



> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>  Labels: patch-available
> Attachments: HIVE-23841.patch
>
>
> I also submitted a pull request on github at:
>  
> [https://github.com/apache/hive/pull/1248]
>  
> (same patch)
> h1. Description
>  
> Field {{writers}} is a {{HashSet}} ([line 
> 

[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Attachment: HIVE-23841.patch
Labels: patch-available  (was: )
Status: Patch Available  (was: Open)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>  Labels: patch-available
> Attachments: HIVE-23841.patch
>
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Status: Open  (was: Patch Available)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Attachment: HIVE-23841.patch
Status: Patch Available  (was: Open)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Attachment: (was: HIVE-23841.patch)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Attachment: (was: HIVE-23841.patch)

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Flags: Patch

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
> Attachments: HIVE-23841.patch
>
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.

2020-07-13 Thread Adrian Nistor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Nistor updated HIVE-23841:
-
Attachment: HIVE-23841.patch

> Field writers is an HashSet, i.e., not thread-safe.  Field writers is 
> typically protected by synchronization on lock, but not in 1 location.
> 
>
> Key: HIVE-23841
> URL: https://issues.apache.org/jira/browse/HIVE-23841
> Project: Hive
>  Issue Type: Bug
> Environment: Any environment
>Reporter: Adrian Nistor
>Priority: Major
> Attachments: HIVE-23841.patch
>
>
> Field {{writers}} is a {{HashSet}} ([line 
> 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]),
>  i.e., not thread-safe.
> Accesses to field {{writers}}  are protected by synchronization on {{lock}}, 
> e.g., at lines: 
> [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144],
>  
> [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213],
>  and 
> [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215].
> However, the {{writers.remove()}} at  [line 
> 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}.
> Synchronizing on 2 different objects does not ensure mutual exclusion.  This 
> is because 2 threads synchronizing on different objects can still execute in 
> parallel at the same time.
> Note that lines 
> [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215]
>  and 
> [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249]
>  are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. 
> h1. The Code for This Fix
> This fix is very simple: just change {{synchronized (INSTANCE)}} to 
> {{synchronized (lock)}}, just like the methods containing the other lines 
> listed above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23807) Wrong results with vectorization enabled

2020-07-13 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156960#comment-17156960
 ] 

Vineet Garg commented on HIVE-23807:


[~jcamachorodriguez] Can you take a look at the pull request (#1234)? I have 
verified that the failing tests aren't related (I used 
http://ci.hive.apache.org/job/hive-precommit/job/branch-2/lastCompletedBuild/testReport/
 as a baseline and re-ran the rest of the failures locally).

> Wrong results with vectorization enabled
> 
>
> Key: HIVE-23807
> URL: https://issues.apache.org/jira/browse/HIVE-23807
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.3.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: compatibility, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
> CREATE TABLE `test13`(
>   `portfolio_valuation_date` string,
>   `price_cut_off_datetime` string,
>   `portfolio_id_valuation_source` string,
>   `contributor_full_path` string,
>   `position_market_value` double,
>   `mandate_name` string)
> STORED AS ORC;
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.26,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.33,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.03,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.16,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.08,   "foo");
> set hive.fetch.task.conversion=none;
> set hive.explain.user=false;
> set hive.vectorized.execution.enabled=false;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces NULL
> set hive.vectorized.execution.enabled=true;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces non-null values
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23808) "MSCK REPAIR.. DROP Partitions fail" with kryo Exception

2020-07-13 Thread Antal Sinkovits (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156919#comment-17156919
 ] 

Antal Sinkovits commented on HIVE-23808:


[~srahman] this was reproducible on master.

> "MSCK REPAIR.. DROP Partitions fail" with kryo Exception 
> -
>
> Key: HIVE-23808
> URL: https://issues.apache.org/jira/browse/HIVE-23808
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Rajkumar Singh
>Assignee: Antal Sinkovits
>Priority: Major
>
> Steps to reproduce:
> 1. Create an external partitioned table
> 2. Remove some partitions manually by using the hdfs dfs -rm command
> 3. Run "MSCK REPAIR.. DROP Partitions" and it will fail with the following 
> exception
> {code:java}
> 2020-07-06 10:42:11,434 WARN  
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork:
>  [HiveServer2-Background-Pool: Thread-210]: Exception thrown while processing 
> using a batch size 2
> org.apache.hadoop.hive.metastore.utils.MetastoreException: 
> MetaException(message:Index: 117, Size: 0)
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:479) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:432) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.Msck.dropPartitionsInBatches(Msck.java:496) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:223) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.ddl.misc.msck.MsckOperation.execute(MsckOperation.java:74)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at java.security.AccessController.doPrivileged(Native Method) 
> [?:1.8.0_242]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  [hadoop-common-3.1.1.7.1.1.0-565.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_242]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> 

[jira] [Comment Edited] (HIVE-22224) Support Parquet-Avro Timestamp Type

2020-07-13 Thread satish (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156902#comment-17156902
 ] 

satish edited comment on HIVE-22224 at 7/13/20, 6:28 PM:
-

[~bdscheller] [~cdmikechen] checking to see if you guys have any suggestions to 
try


was (Author: satishkotha):
[~bdscheller] [~cdmikechen] checking to see if this has been addressed 

> Support Parquet-Avro Timestamp Type
> ---
>
> Key: HIVE-4
> URL: https://issues.apache.org/jira/browse/HIVE-4
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 2.3.5, 2.3.6
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: parquet, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a user creates an external table and imports parquet-avro data written with 
> parquet-avro 1.8.2 (which supported logical_type) in Hive 2.3 or an earlier 
> version, Hive cannot read timestamp-type column data correctly.
> Hive will read it as a LongWritable, because it is actually stored as a 
> long (logical_type=timestamp-millis). So we may add some code in 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.java
>  to let Hive cast the long type to the timestamp type.
> Some code like the following:
>  
> public Timestamp getPrimitiveJavaObject(Object o) {
>   if (o instanceof LongWritable) {
>     return new Timestamp(((LongWritable) o).get());
>   }
>   return o == null ? null : ((TimestampWritable) o).getTimestamp();
> }
>  
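
For context, a self-contained version of the idea above might look like the sketch 
below; the helper class name and placement are illustrative (this is not the actual 
patch), and Hive 2.x types such as java.sql.Timestamp and TimestampWritable are 
assumed:
{code:java}
import java.sql.Timestamp;

import org.apache.hadoop.hive.serde2.io.TimestampWritable;
import org.apache.hadoop.io.LongWritable;

// Illustrative helper: tolerate a timestamp-millis value that arrives as a
// LongWritable instead of the TimestampWritable the inspector normally expects.
final class TimestampCoercionSketch {
  static Timestamp toTimestamp(Object o) {
    if (o == null) {
      return null;
    }
    if (o instanceof LongWritable) {
      // logical_type=timestamp-millis stores epoch milliseconds in a long
      return new Timestamp(((LongWritable) o).get());
    }
    return ((TimestampWritable) o).getTimestamp();
  }
}
{code}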



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22224) Support Parquet-Avro Timestamp Type

2020-07-13 Thread satish (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156902#comment-17156902
 ] 

satish commented on HIVE-22224:
---

[~bdscheller][~cdmikechen] checking to see if this has been addressed 

> Support Parquet-Avro Timestamp Type
> ---
>
> Key: HIVE-4
> URL: https://issues.apache.org/jira/browse/HIVE-4
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 2.3.5, 2.3.6
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: parquet, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a user creates an external table and imports parquet-avro data written with 
> parquet-avro 1.8.2 (which supported logical_type) in Hive 2.3 or an earlier 
> version, Hive cannot read timestamp-type column data correctly.
> Hive will read it as a LongWritable, because it is actually stored as a 
> long (logical_type=timestamp-millis). So we may add some code in 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.java
>  to let Hive cast the long type to the timestamp type.
> Some code like the following:
>  
> public Timestamp getPrimitiveJavaObject(Object o) {
>   if (o instanceof LongWritable) {
>     return new Timestamp(((LongWritable) o).get());
>   }
>   return o == null ? null : ((TimestampWritable) o).getTimestamp();
> }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22224) Support Parquet-Avro Timestamp Type

2020-07-13 Thread satish (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156902#comment-17156902
 ] 

satish edited comment on HIVE-22224 at 7/13/20, 6:28 PM:
-

[~bdscheller] [~cdmikechen] checking to see if this has been addressed 


was (Author: satishkotha):
[~bdscheller][~cdmikechen] checking to see if this has been addressed 

> Support Parquet-Avro Timestamp Type
> ---
>
> Key: HIVE-4
> URL: https://issues.apache.org/jira/browse/HIVE-4
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 2.3.5, 2.3.6
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: parquet, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a user creates an external table and imports parquet-avro data written with 
> parquet-avro 1.8.2 (which supported logical_type) in Hive 2.3 or an earlier 
> version, Hive cannot read timestamp-type column data correctly.
> Hive will read it as a LongWritable, because it is actually stored as a 
> long (logical_type=timestamp-millis). So we may add some code in 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.java
>  to let Hive cast the long type to the timestamp type.
> Some code like the following:
>  
> public Timestamp getPrimitiveJavaObject(Object o) {
>   if (o instanceof LongWritable) {
>     return new Timestamp(((LongWritable) o).get());
>   }
>   return o == null ? null : ((TimestampWritable) o).getTimestamp();
> }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=458184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458184
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 16:43
Start Date: 13/Jul/20 16:43
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1245:
URL: https://github.com/apache/hive/pull/1245#issuecomment-657667687


   1-2 months I'd say.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458184)
Time Spent: 1h 10m  (was: 1h)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=458181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458181
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 16:27
Start Date: 13/Jul/20 16:27
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1245:
URL: https://github.com/apache/hive/pull/1245#issuecomment-657660148


   @belugabehr  hold your horses, I'm still not there yet :D
   How long has this been failing, do you reckon?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458181)
Time Spent: 1h  (was: 50m)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-07-13 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156806#comment-17156806
 ] 

Vineet Garg commented on HIVE-22782:


[~ashish-kumar-sharma] Thanks for volunteering. I have assigned the issue to 
you.

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Ashish Sharma
>Priority: Major
>
> Currently, separate calls are made to the metastore to fetch constraints such as 
> PK, FK, not-null, etc. Since the planner always retrieves these constraints, we 
> should retrieve all of them in one call.
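
A rough sketch of what the consolidation could look like is below; the holder class 
and the single-call method are hypothetical and do not exist in the metastore API 
today, only the SQL*Constraint thrift classes are real:
{code:java}
import java.util.List;

import org.apache.hadoop.hive.metastore.api.SQLForeignKey;
import org.apache.hadoop.hive.metastore.api.SQLNotNullConstraint;
import org.apache.hadoop.hive.metastore.api.SQLPrimaryKey;
import org.apache.hadoop.hive.metastore.api.SQLUniqueConstraint;

// Hypothetical holder for everything the planner needs about one table.
class AllTableConstraintsSketch {
  List<SQLPrimaryKey> primaryKeys;
  List<SQLForeignKey> foreignKeys;
  List<SQLNotNullConstraint> notNullConstraints;
  List<SQLUniqueConstraint> uniqueConstraints;
}

// Hypothetical single-call replacement for the per-constraint-type round trips
// (primary keys, foreign keys, not-null constraints, ...) issued today.
interface ConsolidatedConstraintFetcherSketch {
  AllTableConstraintsSketch getAllTableConstraints(String catName, String dbName, String tableName);
}
{code}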



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-07-13 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-22782:
--

Assignee: Ashish Sharma  (was: Vineet Garg)

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Ashish Sharma
>Priority: Major
>
> Currently, separate calls are made to the metastore to fetch constraints such as 
> PK, FK, not-null, etc. Since the planner always retrieves these constraints, we 
> should retrieve all of them in one call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23727:
--
Priority: Minor  (was: Major)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling 
> the background task. If that condition is true, the state cannot be 
> OperationState.CANCELED, so the logging under state == OperationState.CANCELED 
> should never happen.
>  
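
A simplified, single-threaded sketch of the control flow being described (not the 
actual SQLOperation source) makes the unreachable branch explicit:
{code:java}
// Simplified sketch: inside the guarded block the state has just been checked to
// be neither CANCELED nor TIMEDOUT, so the inner CANCELED branch is never taken.
class CancelFlowSketch {
  enum OperationState { RUNNING, CANCELED, TIMEDOUT, FINISHED }

  private OperationState state = OperationState.RUNNING;

  private boolean shouldRunAsync() { return true; }   // placeholder

  void cleanup() {
    if (shouldRunAsync()
        && state != OperationState.CANCELED
        && state != OperationState.TIMEDOUT) {
      // ... cancel the background task ...
      if (state == OperationState.CANCELED) {   // dead branch under this guard
        System.out.println("background task cancelled during cleanup");
      }
    }
  }
}
{code}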



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=458157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458157
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:39
Start Date: 13/Jul/20 15:39
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1245:
URL: https://github.com/apache/hive/pull/1245#issuecomment-657634118


   Thanks so much for getting this fixed!  Been tormenting me for a while now :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458157)
Time Spent: 50m  (was: 40m)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?focusedWorklogId=458140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458140
 ]

ASF GitHub Bot logged work on HIVE-23737:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:25
Start Date: 13/Jul/20 15:25
Worklog Time Spent: 10m 
  Work Description: shameersss1 edited a comment on pull request #1195:
URL: https://github.com/apache/hive/pull/1195#issuecomment-657450269


   @t3rmin4t0r @prasanthj Could you please review the PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458140)
Time Spent: 0.5h  (was: 20m)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in the custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23808) "MSCK REPAIR.. DROP Partitions fail" with kryo Exception

2020-07-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156778#comment-17156778
 ] 

Syed Shameerur Rahman edited comment on HIVE-23808 at 7/13/20, 3:18 PM:


In the case of the MSCK command, MsckPartitionExpressionProxy is used instead of 
PartitionExpressionForMetastore, so I guess there was never a problem with 
serialization/deserialization.


was (Author: srahman):
In case of MSCK command MsckPartitionExpressionProxy is used instead of 
PartitionExpressionForMetastore so i guess there was never a problem with 
serialization.

> "MSCK REPAIR.. DROP Partitions fail" with kryo Exception 
> -
>
> Key: HIVE-23808
> URL: https://issues.apache.org/jira/browse/HIVE-23808
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Rajkumar Singh
>Assignee: Antal Sinkovits
>Priority: Major
>
> Steps to reproduce:
> 1. Create an external partitioned table
> 2. Remove some partitions manually using the hdfs dfs -rm command
> 3. Run "MSCK REPAIR.. DROP Partitions" and it will fail with the following 
> exception
> {code:java}
> 2020-07-06 10:42:11,434 WARN  
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork:
>  [HiveServer2-Background-Pool: Thread-210]: Exception thrown while processing 
> using a batch size 2
> org.apache.hadoop.hive.metastore.utils.MetastoreException: 
> MetaException(message:Index: 117, Size: 0)
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:479) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:432) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.Msck.dropPartitionsInBatches(Msck.java:496) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:223) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.ddl.misc.msck.MsckOperation.execute(MsckOperation.java:74)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at java.security.AccessController.doPrivileged(Native Method) 
> [?:1.8.0_242]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  [hadoop-common-3.1.1.7.1.1.0-565.jar:?]
> at 
> 

[jira] [Commented] (HIVE-23808) "MSCK REPAIR.. DROP Partitions fail" with kryo Exception

2020-07-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156778#comment-17156778
 ] 

Syed Shameerur Rahman commented on HIVE-23808:
--

In the case of the MSCK command, MsckPartitionExpressionProxy is used instead of 
PartitionExpressionForMetastore, so I guess there was never a problem with 
serialization.

> "MSCK REPAIR.. DROP Partitions fail" with kryo Exception 
> -
>
> Key: HIVE-23808
> URL: https://issues.apache.org/jira/browse/HIVE-23808
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Rajkumar Singh
>Assignee: Antal Sinkovits
>Priority: Major
>
> Steps to reproduce:
> 1. Create an external partitioned table
> 2. Remove some partitions manually using the hdfs dfs -rm command
> 3. Run "MSCK REPAIR.. DROP Partitions" and it will fail with the following 
> exception
> {code:java}
> 2020-07-06 10:42:11,434 WARN  
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork:
>  [HiveServer2-Background-Pool: Thread-210]: Exception thrown while processing 
> using a batch size 2
> org.apache.hadoop.hive.metastore.utils.MetastoreException: 
> MetaException(message:Index: 117, Size: 0)
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:479) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:432) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.Msck.dropPartitionsInBatches(Msck.java:496) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:223) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.ddl.misc.msck.MsckOperation.execute(MsckOperation.java:74)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at java.security.AccessController.doPrivileged(Native Method) 
> [?:1.8.0_242]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  [hadoop-common-3.1.1.7.1.1.0-565.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at 

[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=458127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458127
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:12
Start Date: 13/Jul/20 15:12
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1241:
URL: https://github.com/apache/hive/pull/1241#discussion_r453722539



##
File path: standalone-metastore/metastore-server/pom.xml
##
@@ -726,8 +726,7 @@
 
   JDO
   false
-  
${basedir}/src/main/resources/datanucleus-log4j.properties
-  
+  
${basedir}/src/main/resources/datanucleus-log4j.properties

Review comment:
   Remove this change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458127)
Time Spent: 1h 10m  (was: 1h)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23808) "MSCK REPAIR.. DROP Partitions fail" with kryo Exception

2020-07-13 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits resolved HIVE-23808.

Resolution: Duplicate

> "MSCK REPAIR.. DROP Partitions fail" with kryo Exception 
> -
>
> Key: HIVE-23808
> URL: https://issues.apache.org/jira/browse/HIVE-23808
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Rajkumar Singh
>Assignee: Antal Sinkovits
>Priority: Major
>
> Steps to reproduce:
> 1. Create an external partitioned table
> 2. Remove some partitions manually using the hdfs dfs -rm command
> 3. Run "MSCK REPAIR.. DROP Partitions" and it will fail with the following 
> exception
> {code:java}
> 2020-07-06 10:42:11,434 WARN  
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork:
>  [HiveServer2-Background-Pool: Thread-210]: Exception thrown while processing 
> using a batch size 2
> org.apache.hadoop.hive.metastore.utils.MetastoreException: 
> MetaException(message:Index: 117, Size: 0)
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:479) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:432) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.Msck.dropPartitionsInBatches(Msck.java:496) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:223) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.ddl.misc.msck.MsckOperation.execute(MsckOperation.java:74)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at java.security.AccessController.doPrivileged(Native Method) 
> [?:1.8.0_242]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  [hadoop-common-3.1.1.7.1.1.0-565.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_242]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at 

[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=458122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458122
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:05
Start Date: 13/Jul/20 15:05
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1241:
URL: https://github.com/apache/hive/pull/1241#issuecomment-657615065


   @kgyrtkirk I'm not sure what you mean by `java 8 + java 11 testing`.  If we 
go with JDK 11 for the build, it should be backwards compatible with Java 8 
(which is defined in the Maven POM file for Hive 3.x and master currently).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458122)
Time Spent: 1h  (was: 50m)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=458120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458120
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:04
Start Date: 13/Jul/20 15:04
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1241:
URL: https://github.com/apache/hive/pull/1241#issuecomment-657614154


   @kgyrtkirk 
   
   Yup!  OpenJDK and Oracle are the only ones Cloudera supports.  For Java 11, 
it's OpenJDK 11 only.
   
   
https://docs.cloudera.com/cdp/latest/release-guide/topics/cdpdc-java-requirements.html



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458120)
Time Spent: 50m  (was: 40m)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23808) "MSCK REPAIR.. DROP Partitions fail" with kryo Exception

2020-07-13 Thread Antal Sinkovits (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156772#comment-17156772
 ] 

Antal Sinkovits commented on HIVE-23808:


It seems that https://issues.apache.org/jira/browse/HIVE-22957 fixed this issue

> "MSCK REPAIR.. DROP Partitions fail" with kryo Exception 
> -
>
> Key: HIVE-23808
> URL: https://issues.apache.org/jira/browse/HIVE-23808
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Rajkumar Singh
>Assignee: Antal Sinkovits
>Priority: Major
>
> Steps to reproduce:
> 1. Create an external partitioned table
> 2. Remove some partitions manually using the hdfs dfs -rm command
> 3. Run "MSCK REPAIR.. DROP Partitions" and it will fail with the following 
> exception
> {code:java}
> 2020-07-06 10:42:11,434 WARN  
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork:
>  [HiveServer2-Background-Pool: Thread-210]: Exception thrown while processing 
> using a batch size 2
> org.apache.hadoop.hive.metastore.utils.MetastoreException: 
> MetaException(message:Index: 117, Size: 0)
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:479) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:432) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.Msck.dropPartitionsInBatches(Msck.java:496) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:223) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.ddl.misc.msck.MsckOperation.execute(MsckOperation.java:74)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at java.security.AccessController.doPrivileged(Native Method) 
> [?:1.8.0_242]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  [hadoop-common-3.1.1.7.1.1.0-565.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_242]
> at 
> 

[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=458117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458117
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:01
Start Date: 13/Jul/20 15:01
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1241:
URL: https://github.com/apache/hive/pull/1241#issuecomment-657612371


   how about we phase in java 8 + java 11 testing at the same time?
   @belugabehr  would that make sense?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458117)
Time Spent: 40m  (was: 0.5h)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=458116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458116
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 15:00
Start Date: 13/Jul/20 15:00
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1241:
URL: https://github.com/apache/hive/pull/1241#issuecomment-657611666


   oh...I see kgyrtkirk/hive-dev-box#5 is needed to test this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458116)
Time Spent: 0.5h  (was: 20m)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?focusedWorklogId=458096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458096
 ]

ASF GitHub Bot logged work on HIVE-23839:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 14:41
Start Date: 13/Jul/20 14:41
Worklog Time Spent: 10m 
  Work Description: dai commented on pull request #1246:
URL: https://github.com/apache/hive/pull/1246#issuecomment-657601071


   @belugabehr Thank you for pointing that out.
If we need to use methods like `compareAndSet(a,b)` and 
`incrementAndGet()`, then AtomicLong would be preferable.
   These cases just update a sum for collecting statistics.
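   
   A small illustrative contrast of the point being discussed (not Hive code; the class and field names are made up): LongAdder for threads that only add to a statistics sum, AtomicLong where the caller needs the atomic return value or compare-and-set semantics.
   
{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch only.
class CounterSketch {
  private final LongAdder bytesRead = new LongAdder();  // many threads just add
  private final AtomicLong nextId = new AtomicLong();   // callers need the returned value

  void recordRead(long n) { bytesRead.add(n); }

  long bytesReadSoFar() { return bytesRead.sum(); }     // sum() is a statistical snapshot, not atomic

  long allocateId() { return nextId.incrementAndGet(); }
}
{code}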



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458096)
Time Spent: 50m  (was: 40m)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> LongAdder performs better than AtomicLong in a highly concurrent environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: So now whenever a new feature gets added in Tez, We can simply override 
that in LlapContainerLauncher.java and do the required changes in Shuffle 
Handler (LLAP) to support it.
cc: [~ashutoshc] [~jeagles])

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in the custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: Test failure *TestStatsUpdaterThread* looks flaky. Passed locally.)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in the custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: HIVE-23737.01.patch is the first cut WIP patch. Need to add tests around 
the feature and do some clean up of old code. 
[~rajesh.balamohan] [~prasanth_j] [~gopalv] Could you guys please share your 
thoughts on this initial patch.)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in the custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?focusedWorklogId=458082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458082
 ]

ASF GitHub Bot logged work on HIVE-23832:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 14:28
Start Date: 13/Jul/20 14:28
Worklog Time Spent: 10m 
  Work Description: laszlopinter86 commented on a change in pull request 
#1243:
URL: https://github.com/apache/hive/pull/1243#discussion_r453690376



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java
##
@@ -99,7 +99,7 @@ private void waitForCompactionToFinish(CompactionResponse 
resp) throws HiveExcep
 wait: while (true) {
   //double wait time until 5min
   waitTimeMs = waitTimeMs*2;
-  waitTimeMs = Math.max(waitTimeMs, waitTimeOut);
+  waitTimeMs = Math.min(waitTimeMs, waitTimeOut);

Review comment:
   I've got your point. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458082)
Time Spent: 1h  (was: 50m)

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence 
> of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-13 Thread Hui Huang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156746#comment-17156746
 ] 

Hui Huang commented on HIVE-20441:
--

[~dengzh] No problem, thanks for your contribution~

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, a newly created function from other clients or hiveserver2 will be 
> loaded from the metastore the first time it is referenced. 
> When the udf is used in a where clause, we get an NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
>  ~[hive-exec-2.
> 3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1359)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.
> 3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?focusedWorklogId=458081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458081
 ]

ASF GitHub Bot logged work on HIVE-23832:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 14:24
Start Date: 13/Jul/20 14:24
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1243:
URL: https://github.com/apache/hive/pull/1243#discussion_r453686774



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java
##
@@ -99,7 +99,7 @@ private void waitForCompactionToFinish(CompactionResponse 
resp) throws HiveExcep
 wait: while (true) {
   //double wait time until 5min
   waitTimeMs = waitTimeMs*2;
-  waitTimeMs = Math.max(waitTimeMs, waitTimeOut);
+  waitTimeMs = Math.min(waitTimeMs, waitTimeOut);

Review comment:
   hive.compactor.wait.timeout default is 300000 ms (5 min) - that's max, 
   start wait time = (waitTimeMs = 1000) X 2 = 2000ms;
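   
   A minimal sketch of the capped back-off under review, assuming the 300000 ms default for hive.compactor.wait.timeout; the class name and loop bound are illustrative, not the AlterTableCompactOperation code:
   
{code:java}
// Illustrative sketch of doubling the wait and capping it at the timeout.
class BackoffSketch {
  public static void main(String[] args) throws InterruptedException {
    long waitTimeMs = 1000L;
    final long waitTimeOut = 300000L;                   // hive.compactor.wait.timeout default
    for (int i = 0; i < 3; i++) {                       // Hive loops until compaction finishes
      waitTimeMs = waitTimeMs * 2;                      // 2000, 4000, 8000, ...
      waitTimeMs = Math.min(waitTimeMs, waitTimeOut);   // cap the wait; Math.max would jump straight to 300000
      Thread.sleep(waitTimeMs);                         // then poll the compaction state
    }
  }
}
{code}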





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458081)
Time Spent: 50m  (was: 40m)

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence 
> of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?focusedWorklogId=458079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458079
 ]

ASF GitHub Bot logged work on HIVE-23832:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 14:23
Start Date: 13/Jul/20 14:23
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1243:
URL: https://github.com/apache/hive/pull/1243#discussion_r453686774



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java
##
@@ -99,7 +99,7 @@ private void waitForCompactionToFinish(CompactionResponse 
resp) throws HiveExcep
 wait: while (true) {
   //double wait time until 5min
   waitTimeMs = waitTimeMs*2;
-  waitTimeMs = Math.max(waitTimeMs, waitTimeOut);
+  waitTimeMs = Math.min(waitTimeMs, waitTimeOut);

Review comment:
   default is 300000 ms - 5 min





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458079)
Time Spent: 0.5h  (was: 20m)

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence 
> of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?focusedWorklogId=458080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458080
 ]

ASF GitHub Bot logged work on HIVE-23832:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 14:23
Start Date: 13/Jul/20 14:23
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1243:
URL: https://github.com/apache/hive/pull/1243#discussion_r453686774



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java
##
@@ -99,7 +99,7 @@ private void waitForCompactionToFinish(CompactionResponse 
resp) throws HiveExcep
 wait: while (true) {
   //double wait time until 5min
   waitTimeMs = waitTimeMs*2;
-  waitTimeMs = Math.max(waitTimeMs, waitTimeOut);
+  waitTimeMs = Math.min(waitTimeMs, waitTimeOut);

Review comment:
   default is 300000 ms (5 min) that's max, min = waitTimeMs = 1000 X 2 = 
2000ms;





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458080)
Time Spent: 40m  (was: 0.5h)

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence 
> of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?focusedWorklogId=458072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458072
 ]

ASF GitHub Bot logged work on HIVE-23832:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 14:17
Start Date: 13/Jul/20 14:17
Worklog Time Spent: 10m 
  Work Description: laszlopinter86 commented on a change in pull request 
#1243:
URL: https://github.com/apache/hive/pull/1243#discussion_r453682925



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java
##
@@ -99,7 +99,7 @@ private void waitForCompactionToFinish(CompactionResponse 
resp) throws HiveExcep
 wait: while (true) {
   //double wait time until 5min
   waitTimeMs = waitTimeMs*2;
-  waitTimeMs = Math.max(waitTimeMs, waitTimeOut);
+  waitTimeMs = Math.min(waitTimeMs, waitTimeOut);

Review comment:
   Why do we need this change? Per the documentation, the minimum wait 
timeout is 2000 ms





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458072)
Time Spent: 20m  (was: 10m)

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence 
> of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?focusedWorklogId=458059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458059
 ]

ASF GitHub Bot logged work on HIVE-23839:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 13:56
Start Date: 13/Jul/20 13:56
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1246:
URL: https://github.com/apache/hive/pull/1246#issuecomment-657576657


   > This class is usually preferable to AtomicLong when multiple threads 
update a common sum that is used for purposes such as collecting statistics, 
not for fine-grained synchronization control. Under low update contention, the 
two classes have similar characteristics.
   
   Do the use cases that have been modified meet these criteria?
   
   
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/LongAdder.html



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458059)
Time Spent: 40m  (was: 0.5h)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LongAdder performs better than AtomicLong in a highly concurrent environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?focusedWorklogId=458054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458054
 ]

ASF GitHub Bot logged work on HIVE-23839:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 13:42
Start Date: 13/Jul/20 13:42
Worklog Time Spent: 10m 
  Work Description: dai opened a new pull request #1246:
URL: https://github.com/apache/hive/pull/1246


   LongAdder performs better than AtomicLong in a highly concurrent environment
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458054)
Time Spent: 20m  (was: 10m)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LongAdder performs better than AtomicLong in a highly concurrent environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?focusedWorklogId=458055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458055
 ]

ASF GitHub Bot logged work on HIVE-23839:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 13:42
Start Date: 13/Jul/20 13:42
Worklog Time Spent: 10m 
  Work Description: dai closed pull request #1246:
URL: https://github.com/apache/hive/pull/1246


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458055)
Time Spent: 0.5h  (was: 20m)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LongAdder performs better than AtomicLong in highly concurrent environments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-13 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156696#comment-17156696
 ] 

Naveen Gangam commented on HIVE-23804:
--

[~aditya-shah] Looking at the code in master, it does not appear this will be 
an issue on this branch. This line of code automatically defaults to the hive 
catalog if the catalog name has not been specified:
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L8502

This is what I was referring to in my earlier comment when I said this might 
require a code fix.

Is this problem reproducible on master? Thanks
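
The defaulting referred to above boils down to a pattern like the following (a hedged sketch; the class and method names are illustrative, not the HiveMetaStore source):

{code}
// Illustrative sketch only: fall back to the default "hive" catalog when the
// caller supplies no catalog name, so pre-catalog-aware clients keep working.
public final class CatalogDefaults {
  private static final String DEFAULT_CATALOG = "hive";

  public static String catalogOrDefault(String catName) {
    return (catName == null || catName.isEmpty()) ? DEFAULT_CATALOG : catName;
  }
}
{code}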


> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.3.7
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804-1.patch, HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with a `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23840) Use LLAP to get orc metadata

2020-07-13 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-23840:
-


> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication

2020-07-13 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23474:
---
Attachment: HIVE-23474.01.patch
Status: Patch Available  (was: Open)

> Deny Repl Dump if the database is a target of replication
> -
>
> Key: HIVE-23474
> URL: https://issues.apache.org/jira/browse/HIVE-23474
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23474.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=458014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458014
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 12:23
Start Date: 13/Jul/20 12:23
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #1245:
URL: https://github.com/apache/hive/pull/1245


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458014)
Time Spent: 0.5h  (was: 20m)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId
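
One common mitigation for this kind of timeout in tests (not necessarily the fix applied here) is to give the transactional producer more time to obtain its producer id. A hedged sketch, with a hypothetical transactional id:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

// Illustrative only: raise max.block.ms so initTransactions() waits longer for
// the InitProducerId response from a slow test broker.
public class PatientProducer {
  public static KafkaProducer<String, String> create(String bootstrapServers) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-tx-id"); // hypothetical id
    props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "120000");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.initTransactions(); // the call that was timing out in the flaky run
    return producer;
  }
}
{code}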



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=458015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458015
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 12:23
Start Date: 13/Jul/20 12:23
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1245:
URL: https://github.com/apache/hive/pull/1245


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458015)
Time Spent: 40m  (was: 0.5h)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23474:
--
Labels: pull-request-available  (was: )

> Deny Repl Dump if the database is a target of replication
> -
>
> Key: HIVE-23474
> URL: https://issues.apache.org/jira/browse/HIVE-23474
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread Dai Wenqing (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dai Wenqing updated HIVE-23839:
---
Description: LongAdder performs better than AtomicLong in highly concurrent 
environments.  (was: For the thread safety of counting, LongAdder is a better 
choice than AtomicLong.)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LongAdder performs better than AtomicLong in highly concurrent environments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23474) Deny Repl Dump if the database is a target of replication

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23474?focusedWorklogId=458008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458008
 ]

ASF GitHub Bot logged work on HIVE-23474:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 12:14
Start Date: 13/Jul/20 12:14
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1247:
URL: https://github.com/apache/hive/pull/1247


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458008)
Remaining Estimate: 0h
Time Spent: 10m

> Deny Repl Dump if the database is a target of replication
> -
>
> Key: HIVE-23474
> URL: https://issues.apache.org/jira/browse/HIVE-23474
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-13 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23324 started by Adesh Kumar Rao.
--
> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Adesh Kumar Rao
>Priority: Major
>
> The Initiator processes the various compaction candidates in parallel, so we 
> could follow a similar approach in the Cleaner, where we currently clean the 
> directories sequentially.
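
A minimal sketch of the idea, assuming an executor-based fan-out (class and method names are illustrative, not the actual Cleaner code):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: delete each obsolete directory as an independent task
// instead of walking the list sequentially.
public class ParallelCleanerSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public void clean(FileSystem fs, List<Path> obsoleteDirs) throws Exception {
    List<Future<Boolean>> results = new ArrayList<>();
    for (Path dir : obsoleteDirs) {
      Callable<Boolean> task = () -> fs.delete(dir, true);
      results.add(pool.submit(task));
    }
    for (Future<Boolean> result : results) {
      result.get(); // surface failures; real code would log and keep going per directory
    }
  }
}
{code}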



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?focusedWorklogId=458007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458007
 ]

ASF GitHub Bot logged work on HIVE-23839:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 12:09
Start Date: 13/Jul/20 12:09
Worklog Time Spent: 10m 
  Work Description: dai opened a new pull request #1246:
URL: https://github.com/apache/hive/pull/1246


   For the thread safety of counting, LongAdder is a better choice than 
AtomicLong.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458007)
Remaining Estimate: 0h
Time Spent: 10m

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For the thread safety of counting, LongAdder is a better choice than 
> AtomicLong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23839:
--
Labels: pull-request-available  (was: )

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For the thread safety of counting, LongAdder is a better choice than 
> AtomicLong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=458001=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458001
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 11:50
Start Date: 13/Jul/20 11:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1245:
URL: https://github.com/apache/hive/pull/1245#issuecomment-657514367


   Please include the flaky test jenkins job run in a comment.
   Thanks,
   Peter



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458001)
Time Spent: 20m  (was: 10m)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23806) Avoid clearing column stat states in all partition in case schema is extended

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23806?focusedWorklogId=457999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457999
 ]

ASF GitHub Bot logged work on HIVE-23806:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 11:47
Start Date: 13/Jul/20 11:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1215:
URL: https://github.com/apache/hive/pull/1215#discussion_r453592530



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##
@@ -501,6 +502,28 @@ public static boolean areSameColumns(List 
oldCols, List p, 
List s) {
+if (p == s) {
+  return true;
+}
+if (p.size() > s.size()) {
+  return false;
+}
+Iterator itP = p.iterator();

Review comment:
   of course! :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457999)
Time Spent: 40m  (was: 0.5h)

> Avoid clearing column stat states in all partition in case schema is extended
> -
>
> Key: HIVE-23806
> URL: https://issues.apache.org/jira/browse/HIVE-23806
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In case there are many partitions, adding a new column without cascade may 
> take a while, because we want to make sure in schema evolution cases that we 
> don't reuse stats later on by mistake. However, this is not necessary when 
> the schema is only extended.
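
A self-contained sketch of the prefix check discussed in the diff above (the generic helper below is illustrative; the real patch operates on the metastore column descriptors):

{code}
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Illustrative only: treat the change as a pure extension when the old column
// list is a prefix of the new one, in which case existing partition column
// statistics can be kept instead of being cleared.
public final class SchemaPrefixCheck {
  public static <T> boolean isPrefix(List<T> oldCols, List<T> newCols) {
    if (oldCols == newCols) {
      return true;
    }
    if (oldCols.size() > newCols.size()) {
      return false;
    }
    Iterator<T> itOld = oldCols.iterator();
    Iterator<T> itNew = newCols.iterator();
    while (itOld.hasNext()) {
      if (!Objects.equals(itOld.next(), itNew.next())) {
        return false;
      }
    }
    return true;
  }
}
{code}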



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23838:
--
Labels: pull-request-available  (was: )

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=457996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457996
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 11:45
Start Date: 13/Jul/20 11:45
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1245:
URL: https://github.com/apache/hive/pull/1245


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457996)
Remaining Estimate: 0h
Time Spent: 10m

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457995
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 11:42
Start Date: 13/Jul/20 11:42
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r453587433



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/InsertHandler.java
##
@@ -76,16 +77,29 @@ public void handle(Context withinContext) throws Exception {
 withinContext.hiveConf);
 Iterable files = eventMessage.getFiles();
 
+boolean copyAtLoad = 
withinContext.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY);
+
 /*
   * Insert into/overwrite operation shall operate on one or more 
partitions or even partitions from multiple tables.
   * But, Insert event is generated for each partition to which the data is 
inserted.
   * So, qlPtns list will have only one entry.
  */
 Partition ptn = (null == qlPtns || qlPtns.isEmpty()) ? null : 
qlPtns.get(0);
 if (files != null) {
-  // encoded filename/checksum of files, write into _files
-  for (String file : files) {
-writeFileEntry(qlMdTable, ptn, file, withinContext);
+  if (copyAtLoad) {
+// encoded filename/checksum of files, write into _files
+Path dataPath = null;
+if ((null == qlPtns) || qlPtns.isEmpty()) {
+  dataPath = new Path(withinContext.eventRoot, 
EximUtil.DATA_PATH_NAME);
+} else {
+  dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME 
+ File.separator

Review comment:
   That would require creating multiple Path objects, hence I would prefer not to.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyWork.java
##
@@ -51,4 +56,20 @@ public Path getFullyQualifiedSourcePath() {
   public Path getFullyQualifiedTargetPath() {
 return fullyQualifiedTargetPath;
   }
+
+  @Override
+  public String convertToString() {
+StringBuilder objInStr = new StringBuilder();
+objInStr.append(fullyQualifiedSourcePath)
+.append(URI_SEPARATOR)
+.append(fullyQualifiedTargetPath);
+return objInStr.toString();
+  }
+
+  @Override
+  public void loadFromString(String objectInStr) {

Review comment:
   no

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/CommitTxnHandler.java
##
@@ -81,10 +88,13 @@ private void createDumpFile(Context withinContext, 
org.apache.hadoop.hive.ql.met
 withinContext.hiveConf);
 
 if ((null == qlPtns) || qlPtns.isEmpty()) {
-  writeDumpFiles(qlMdTable, null, fileListArray.get(0), withinContext);
+  Path dataPath = new Path(withinContext.eventRoot, 
EximUtil.DATA_PATH_NAME);
+  writeDumpFiles(qlMdTable, null, fileListArray.get(0), withinContext, 
dataPath);
 } else {
   for (int idx = 0; idx < qlPtns.size(); idx++) {
-writeDumpFiles(qlMdTable, qlPtns.get(idx), fileListArray.get(idx), 
withinContext);
+Path dataPath = new Path(withinContext.eventRoot, 
EximUtil.DATA_PATH_NAME + File.separator

Review comment:
   That would require creating multiple Path objects, hence I would prefer not to.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457995)
Time Spent: 3h  (was: 2h 50m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> The patch also introduces a config option to run data copy tasks during the 
> repl load operation.
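
A hedged sketch of what a memory efficient, file-backed iterator can look like (the class below is illustrative and is not the one added by the patch):

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative only: stream entries from a backing file one line at a time
// instead of materialising the whole list in memory.
public class FileBackedIterator implements Iterator<String>, AutoCloseable {
  private final BufferedReader reader;
  private String next;

  public FileBackedIterator(Path backingFile) throws IOException {
    this.reader = Files.newBufferedReader(backingFile);
    advance();
  }

  private void advance() {
    try {
      next = reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean hasNext() {
    return next != null;
  }

  @Override
  public String next() {
    if (next == null) {
      throw new NoSuchElementException();
    }
    String current = next;
    advance();
    return current;
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }
}
{code}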



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-13 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-23838:



> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23837?focusedWorklogId=457991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457991
 ]

ASF GitHub Bot logged work on HIVE-23837:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 11:39
Start Date: 13/Jul/20 11:39
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1244:
URL: https://github.com/apache/hive/pull/1244


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457991)
Remaining Estimate: 0h
Time Spent: 10m

> HbaseStorageHandler is not configured properly when the FileSinkOperator is 
> the child of a MergeJoinWork
> 
>
> Key: HIVE-23837
> URL: https://issues.apache.org/jira/browse/HIVE-23837
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the FileSinkOperator's root operator is a MergeJoinWork the 
> HbaseStorageHandler.configureJobConf will never get called, and the execution 
> will miss the HBASE_AUTH_TOKEN and the hbase jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23837:
--
Labels: pull-request-available  (was: )

> HbaseStorageHandler is not configured properly when the FileSinkOperator is 
> the child of a MergeJoinWork
> 
>
> Key: HIVE-23837
> URL: https://issues.apache.org/jira/browse/HIVE-23837
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the FileSinkOperator's root operator is a MergeJoinWork the 
> HbaseStorageHandler.configureJobConf will never get called, and the execution 
> will miss the HBASE_AUTH_TOKEN and the hbase jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=457992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457992
 ]

ASF GitHub Bot logged work on HIVE-23824:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 11:39
Start Date: 13/Jul/20 11:39
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1238:
URL: https://github.com/apache/hive/pull/1238#discussion_r453588505



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -575,31 +626,8 @@ private OrcFileMetadata getFileFooterFromCacheOrDisk() 
throws IOException {
   tailBuffers = metadataCache.getFileMetadata(fileKey);
   if (tailBuffers != null) {
 try {
-  MemoryBuffer tailBuffer = tailBuffers.getSingleBuffer();
-  ByteBuffer bb = null;
-  if (tailBuffer != null) {
-bb = tailBuffer.getByteBufferDup();
-// TODO: remove the copy after ORC-158 and ORC-197
-// if (bb.isDirect()) {
-  ByteBuffer dupBb = tailBuffer.getByteBufferDup(); // Don't mess 
with the cached object.
-  bb = ByteBuffer.allocate(dupBb.remaining());
-  bb.put(dupBb);
-  bb.flip();
-// }
-  } else {
-// TODO: add the ability to extractFileTail to read from multiple 
buffers?
-MemoryBuffer[] tailBufferArray = tailBuffers.getMultipleBuffers();
-int totalSize = 0;
-for (MemoryBuffer buf : tailBufferArray) {
-  totalSize += buf.getByteBufferRaw().remaining();
-}
-bb = ByteBuffer.allocate(totalSize);
-for (MemoryBuffer buf : tailBufferArray) {
-  bb.put(buf.getByteBufferDup());
-}
-bb.flip();
-  }
-  OrcTail orcTail = ReaderImpl.extractFileTail(bb);
+  OrcTail orcTail = getOrcTailFromLlapBuffers(tailBuffers);

Review comment:
   Yes, it won't be evicted as it is locked. The getFileMetadata invocation 
will incRef it (if it is found of course)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457992)
Time Spent: 50m  (was: 40m)

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> LLAP IO supports caching but currently this is only done via LlapRecordReader 
> / using splits, aka good old mapreduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.
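
The buffer-assembly step visible in the removed code above amounts to concatenating one or more cached buffers before handing the result to the ORC tail parser. A minimal, hedged sketch using plain java.nio (not the LLAP classes):

{code}
import java.nio.ByteBuffer;

// Illustrative only: copy the remaining bytes of each cached buffer into a
// single heap buffer, leaving the cached buffers' positions untouched.
public final class BufferConcat {
  public static ByteBuffer concat(ByteBuffer... parts) {
    int total = 0;
    for (ByteBuffer part : parts) {
      total += part.remaining();
    }
    ByteBuffer out = ByteBuffer.allocate(total);
    for (ByteBuffer part : parts) {
      out.put(part.duplicate()); // duplicate so the source position is not advanced
    }
    out.flip();
    return out;
  }
}
{code}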



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork

2020-07-13 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-23837:
--


> HbaseStorageHandler is not configured properly when the FileSinkOperator is 
> the child of a MergeJoinWork
> 
>
> Key: HIVE-23837
> URL: https://issues.apache.org/jira/browse/HIVE-23837
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> If the FileSinkOperator's root operator is a MergeJoinWork the 
> HbaseStorageHandler.configureJobConf will never get called, and the execution 
> will miss the HBASE_AUTH_TOKEN and the hbase jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-13 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156640#comment-17156640
 ] 

Zhihua Deng commented on HIVE-23727:


[~belugabehr],  [~kgyrtkirk] could you please take a look at this? 

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If this condition is true, the state cannot be 
> OperationState.CANCELED, so the logging guarded by state == 
> OperationState.CANCELED can never happen.
>  
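
Reduced to a minimal shape, the invariant described above looks like this (illustrative only, not the SQLOperation source):

{code}
// Illustrative only: inside the guarded block the state cannot be CANCELED or
// TIMEDOUT, so a logging branch keyed on CANCELED there is unreachable.
enum OperationState { RUNNING, CANCELED, TIMEDOUT }

class CancelGuard {
  void cancelBackground(boolean shouldRunAsync, OperationState state) {
    if (shouldRunAsync && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) {
      // cancel the background task here
      if (state == OperationState.CANCELED) {
        // dead branch: contradicts the enclosing condition
      }
    }
  }
}
{code}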



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-07-13 Thread Ashish Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156633#comment-17156633
 ] 

Ashish Sharma commented on HIVE-22782:
--

[~vgarg] This Jira has been idle for quite some time. If you are not working on 
it, can I take it over?

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> Currently separate calls are made to metastore to fetch constraints like Pk, 
> fk, not null etc. Since the planner always retrieves these constraints, we 
> should fetch all of them in one call.
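
A hypothetical sketch of the consolidated call (the holder type and method below are illustrative and are not the actual metastore API):

{code}
import java.util.List;

// Illustrative only: one round trip returning every constraint type for a
// table, instead of separate get_primary_keys / get_foreign_keys /
// get_unique_constraints calls.
class TableConstraints {
  List<String> primaryKeys;
  List<String> foreignKeys;
  List<String> uniqueConstraints;
  List<String> notNullConstraints;
}

interface ConstraintFetcher {
  TableConstraints getAllConstraints(String catName, String dbName, String tableName);
}
{code}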



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22015) [CachedStore] Add table constraints in CachedStore

2020-07-13 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-22015:

Component/s: Standalone Metastore

> [CachedStore] Add table constraints in CachedStore
> --
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-22015) [CachedStore] Add table constraints in CachedStore

2020-07-13 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-22015.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master.
Thanks [~adeshrao] for the patch!

> [CachedStore] Add table constraints in CachedStore
> --
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=457969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457969
 ]

ASF GitHub Bot logged work on HIVE-23824:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 10:34
Start Date: 13/Jul/20 10:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1238:
URL: https://github.com/apache/hive/pull/1238#discussion_r453555463



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -575,31 +626,8 @@ private OrcFileMetadata getFileFooterFromCacheOrDisk() 
throws IOException {
   tailBuffers = metadataCache.getFileMetadata(fileKey);
   if (tailBuffers != null) {
 try {
-  MemoryBuffer tailBuffer = tailBuffers.getSingleBuffer();
-  ByteBuffer bb = null;
-  if (tailBuffer != null) {
-bb = tailBuffer.getByteBufferDup();
-// TODO: remove the copy after ORC-158 and ORC-197
-// if (bb.isDirect()) {
-  ByteBuffer dupBb = tailBuffer.getByteBufferDup(); // Don't mess 
with the cached object.
-  bb = ByteBuffer.allocate(dupBb.remaining());
-  bb.put(dupBb);
-  bb.flip();
-// }
-  } else {
-// TODO: add the ability to extractFileTail to read from multiple 
buffers?
-MemoryBuffer[] tailBufferArray = tailBuffers.getMultipleBuffers();
-int totalSize = 0;
-for (MemoryBuffer buf : tailBufferArray) {
-  totalSize += buf.getByteBufferRaw().remaining();
-}
-bb = ByteBuffer.allocate(totalSize);
-for (MemoryBuffer buf : tailBufferArray) {
-  bb.put(buf.getByteBufferDup());
-}
-bb.flip();
-  }
-  OrcTail orcTail = ReaderImpl.extractFileTail(bb);
+  OrcTail orcTail = getOrcTailFromLlapBuffers(tailBuffers);

Review comment:
   What happens if the tail is evicted from the cache in the meantime 
(after the check, but before this line)? Or is it locked during this code?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457969)
Time Spent: 0.5h  (was: 20m)

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LLAP IO supports caching but currently this is only done via LlapRecordReader 
> / using splits, aka good old mapreduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=457970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457970
 ]

ASF GitHub Bot logged work on HIVE-23824:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 10:34
Start Date: 13/Jul/20 10:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1238:
URL: https://github.com/apache/hive/pull/1238#discussion_r453556055



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -575,31 +626,8 @@ private OrcFileMetadata getFileFooterFromCacheOrDisk() 
throws IOException {
   tailBuffers = metadataCache.getFileMetadata(fileKey);
   if (tailBuffers != null) {
 try {
-  MemoryBuffer tailBuffer = tailBuffers.getSingleBuffer();
-  ByteBuffer bb = null;
-  if (tailBuffer != null) {
-bb = tailBuffer.getByteBufferDup();
-// TODO: remove the copy after ORC-158 and ORC-197
-// if (bb.isDirect()) {
-  ByteBuffer dupBb = tailBuffer.getByteBufferDup(); // Don't mess 
with the cached object.
-  bb = ByteBuffer.allocate(dupBb.remaining());
-  bb.put(dupBb);
-  bb.flip();
-// }
-  } else {
-// TODO: add the ability to extractFileTail to read from multiple 
buffers?
-MemoryBuffer[] tailBufferArray = tailBuffers.getMultipleBuffers();
-int totalSize = 0;
-for (MemoryBuffer buf : tailBufferArray) {
-  totalSize += buf.getByteBufferRaw().remaining();
-}
-bb = ByteBuffer.allocate(totalSize);
-for (MemoryBuffer buf : tailBufferArray) {
-  bb.put(buf.getByteBufferDup());
-}
-bb.flip();
-  }
-  OrcTail orcTail = ReaderImpl.extractFileTail(bb);
+  OrcTail orcTail = getOrcTailFromLlapBuffers(tailBuffers);

Review comment:
   Oh, I see now... it is decRefBuffer...





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457970)
Time Spent: 40m  (was: 0.5h)

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP IO supports caching but currently this is only done via LlapRecordReader 
> / using splits, aka good old mapreduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Add table constraints in CachedStore

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457967
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 10:29
Start Date: 13/Jul/20 10:29
Worklog Time Spent: 10m 
  Work Description: sankarh merged pull request #1109:
URL: https://github.com/apache/hive/pull/1109


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457967)
Time Spent: 4h 40m  (was: 4.5h)

> [CachedStore] Add table constraints in CachedStore
> --
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22015) [CachedStore] Add table constraints in CachedStore

2020-07-13 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-22015:

Summary: [CachedStore] Add table constraints in CachedStore  (was: 
[CachedStore] Cache table constraints in CachedStore)

> [CachedStore] Add table constraints in CachedStore
> --
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=457963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457963
 ]

ASF GitHub Bot logged work on HIVE-23824:
-

Author: ASF GitHub Bot
Created on: 13/Jul/20 10:27
Start Date: 13/Jul/20 10:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1238:
URL: https://github.com/apache/hive/pull/1238#discussion_r453552118



##
File path: llap-client/src/java/org/apache/hadoop/hive/llap/io/api/LlapIo.java
##
@@ -35,6 +41,15 @@
*/
   long purge();
 
+  /**
+   * Returns a deserialized OrcTail instance associated with the ORC file on 
the given path.
+   *  Raw content is either obtained from cache, or from disk if there is a 
cache miss.
+   * @param path Orc file path
+   * @param conf jobConf

Review comment:
   Missing javadoc for cachetag





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457963)
Time Spent: 20m  (was: 10m)

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LLAP IO supports caching but currently this is only done via LlapRecordReader 
> / using splits, aka good old mapreduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >