date:20210419

[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585575
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:58
Start Date: 20/Apr/21 05:58
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616364689



##
File path: 
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class <ClassName> extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   It would be better to keep the two input columns together in the same 
place. That said, moving this to the parent class and sharing the assignment 
code across all subclasses is a good idea. Can we introduce two subclasses of 
VectorExpression, VectorUnaryExpression and VectorBinaryExpression (or similar 
names), and have the other classes extend one of them? That way we can also 
avoid passing -1 from classes that do not use the input column.
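
The hierarchy suggested in this comment could look roughly like the sketch 
below. It is only an illustration: every class name ending in "Sketch" and the 
describe() method are hypothetical stand-ins, not Hive's actual classes, and 
the real VectorExpression has a much larger API.

```java
import java.io.Serializable;

// Hypothetical stand-in for Hive's VectorExpression; only what the sketch needs.
abstract class VectorExpressionSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  protected final int outputColumnNum;

  protected VectorExpressionSketch(int outputColumnNum) {
    this.outputColumnNum = outputColumnNum;
  }

  abstract String describe();
}

// One input column: assigned once here and shared by all unary subclasses.
abstract class VectorUnaryExpressionSketch extends VectorExpressionSketch {
  private static final long serialVersionUID = 1L;

  protected final int inputColumnNum;

  protected VectorUnaryExpressionSketch(int inputColumnNum, int outputColumnNum) {
    super(outputColumnNum);
    this.inputColumnNum = inputColumnNum;
  }
}

// Two input columns kept together, as the review comment asks.
abstract class VectorBinaryExpressionSketch extends VectorExpressionSketch {
  private static final long serialVersionUID = 1L;

  protected final int colNum1;
  protected final int colNum2;

  protected VectorBinaryExpressionSketch(int colNum1, int colNum2, int outputColumnNum) {
    super(outputColumnNum);
    this.colNum1 = colNum1;
    this.colNum2 = colNum2;
  }
}

// Example binary expression: no -1 placeholder is needed anywhere.
final class LongColAddLongColSketch extends VectorBinaryExpressionSketch {
  private static final long serialVersionUID = 1L;

  LongColAddLongColSketch(int colNum1, int colNum2, int outputColumnNum) {
    super(colNum1, colNum2, outputColumnNum);
  }

  @Override
  String describe() {
    return "col " + colNum1 + " + col " + colNum2 + " -> col " + outputColumnNum;
  }
}

// Example unary expression.
final class NegateLongColSketch extends VectorUnaryExpressionSketch {
  private static final long serialVersionUID = 1L;

  NegateLongColSketch(int inputColumnNum, int outputColumnNum) {
    super(inputColumnNum, outputColumnNum);
  }

  @Override
  String describe() {
    return "-col " + inputColumnNum + " -> col " + outputColumnNum;
  }
}
```

With this split, an expression that has no input column would extend the base 
class directly instead of passing -1 for an unused field.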




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585575)
Time Spent: 50m  (was: 40m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the compile-time check:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
>     setOperatorIssue(functionName + " only UNBOUNDED start frame is supported");
>     return false;
>   }
> {code}
> we get incorrect results, because the vectorized code path completely 
> ignores boundaries and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
>     evaluator.doLastBatchWork();
>   }
> }
> {code}
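
To see why ignoring the frame start produces wrong results, the following 
minimal sketch compares a running sum over an unbounded-start frame with the 
same sum over a bounded-start frame. It does not use any Hive classes; the 
class and method names are hypothetical and the partition is made-up sample 
data.

```java
import java.util.Arrays;

final class WindowFrameSketch {

  // SUM(...) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
  // every earlier row of the partition contributes, so one accumulator suffices.
  static long[] sumUnboundedStart(long[] values) {
    long[] out = new long[values.length];
    long running = 0;
    for (int i = 0; i < values.length; i++) {
      running += values[i];
      out[i] = running;
    }
    return out;
  }

  // SUM(...) OVER (ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW):
  // only the last (preceding + 1) rows may contribute to each output row.
  static long[] sumBoundedStart(long[] values, int preceding) {
    long[] out = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      int start = Math.max(0, i - preceding);
      long sum = 0;
      for (int j = start; j <= i; j++) {
        sum += values[j];
      }
      out[i] = sum;
    }
    return out;
  }

  public static void main(String[] args) {
    long[] partition = {1, 2, 3, 4};
    System.out.println(Arrays.toString(sumUnboundedStart(partition)));  // [1, 3, 6, 10]
    System.out.println(Arrays.toString(sumBoundedStart(partition, 1))); // [1, 3, 5, 7]
  }
}
```

An evaluator that accumulates over every batch of the partition behaves like 
the first method, so a query that asks for the second frame would get the 
first result.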



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585574
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:57
Start Date: 20/Apr/21 05:57
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616365379



##
File path: ql/src/test/results/clientpositive/llap/windowing_udaf.q.out
##
@@ -503,7 +503,7 @@ alice brown 25.2587496
 alice brown25.5293748
 alice brown25.63012987012987
 alice brown26.472439024390237
-alice brown27.100638297872322
+alice brown27.27881720430106

Review comment:
   Can we have a comment in the PR review explaining why this result changes 
with this patch?






Issue Time Tracking
---

Worklog Id: (was: 585574)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585572
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:55
Start Date: 20/Apr/21 05:55
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616364689



##
File path: 
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class <ClassName> extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   It would be better to keep the two input columns together in the same 
place. That said, moving this to the parent class and sharing the assignment 
code across all subclasses is a good idea. Can we introduce two subclasses of 
VectorExpression, VectorUnaryExpression and VectorBinaryExpression (or similar 
names), and have the other classes extend one of them?






Issue Time Tracking
---

Worklog Id: (was: 585572)
Time Spent: 0.5h  (was: 20m)


[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585567
 ]

ASF GitHub Bot logged work on HIVE-25026:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:43
Start Date: 20/Apr/21 05:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-822989511


   > hi @kgyrtkirk @pvary
   > Please help check why the split18-postprocess failed
   
   @zhangheihei: You can follow the `Details` link next to the failed CI run; 
it will take you to a page where you can see the test details. At the top 
right of the page there is a `Tests` link where you can see the test results.
   
   In this specific case, the failure is in `TestNegativeCliDriver`, 
likely a flaky test. I would push a minimal commit without a real change to 
retrigger the CI. 
   
   Thanks,
   Peter




Issue Time Tracking
---

Worklog Id: (was: 585567)
Time Spent: 40m  (was: 0.5h)

> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-25026.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This issue is the same as HIVE-24577




[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=585563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585563
 ]

ASF GitHub Bot logged work on HIVE-24705:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:29
Start Date: 20/Apr/21 05:29
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on a change in pull 
request #1960:
URL: https://github.com/apache/hive/pull/1960#discussion_r616354694



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -13707,14 +13709,27 @@ ASTNode analyzeCreateTable(
 
   /** Adds entities for create table/create view. */
   private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type,
-  boolean isTemporary, Map tblProps) throws SemanticException {
+  boolean isTemporary, Map tblProps, StorageFormat storageFormat) throws SemanticException {
 Database database  = getDatabase(qualifiedTabName[0]);
 outputs.add(new WriteEntity(database, WriteEntity.WriteType.DDL_SHARED));
 
 Table t = new Table(qualifiedTabName[0], qualifiedTabName[1]);
 t.setParameters(tblProps);
 t.setTableType(type);
 t.setTemporary(isTemporary);
+HiveStorageHandler storageHandler = null;
+if(storageFormat.getStorageHandler() != null) {
+  try {
+storageHandler = (HiveStorageHandler) ReflectionUtils.newInstance(
+conf.getClassByName(storageFormat.getStorageHandler()), SessionState.get().getConf());
+  } catch (ClassNotFoundException ex) {
+System.out.println("Class not found. Storage handler will be set to null: " + ex);

Review comment:
   Sorry, I was printing this to STDOUT for debugging during development. I 
forgot to change this to log before pushing.
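
As an illustration of the change described here, the sketch below routes the 
failure through a logger instead of STDOUT. It is an assumption-laden stand-in: 
it uses java.util.logging and plain reflection so that it runs on its own, 
whereas the actual patch would presumably use the logger already available in 
SemanticAnalyzer. The class name, the loadOrNull method and the LOG field are 
hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class StorageHandlerLoaderSketch {
  // Hypothetical logger; stands in for whatever logger the real class already has.
  private static final Logger LOG =
      Logger.getLogger(StorageHandlerLoaderSketch.class.getName());

  // Returns a new instance of the named class, or null if the name is null
  // or the class cannot be loaded or instantiated.
  static Object loadOrNull(String className) {
    if (className == null) {
      return null;
    }
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException ex) {
      // Logged with the exception attached instead of printed to STDOUT.
      LOG.log(Level.WARNING,
          "Cannot load storage handler " + className + "; it will be set to null", ex);
      return null;
    }
  }
}
```

Passing the exception to the logger keeps the stack trace, which the original 
string concatenation would have dropped.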






Issue Time Tracking
---

Worklog Id: (was: 585563)
Time Spent: 1h 20m  (was: 1h 10m)

> Create/Alter/Drop tables based on storage handlers in HS2 should be 
> authorized by Ranger/Sentry
> ---
>
> Key: HIVE-24705
> URL: https://issues.apache.org/jira/browse/HIVE-24705
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With doAs=false in Hive 3.x, whenever a user tries to create a table based 
> on a storage handler over external storage (for example, an HBase table), the 
> end user we see is hive, so we cannot really enforce the condition on the 
> actual end user in Apache Ranger/Sentry. We therefore need to enforce this 
> condition in Hive itself when tables based on storage handlers are created, 
> altered, or dropped.
> Built-in Hive storage handlers such as HbaseStorageHandler and 
> KafkaStorageHandler should implement a method getURIForAuthentication() that 
> returns a URI formed from the table properties. This URI can be sent to 
> Ranger/Sentry for authorization.




[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585550
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 04:26
Start Date: 20/Apr/21 04:26
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616334377



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##
@@ -3005,6 +3003,19 @@ private boolean validatePTFOperator(PTFOperator op, VectorizationContext vContex
   }
 }
   }
+  if (vectorPTFDesc.getOrderExprNodeDescs().length > 1) {
+/*
+ * Currently, we need to rule out here all cases where a range boundary scanner can run,
+ * basically: 1. bounded start 2. bounded end which is not current row
+ */
+if (windowFrameDef.getWindowType() == WindowType.RANGE
+&& (!windowFrameDef.isStartUnbounded() || !windowFrameDef.getEnd().isCurrentRow())) {

Review comment:
   I am not very familiar with the range boundary scanner, but do we have any 
issue vectorizing UNBOUNDED FOLLOWING? Can you help me understand how the 
range boundary scanner affects vectorization?
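
For reference, the condition in the hunk above can be restated as a standalone 
predicate. This sketch only mirrors the boolean logic visible in the diff; the 
class, enum and method names are hypothetical, and it says nothing about why 
the range boundary scanner needs the restriction.

```java
final class PtfFrameCheckSketch {
  enum WindowType { ROWS, RANGE }

  // True when the diff's condition would refuse to vectorize: more than one
  // ORDER BY expression, a RANGE frame, and a frame that is anything other
  // than UNBOUNDED PRECEDING .. CURRENT ROW.
  static boolean isRejected(int orderExprCount, WindowType type,
      boolean startUnbounded, boolean endIsCurrentRow) {
    return orderExprCount > 1
        && type == WindowType.RANGE
        && (!startUnbounded || !endIsCurrentRow);
  }
}
```

Read this way, a RANGE frame with several ORDER BY expressions that ends at 
UNBOUNDED FOLLOWING falls under the second half of the condition, because its 
end is not the current row.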






Issue Time Tracking
---

Worklog Id: (was: 585550)
Time Spent: 20m  (was: 10m)


[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585537
 ]

ASF GitHub Bot logged work on HIVE-25026:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 03:21
Start Date: 20/Apr/21 03:21
Worklog Time Spent: 10m 
  Work Description: zhangheihei edited a comment on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-822936070


   hi @kgyrtkirk  @pvary 
   Please help check why the split18-postprocess failed 




Issue Time Tracking
---

Worklog Id: (was: 585537)
Time Spent: 0.5h  (was: 20m)


[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585533
 ]

ASF GitHub Bot logged work on HIVE-25026:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 02:58
Start Date: 20/Apr/21 02:58
Worklog Time Spent: 10m 
  Work Description: zhangheihei commented on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-822936070


   hi @kgyrtkirk.
   Please help check why the split18-postprocess failed 




Issue Time Tracking
---

Worklog Id: (was: 585533)
Time Spent: 20m  (was: 10m)


[jira] [Work logged] (HIVE-24909) Skip the repl events from getting logged in notification log

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24909?focusedWorklogId=585509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585509
 ]

ASF GitHub Bot logged work on HIVE-24909:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 01:30
Start Date: 20/Apr/21 01:30
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2101:
URL: https://github.com/apache/hive/pull/2101#discussion_r616280662



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -8582,7 +8582,8 @@ public GetOpenTxnsInfoResponse get_open_txns_info() throws TException {
   public OpenTxnsResponse open_txns(OpenTxnRequest rqst) throws TException {
 OpenTxnsResponse response = getTxnHandler().openTxns(rqst);
 List txnIds = response.getTxn_ids();
-if (txnIds != null && listeners != null && !listeners.isEmpty()) {
+boolean isHiveReplTxn = rqst.isSetReplPolicy() && rqst.getTxn_type() == TxnType.DEFAULT;

Review comment:
   The rqst variable can be of different types (OpenTxnRequest/ 
CommitTxnRequest/AbortTxnRequest), so this statement may not generalise.






Issue Time Tracking
---

Worklog Id: (was: 585509)
Time Spent: 5h 20m  (was: 5h 10m)

> Skip the repl events from getting logged in notification log
> 
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the 
> replication policy. Whenever a replication cycle completes, one transaction 
> corresponding to the repl dump operation is always left open on the target. 
> This transaction will never be caught up without manual intervention on the 
> target cluster.




[jira] [Resolved] (HIVE-25005) Provide default implementation for HMS APIs

2021-04-19 Thread Vihang Karajgaonkar (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-25005.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Provide default implementation for HMS APIs 
> 
>
> Key: HIVE-25005
> URL: https://issues.apache.org/jira/browse/HIVE-25005
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If there is a remote cache that implements HMS APIs, it would be useful to 
> have default implementation for all the APIs, so that any new HMS API will 
> not break the build for the remote cache. 




[jira] [Work logged] (HIVE-25005) Provide default implementation for HMS APIs

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25005?focusedWorklogId=585398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585398
 ]

ASF GitHub Bot logged work on HIVE-25005:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 22:08
Start Date: 19/Apr/21 22:08
Worklog Time Spent: 10m 
  Work Description: vihangk1 merged pull request #2171:
URL: https://github.com/apache/hive/pull/2171


   




Issue Time Tracking
---

Worklog Id: (was: 585398)
Time Spent: 50m  (was: 40m)


[jira] [Commented] (HIVE-25005) Provide default implementation for HMS APIs

2021-04-19 Thread Vihang Karajgaonkar (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325344#comment-17325344
 ] 

Vihang Karajgaonkar commented on HIVE-25005:


Patch was merged into master. Thanks [~kishendas]!


[jira] [Work logged] (HIVE-25005) Provide default implementation for HMS APIs

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25005?focusedWorklogId=585396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585396
 ]

ASF GitHub Bot logged work on HIVE-25005:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 22:07
Start Date: 19/Apr/21 22:07
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on pull request #2171:
URL: https://github.com/apache/hive/pull/2171#issuecomment-822817878


   The test failures are unrelated since this patch doesn't really make 
functional changes other than adding a new abstract class. The test worked in 
the previous iteration of the patch which only had a formatting change. Kishen 
created  https://issues.apache.org/jira/browse/HIVE-25030 and 
https://issues.apache.org/jira/browse/HIVE-25031 to track the failures.




Issue Time Tracking
---

Worklog Id: (was: 585396)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24957?focusedWorklogId=585381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585381
 ]

ASF GitHub Bot logged work on HIVE-24957:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 21:39
Start Date: 19/Apr/21 21:39
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #2186:
URL: https://github.com/apache/hive/pull/2186


   




Issue Time Tracking
---

Worklog Id: (was: 585381)
Time Spent: 0.5h  (was: 20m)

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, but it should be present: after 
> COALESCE is applied, its null b_authorkey becomes 300, which matches the 
> UNKNOWN author.
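
The expected result can be checked by hand with a small emulation of the 
correlated EXISTS predicate. The sketch below does not involve Hive at all; it 
replays the sample rows from the description in plain Java, and the class and 
method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CoalesceExistsSketch {

  // Emulates:
  //   select b_title from book b where exists
  //     (select a_authorkey from author a
  //      where coalesce(b.b_authorkey, fallbackKey) = a.a_authorkey)
  static List<String> titlesWithMatchingAuthor(
      Map<String, Integer> bookAuthorKeys, Set<Integer> authorKeys, int fallbackKey) {
    List<String> titles = new ArrayList<>();
    for (Map.Entry<String, Integer> book : bookAuthorKeys.entrySet()) {
      Integer key = book.getValue();
      int effectiveKey = (key != null) ? key : fallbackKey; // COALESCE
      if (authorKeys.contains(effectiveKey)) {
        titles.add(book.getKey());
      }
    }
    return titles;
  }

  // Book rows from the issue description (title -> b_authorkey), in insert order.
  static Map<String, Integer> sampleBooks() {
    Map<String, Integer> books = new LinkedHashMap<>();
    books.put("Les Miserables", 10);
    books.put("The Count of Monte Cristo", 20);
    books.put("Men Without Women", 30);
    books.put("Odyssey", null);
    return books;
  }

  public static void main(String[] args) {
    // Author keys from the issue description: 10, 20 and 300 (UNKNOWN).
    System.out.println(titlesWithMatchingAuthor(sampleBooks(), Set.of(10, 20, 300), 300));
    // prints [Les Miserables, The Count of Monte Cristo, Odyssey]
  }
}
```

"Men Without Women" is correctly absent because author key 30 does not exist, 
while "Odyssey" appears because its null key is replaced by 300.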




[jira] [Work logged] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24957?focusedWorklogId=585380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585380
 ]

ASF GitHub Bot logged work on HIVE-24957:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 21:39
Start Date: 19/Apr/21 21:39
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2186:
URL: https://github.com/apache/hive/pull/2186#issuecomment-822803363


   Close and reopen to trigger checks




Issue Time Tracking
---

Worklog Id: (was: 585380)
Time Spent: 20m  (was: 10m)

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, but it should be present: once 
> COALESCE is applied, its null author key becomes 300 and matches the UNKNOWN author.
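
To make the expected result concrete, the sketch below replays the sample rows from the report in plain Java. It only models the semantics of the correlated EXISTS with COALESCE; the class and method names are made up for illustration and nothing here is Hive code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CoalesceExistsSketch {
    // COALESCE(value, fallback): SQL NULL is modelled as a Java null.
    static int coalesce(Integer value, int fallback) {
        return value != null ? value : fallback;
    }

    // Mirrors: select b_title from book b where exists
    //   (select 1 from author a where coalesce(b.b_authorkey, 300) = a.a_authorkey)
    static List<String> titlesWithMatchingAuthor() {
        Set<Integer> authorKeys = new HashSet<>(Arrays.asList(10, 20, 300));
        String[] titles = {
            "Les Miserables", "The Count of Monte Cristo", "Men Without Women", "Odyssey"};
        Integer[] bookAuthorKeys = {10, 20, 30, null};

        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.length; i++) {
            // Author key 30 has no match; the null key falls back to 300 (UNKNOWN).
            if (authorKeys.contains(coalesce(bookAuthorKeys[i], 300))) {
                result.add(titles[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(titlesWithMatchingAuthor());
    }
}
```

With the rows above, the method returns the three titles listed under "Expected results", including Odyssey.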




[jira] [Work logged] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24957?focusedWorklogId=585382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585382
 ]

ASF GitHub Bot logged work on HIVE-24957:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 21:39
Start Date: 19/Apr/21 21:39
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2186:
URL: https://github.com/apache/hive/pull/2186


   ### What changes were proposed in this pull request and why?
   Check commit messages for HIVE-24999 and HIVE-24957.
   
   ### Does this PR introduce _any_ user-facing change?
   Plan changes when using explain (normally more efficient plans) and correct query results.
   
   ### How was this patch tested?
   Via existing tests:
   ```
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex="subquery.*"
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex="masking.*"
   mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver
   ```
   Via newly added tests:
   ```
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile="subquery_complex_correlation_predicates.q"
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile="subquery_in_invalid_intermediate_plan.q" -Dcalcite.debug
   ```




Issue Time Tracking
---

Worklog Id: (was: 585382)
Time Spent: 40m  (was: 0.5h)

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, but it should be present: once 
> COALESCE is applied, its null author key becomes 300 and matches the UNKNOWN author.




[jira] [Updated] (HIVE-25028) Hive: Select query with IS operator producing unexpected result

2021-04-19 Thread Jesus Camacho Rodriguez (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25028:
---
Reporter: Manthan B Y  (was: Soumyakanti Das)

> Hive: Select query with IS operator producing unexpected result
> ---
>
> Key: HIVE-25028
> URL: https://issues.apache.org/jira/browse/HIVE-25028
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>
> Hive: Select query with IS operator is producing unexpected result.
> The following was executed on postgres:
> {code:java}
> sqlancer=# create table if not exists emp(name text, age int);
> CREATE TABLE
> sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12);
> INSERT 0 3
> sqlancer=# select emp.age from emp where emp.age > 10;
>  age
> -----
>   15
>   12
> (2 rows)
> sqlancer=# select emp.age > 10 is true from emp;
>  ?column?
> ----------
>  f
>  t
>  t
> (3 rows)
> {code}
> This is happening because the IS operator has higher precedence than the 
> comparison operators in Hive. In most other databases, comparison operators 
> have higher precedence. The grammar needs to be changed to fix the precedence.
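
As an illustration of the precedence problem described above, here is a toy precedence-climbing parser that parenthesizes `emp.age > 10 IS TRUE` under two precedence orders. It is not Hive's actual grammar; the class name and the numeric binding powers are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

class PrecedenceSketch {
    private final List<String> tokens;
    private final int cmpPower; // binding power of the binary '>' operator
    private final int isPower;  // binding power of the postfix 'IS TRUE' operator
    private int pos = 0;

    PrecedenceSketch(List<String> tokens, int cmpPower, int isPower) {
        this.tokens = tokens;
        this.cmpPower = cmpPower;
        this.isPower = isPower;
    }

    // Precedence climbing: read one operand, then absorb every operator
    // whose binding power is at least minPower.
    String parse(int minPower) {
        String left = tokens.get(pos++);
        while (pos < tokens.size()) {
            String op = tokens.get(pos);
            if (op.equals(">") && cmpPower >= minPower) {
                pos++;
                String right = parse(cmpPower + 1);
                left = "(" + left + " > " + right + ")";
            } else if (op.equals("IS") && isPower >= minPower) {
                pos += 2; // consume the two tokens IS and TRUE
                left = "(" + left + " IS TRUE)";
            } else {
                break;
            }
        }
        return left;
    }

    // Returns the fully parenthesized form for the given binding powers.
    static String group(int cmpPower, int isPower) {
        List<String> tokens = Arrays.asList("emp.age", ">", "10", "IS", "TRUE");
        return new PrecedenceSketch(tokens, cmpPower, isPower).parse(0);
    }

    public static void main(String[] args) {
        System.out.println(group(2, 1)); // comparison binds tighter
        System.out.println(group(1, 2)); // IS binds tighter
    }
}
```

`group(2, 1)` yields `((emp.age > 10) IS TRUE)`, the grouping the report attributes to most databases; `group(1, 2)` yields `(emp.age > (10 IS TRUE))`, the grouping that results when IS binds tighter than the comparison.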




[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=585329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585329
 ]

ASF GitHub Bot logged work on HIVE-24705:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 19:09
Start Date: 19/Apr/21 19:09
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1960:
URL: https://github.com/apache/hive/pull/1960#discussion_r610056943



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -13707,14 +13709,27 @@ ASTNode analyzeCreateTable(
 
   /** Adds entities for create table/create view. */
   private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type,
-  boolean isTemporary, Map tblProps) throws SemanticException {
+  boolean isTemporary, Map tblProps, StorageFormat storageFormat) throws SemanticException {
 Database database  = getDatabase(qualifiedTabName[0]);
 outputs.add(new WriteEntity(database, WriteEntity.WriteType.DDL_SHARED));
 
 Table t = new Table(qualifiedTabName[0], qualifiedTabName[1]);
 t.setParameters(tblProps);
 t.setTableType(type);
 t.setTemporary(isTemporary);
+HiveStorageHandler storageHandler = null;
+if(storageFormat.getStorageHandler() != null) {
+  try {
+storageHandler = (HiveStorageHandler) ReflectionUtils.newInstance(
+conf.getClassByName(storageFormat.getStorageHandler()), SessionState.get().getConf());
+  } catch (ClassNotFoundException ex) {
+System.out.println("Class not found. Storage handler will be set to null: " + ex);

Review comment:
   Please remove the System.out.println and use a logger instead, or throw 
an exception if we should not continue.
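
A minimal sketch of the two options mentioned in the comment above: log and continue, or fail fast. It uses java.util.logging only to stay self-contained (Hive code normally logs through SLF4J), and the class name, method names and the IllegalStateException stand-in are assumptions, not code from the pull request.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class StorageHandlerLoaderSketch {
    private static final Logger LOG =
        Logger.getLogger(StorageHandlerLoaderSketch.class.getName());

    // Option 1: record the problem in the log and continue with a null handler.
    static Object loadOrNull(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException ex) {
            LOG.log(Level.WARNING,
                "Storage handler class not found, continuing without it: " + className, ex);
            return null;
        } catch (ReflectiveOperationException ex) {
            LOG.log(Level.WARNING,
                "Storage handler could not be instantiated: " + className, ex);
            return null;
        }
    }

    // Option 2: fail fast when processing should not continue.
    // IllegalStateException stands in for a domain exception here.
    static Object loadOrFail(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(
                "Cannot instantiate storage handler " + className, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadOrNull("java.util.ArrayList"));
        System.out.println(loadOrNull("no.such.Handler"));
    }
}
```

Either way the original exception is kept as the cause, so the stack trace ends up in the log or in the rethrown exception rather than on the console.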

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java
##
@@ -185,6 +191,33 @@ private static void addHivePrivObject(Entity privObject, 
Map tableProperties = new HashMap<>();
+  Configuration conf = new Configuration();
+  tableProperties.putAll(table.getSd().getSerdeInfo().getParameters());
+  tableProperties.putAll(table.getParameters());
+  try {
+if(table.getStorageHandler() instanceof HiveStorageAuthorizationHandler){
+  HiveStorageAuthorizationHandler authorizationHandler = (HiveStorageAuthorizationHandler) ReflectionUtils.newInstance(
+  conf.getClassByName(table.getStorageHandler().getClass().getName()), SessionState.get().getConf());
+  storageuri = authorizationHandler.getURIForAuth(tableProperties).toString();
+}else{
+  //Custom storage handler that has not implemented the HiveStorageAuthorizationHandler
+  storageuri = table.getStorageHandler().getClass().getName()+"://"+
+  HiveCustomStorageHandlerUtils.getTablePropsForCustomStorageHandler(tableProperties);
+}
+  }catch(Exception ex){
+ex.printStackTrace();

Review comment:
   Can you either log this exception to the log file (if it matters to the 
user) or ignore it with an explanatory comment (if it is not an issue), instead 
of calling printStackTrace()? printStackTrace() writes to the process's 
standard error stream and not to the server log.
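
The sketch below shows one way to route the exception through a logger so that the throwable travels with the log record. It again relies on java.util.logging to stay self-contained; the in-memory handler, the logger name and the `resolveStorageUri` helper are hypothetical stand-ins, not part of the change under review.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogInsteadOfPrintStackTrace {
    static final Logger LOG = Logger.getLogger("authorizer.sketch");

    // Keeps records in memory; stands in for whatever appender feeds the server log.
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    // Hypothetical helper: returns null when the input is not a valid URI.
    static String resolveStorageUri(String raw) {
        try {
            return URI.create(raw).toString();
        } catch (IllegalArgumentException ex) {
            // Passing the throwable attaches the full stack trace to the record.
            LOG.log(Level.WARNING, "Could not build storage URI from: " + raw, ex);
            return null;
        }
    }

    // Runs one failing call and returns what reached the handler.
    static List<LogRecord> demo() {
        CollectingHandler handler = new CollectingHandler();
        LOG.setUseParentHandlers(false);
        LOG.addHandler(handler);
        try {
            resolveStorageUri("not a uri");
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        for (LogRecord record : demo()) {
            System.out.println(record.getLevel() + " " + record.getMessage()
                + " / thrown: " + record.getThrown());
        }
    }
}
```

Because the exception is handed to the logger, the destination is decided by the logging configuration instead of being hard-wired to the console.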

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageAuthorizationHandler.java
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.metadata;
+
+import org.apache.hadoop.hive.common.classification.InterfaceAudience;
+import org.apache.hadoop.hive.common.classification.InterfaceStability;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Map;
+
+/**
+ * HiveStorageAuthorizationHandler defines a pluggable interface for
+ * authorization of storage based tables in Hive. A Storage authorization
+ * handler consists of a bundle of the following:
+ *
+ *
+ *getURI
+ *
+ *
+ * Storage authorization handler classes are plugged in using the STORED BY 'classname'
+ * clause in CREATE TABLE.
+ */

[jira] [Work logged] (HIVE-25005) Provide default implementation for HMS APIs

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25005?focusedWorklogId=585327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585327
 ]

ASF GitHub Bot logged work on HIVE-25005:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 19:06
Start Date: 19/Apr/21 19:06
Worklog Time Spent: 10m 
  Work Description: kishendas commented on pull request #2171:
URL: https://github.com/apache/hive/pull/2171#issuecomment-822710757


   Created tickets for the flaky tests: 
https://issues.apache.org/jira/browse/HIVE-25030 and 
https://issues.apache.org/jira/browse/HIVE-25031.




Issue Time Tracking
---

Worklog Id: (was: 585327)
Time Spent: 0.5h  (was: 20m)

> Provide default implementation for HMS APIs 
> 
>
> Key: HIVE-25005
> URL: https://issues.apache.org/jira/browse/HIVE-25005
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If there is a remote cache that implements HMS APIs, it would be useful to 
> have a default implementation for all the APIs, so that adding a new HMS API 
> does not break the build for the remote cache. 
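
One way to get this property in Java is an interface default method, sketched below. The interface, its two methods and the cache class are hypothetical and much smaller than the real HMS client API; the actual change may take a different approach, such as an abstract base class.

```java
// Hypothetical API surface; the real HMS client interface is much larger.
interface MetastoreApiSketch {
    String getTable(String db, String table);

    // A later addition: the default body means existing implementers still compile.
    default String getTableWithStats(String db, String table) {
        throw new UnsupportedOperationException(
            "getTableWithStats is not supported by this implementation");
    }
}

// Stand-in for a remote cache written before getTableWithStats existed.
class RemoteCacheSketch implements MetastoreApiSketch {
    @Override
    public String getTable(String db, String table) {
        return db + "." + table;
    }
}

class DefaultApiDemo {
    public static void main(String[] args) {
        MetastoreApiSketch cache = new RemoteCacheSketch();
        System.out.println(cache.getTable("default", "book"));
        try {
            cache.getTableWithStats("default", "book");
        } catch (UnsupportedOperationException ex) {
            System.out.println("Not supported: " + ex.getMessage());
        }
    }
}
```

The trade-off is that a missing override shows up at run time instead of at compile time, so the default body should fail loudly rather than return a silent placeholder.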




[jira] [Commented] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:

2021-04-19 Thread Naveen Gangam (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325194#comment-17325194
 ] 

Naveen Gangam commented on HIVE-23756:
--

+1 on this patch.

> drop table command fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
> at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
> {code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the FK 
> constraint name used when creating the COLUMNS_V2 table ([Ref|#L60]]). 




[jira] [Commented] (HIVE-25017) Fix response in GetLatestCommittedCompaction

2021-04-19 Thread Yu-Wen Lai (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325149#comment-17325149
 ] 

Yu-Wen Lai commented on HIVE-25017:
---

[~klcopp] Thank you for reviewing!

> Fix response in GetLatestCommittedCompaction
> 
>
> Key: HIVE-25017
> URL: https://issues.apache.org/jira/browse/HIVE-25017
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dbname and Tablename are required for CompactionInfoStruct but the response 
> of getLatestCommittedCompactionInfo is not setting them.




[jira] [Updated] (HIVE-25010) Create qtest-iceberg module

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25010:
--
Labels: pull-request-available  (was: )

> Create qtest-iceberg module
> ---
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should create a qtest-iceberg module under itests. 




[jira] [Work logged] (HIVE-25010) Create qtest-iceberg module

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=585187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585187
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 15:05
Start Date: 19/Apr/21 15:05
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2193:
URL: https://github.com/apache/hive/pull/2193


   
   
   ### What changes were proposed in this pull request?
   New q test module to run Iceberg specific q tests. 
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 585187)
Remaining Estimate: 0h
Time Spent: 10m

> Create qtest-iceberg module
> ---
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should create a qtest-iceberg module under itests. 




[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585161
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:20
Start Date: 19/Apr/21 14:20
Worklog Time Spent: 10m 
  Work Description: marton-bod edited a comment on pull request #2161:
URL: https://github.com/apache/hive/pull/2161#issuecomment-822504138


   This is roughly what the changes would need to look like once we have the 
new Tez version released:
   https://github.com/marton-bod/hive/pull/1
   (using the new Tez API instead of the listing)




Issue Time Tracking
---

Worklog Id: (was: 585161)
Time Spent: 0.5h  (was: 20m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.




[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585160
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:19
Start Date: 19/Apr/21 14:19
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2161:
URL: https://github.com/apache/hive/pull/2161#issuecomment-822504138


   This is roughly what the changes would need to look like once we have the 
new Tez version released:
   https://github.com/marton-bod/hive/pull/1
   




Issue Time Tracking
---

Worklog Id: (was: 585160)
Time Spent: 20m  (was: 10m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.




[jira] [Commented] (HIVE-25029) Remove travis builds

2021-04-19 Thread Zoltan Haindrich (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325039#comment-17325039
 ] 

Zoltan Haindrich commented on HIVE-25029:
-

If we want to keep something similar, we can get a lot more horsepower through 
GitHub Actions.

> Remove travis builds
> 
>
> Key: HIVE-25029
> URL: https://issues.apache.org/jira/browse/HIVE-25029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Travis only compiles the project; we already do much more than that during 
> precommit testing.
> (It also sometimes delays builds because Travis can't allocate executors, etc.)




[jira] [Work logged] (HIVE-25029) Remove travis builds

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25029?focusedWorklogId=585124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585124
 ]

ASF GitHub Bot logged work on HIVE-25029:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 13:33
Start Date: 19/Apr/21 13:33
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2192:
URL: https://github.com/apache/hive/pull/2192


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 585124)
Remaining Estimate: 0h
Time Spent: 10m

> Remove travis builds
> 
>
> Key: HIVE-25029
> URL: https://issues.apache.org/jira/browse/HIVE-25029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Travis only compiles the project; we already do much more than that during 
> precommit testing.
> (It also sometimes delays builds because Travis can't allocate executors, etc.)




[jira] [Updated] (HIVE-25029) Remove travis builds

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25029:
--
Labels: pull-request-available  (was: )

> Remove travis builds
> 
>
> Key: HIVE-25029
> URL: https://issues.apache.org/jira/browse/HIVE-25029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Travis only compiles the project; we already do much more than that during 
> precommit testing.
> (It also sometimes delays builds because Travis can't allocate executors, etc.)




[jira] [Assigned] (HIVE-25029) Remove travis builds

2021-04-19 Thread Zoltan Haindrich (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25029:
---


> Remove travis builds
> 
>
> Key: HIVE-25029
> URL: https://issues.apache.org/jira/browse/HIVE-25029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Travis only compiles the project; we already do much more than that during 
> precommit testing.
> (It also sometimes delays builds because Travis can't allocate executors, etc.)




[jira] [Commented] (HIVE-17059) Hive Runtime Error while processing row (tag=0)

2021-04-19 Thread Dhanooj (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-17059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325032#comment-17325032
 ] 

Dhanooj commented on HIVE-17059:


Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
cast to org.apache.hadoop.io.LongWritable
at

 

This looks like a data type mapping issue.

> Hive Runtime Error while processing row (tag=0)
> ---
>
> Key: HIVE-17059
> URL: https://issues.apache.org/jira/browse/HIVE-17059
> Project: Hive
>  Issue Type: Bug
>Reporter: wenjie.yu
>Priority: Major
>
> I ran a query like the one below in Hive and got the error: Hive Runtime Error 
> while processing row (tag=0)
> *QUERY:*
> select
>   dt as d_date
>   .. -- group by columns 
>   ,min(epoch_time) as min_epoch_time
>   ,count(*) as cnt
> from
>   DB.target_Table
> where
>   dt = '20170705'
> group by
>   dt
>   .. -- group by columns 
> *ERROR:*
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"_col0":"20170705","_col1":"-2144668477","_col2":"4724a50e-9238-4146-9394-a076acd41836","_col3":"client://plusApp/pdtil_Purchase","_col4":"924","_col5":"61","_col6":"app","_col7":"A","_col8":"comic","_col9":"4724a50e-9238-4146-9394-a076acd41836","_col10":"46f572ce-ed86-4c2a-bf7a-db2171654144"},"value":{"_col0":"46f572ce-ed86-4c2a-bf7a-db2171654144","_col1":"1499186628.124"}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{"_col0":"20170705","_col1":"-2144668477","_col2":"4724a50e-9238-4146-9394-a076acd41836","_col3":"client://plusApp/pdtil_Purchase","_col4":"924","_col5":"61","_col6":"app","_col7":"A","_col8":"comic","_col9":"4724a50e-9238-4146-9394-a076acd41836","_col10":"46f572ce-ed86-4c2a-bf7a-db2171654144"},"value":{"_col0":"46f572ce-ed86-4c2a-bf7a-db2171654144","_col1":"1499186628.124"}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
> ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.LongWritable
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:763)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 7 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
> cast to org.apache.hadoop.io.LongWritable
> at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount$GenericUDAFCountEvaluator.merge(GenericUDAFCount.java:150)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:609)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:848)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:692)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:758)
> ... 8 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> {color:#8eb021}1. The table is a big one.{color}
> 2, Hive Runtime Error while processing row (tag=0) 
> {"key":{"_col0":"20170705","_col1":"-2144668477","_col2":"4724a50e-9238-4146-9394-a076acd41836","_col3":"client://plusApp/pdtil_Purchase","_col4":"924","_col5":"61","_col6":"app","_col7":"A","_col8":"comic","_col9":"4724a50e-9238-4146-9394-a076acd41836","_col10":"46f572ce-ed86-4c2a-bf7a-db2171654144"},"value":{*"_col0":"46f572ce-ed86-4c2a-bf7a-db2171654144"*,"_col1":"1499186628.124"}}
>   I think _col0 of the value should be a number, because it is the count for 
> the group by query,
>   but the value 46f572ce-ed86-4c2a-bf7a-db2171654144 looks like the value of a 
> column (adid?).
> I don't know why. 
> CDH-5.8.0-1.cdh5.8.0.p0.42



--

[jira] [Updated] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24920:
--
Labels: pull-request-available  (was: )

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command (location already exists), but because it's an external table there is no exception
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}




[jira] [Work logged] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24920?focusedWorklogId=585115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585115
 ]

ASF GitHub Bot logged work on HIVE-24920:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 13:11
Start Date: 19/Apr/21 13:11
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2191:
URL: https://github.com/apache/hive/pull/2191


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 585115)
Remaining Estimate: 0h
Time Spent: 10m

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command (location already exists), but because it's an external table no exception is thrown
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}




[jira] [Updated] (HIVE-24577) Task resubmission bug

2021-04-19 Thread hezhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated HIVE-24577:
---
Attachment: HIVE-24577.patch

> Task resubmission bug
> -
>
> Key: HIVE-24577
> URL: https://issues.apache.org/jira/browse/HIVE-24577
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
> Environment: hive-2.3.4
>Reporter: guojh
>Assignee: hezhang
>Priority: Major
> Fix For: 2.3.8
>
> Attachments: HIVE-24577.patch
>
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), tasks are submitted to YARN in parallel. If parent jobs complete 
> simultaneously, their child task may be submitted more than once.
> In our production cluster, we have a query with the following stage 
> dependencies:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1, Stage-10, Stage-14
>   Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5
>   Stage-4
>   Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
>   Stage-3
>   Stage-5
>   Stage-6 depends on stages: Stage-5
>   Stage-18 is a root stage
>   Stage-9 depends on stages: Stage-18
>   Stage-10 depends on stages: Stage-9
>   Stage-19 is a root stage
>   Stage-13 depends on stages: Stage-19
>   Stage-14 depends on stages: Stage-13
> {code}
> There is a certain probability that Stage-10 and Stage-14 will complete at 
> the same time, in which case their child Stage-2 is submitted twice, as the log below shows:
> {code:java}
> 2021-01-03T13:35:32,079  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 6 out of 6
> 2021-01-03T13:35:32,080  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2021-01-03T13:35:32,082  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 7 out of 6
> 2021-01-03T13:35:32,083  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> {code}
>  
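
The race described above can be illustrated with a minimal sketch, assuming every task has a unique id. The class OnceOnlyLauncher and its methods are hypothetical names chosen for this example; they are not taken from Hive's Driver or from the attached patch. The sketch only shows one possible way to make "queue the child task" idempotent when two parents finish at the same moment: record each task id in a concurrent set and launch only when the id was not already present.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not Hive's Driver code and not HIVE-24577.patch.
class OnceOnlyLauncher {
    // Ids of tasks that have already been handed over for execution.
    private final Set<String> launched = ConcurrentHashMap.newKeySet();
    // Counts how many submissions actually happened.
    private final AtomicInteger launchCount = new AtomicInteger();

    // Returns true only for the first caller that requests a given task id.
    boolean launchIfNew(String taskId) {
        if (!launched.add(taskId)) {
            return false; // another finished parent already queued this child
        }
        launchCount.incrementAndGet(); // stand-in for the real submission
        return true;
    }

    int launchCount() {
        return launchCount.get();
    }

    public static void main(String[] args) {
        OnceOnlyLauncher launcher = new OnceOnlyLauncher();
        // Stage-10 and Stage-14 finish together and both try to queue Stage-2.
        System.out.println(launcher.launchIfNew("Stage-2"));
        System.out.println(launcher.launchIfNew("Stage-2"));
        System.out.println(launcher.launchCount());
    }
}
```

Because Set.add on a concurrent key set is atomic, the check and the registration cannot interleave between two threads, which is the property the duplicated "Starting task [Stage-2:MAPRED]" log lines suggest is missing.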




[jira] [Commented] (HIVE-24577) Task resubmission bug

2021-04-19 Thread hezhang (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325025#comment-17325025
 ] 

hezhang commented on HIVE-24577:


Added a patch for Hive 2.

HIVE-25026 covers the update for Hive 3.

> Task resubmission bug
> -
>
> Key: HIVE-24577
> URL: https://issues.apache.org/jira/browse/HIVE-24577
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
> Environment: hive-2.3.4
>Reporter: guojh
>Assignee: hezhang
>Priority: Major
> Fix For: 2.3.8
>
> Attachments: HIVE-24577.patch
>
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), tasks are submitted to YARN in parallel. If parent jobs complete 
> simultaneously, their child task may be submitted more than once.
> In our production cluster, we have a query with the following stage 
> dependencies:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1, Stage-10, Stage-14
>   Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5
>   Stage-4
>   Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
>   Stage-3
>   Stage-5
>   Stage-6 depends on stages: Stage-5
>   Stage-18 is a root stage
>   Stage-9 depends on stages: Stage-18
>   Stage-10 depends on stages: Stage-9
>   Stage-19 is a root stage
>   Stage-13 depends on stages: Stage-19
>   Stage-14 depends on stages: Stage-13
> {code}
> There is a certain probability that Stage-10 and Stage-14 will complete at 
> the same time, in which case their child Stage-2 is submitted twice, as the log below shows:
> {code:java}
> 2021-01-03T13:35:32,079  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 6 out of 6
> 2021-01-03T13:35:32,080  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2021-01-03T13:35:32,082  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 7 out of 6
> 2021-01-03T13:35:32,083  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> {code}
>  




[jira] [Updated] (HIVE-24577) Task resubmission bug

2021-04-19 Thread hezhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated HIVE-24577:
---
Fix Version/s: 2.3.8

> Task resubmission bug
> -
>
> Key: HIVE-24577
> URL: https://issues.apache.org/jira/browse/HIVE-24577
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
> Environment: hive-2.3.4
>Reporter: guojh
>Assignee: hezhang
>Priority: Major
> Fix For: 2.3.8
>
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), tasks are submitted to YARN in parallel. If parent jobs complete 
> simultaneously, their child task may be submitted more than once.
> In our production cluster, we have a query with the following stage 
> dependencies:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1, Stage-10, Stage-14
>   Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5
>   Stage-4
>   Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
>   Stage-3
>   Stage-5
>   Stage-6 depends on stages: Stage-5
>   Stage-18 is a root stage
>   Stage-9 depends on stages: Stage-18
>   Stage-10 depends on stages: Stage-9
>   Stage-19 is a root stage
>   Stage-13 depends on stages: Stage-19
>   Stage-14 depends on stages: Stage-13
> {code}
> There is a certain probability that Stage-10 and Stage-14 will complete at 
> the same time, in which case their child Stage-2 is submitted twice, as the log below shows:
> {code:java}
> 2021-01-03T13:35:32,079  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 6 out of 6
> 2021-01-03T13:35:32,080  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2021-01-03T13:35:32,082  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 7 out of 6
> 2021-01-03T13:35:32,083  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> {code}
>  




[jira] [Updated] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread hezhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated HIVE-25026:
---
Attachment: HIVE-25026.patch

> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-25026.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue is the same as HIVE-24577




[jira] [Assigned] (HIVE-25028) Hive: Select query with IS operator producing unexpected result

2021-04-19 Thread Soumyakanti Das (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das reassigned HIVE-25028:
--


> Hive: Select query with IS operator producing unexpected result
> ---
>
> Key: HIVE-25028
> URL: https://issues.apache.org/jira/browse/HIVE-25028
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
> Hive: Select query with IS operator is producing unexpected result.
> The following was executed on postgres:
> {code:java}
> sqlancer=# create table if not exists emp(name text, age int);
> CREATE TABLE
> sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12);
> INSERT 0 3
> sqlancer=# select emp.age from emp where emp.age > 10;
>  age
> -
>   15
>   12
> (2 rows)
> sqlancer=# select emp.age > 10 is true from emp;
>  ?column?
> --
>  f
>  t
>  t
> (3 rows)
> {code}
> This is happening because the IS operator has higher precedence than comparison 
> operators in Hive. In most other databases, comparison operators have higher 
> precedence. The grammar needs to be changed to fix the precedence.
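
To make the precedence point concrete, here is a small sketch of a precedence-climbing parser that only prints how the expression "age > 10 IS TRUE" is grouped under the two orderings. The class PrecedenceDemo and its numeric precedence levels are assumptions made up for this illustration; this is not Hive's grammar and it does not evaluate anything.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Hive's parser or grammar.
class PrecedenceDemo {
    private final List<String> tokens;
    private final int isPrec;  // binding strength of the postfix "IS TRUE"
    private final int cmpPrec; // binding strength of the binary ">"
    private int pos = 0;

    PrecedenceDemo(List<String> tokens, int isPrec, int cmpPrec) {
        this.tokens = tokens;
        this.isPrec = isPrec;
        this.cmpPrec = cmpPrec;
    }

    // Parses operators whose precedence is at least minPrec and returns
    // a fully parenthesized rendering of the resulting tree.
    String parse(int minPrec) {
        String left = tokens.get(pos++); // operand
        while (pos < tokens.size()) {
            String op = tokens.get(pos);
            if (op.equals(">") && cmpPrec >= minPrec) {
                pos++; // consume ">"
                String right = parse(cmpPrec + 1);
                left = "(" + left + " > " + right + ")";
            } else if (op.equals("IS") && isPrec >= minPrec) {
                pos += 2; // consume "IS" and "TRUE"
                left = "(" + left + " IS TRUE)";
            } else {
                break;
            }
        }
        return left;
    }

    // Groups "age > 10 IS TRUE" with the given precedence levels.
    static String group(int isPrec, int cmpPrec) {
        List<String> tokens = Arrays.asList("age", ">", "10", "IS", "TRUE");
        return new PrecedenceDemo(tokens, isPrec, cmpPrec).parse(0);
    }

    public static void main(String[] args) {
        System.out.println(group(1, 2)); // comparison binds tighter
        System.out.println(group(2, 1)); // IS binds tighter
    }
}
```

When the comparison binds tighter, the test applies to the whole comparison, which matches the Postgres output shown above. When IS binds tighter, the test attaches to the literal 10 before the comparison is made, which is the grouping the issue attributes to Hive.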




[jira] [Updated] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25026:
--
Labels: pull-request-available  (was: )

> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue is the same as HIVE-24577




[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585094
 ]

ASF GitHub Bot logged work on HIVE-25026:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 12:10
Start Date: 19/Apr/21 12:10
Worklog Time Spent: 10m 
  Work Description: zhangheihei opened a new pull request #2189:
URL: https://github.com/apache/hive/pull/2189


   **A Hive job generates duplicate data because the same task is resubmitted**
   ```
   2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since 
there's no reduce operator
   2021-04-05 06:05:52 CONSOLE# Launching Job 5 out of 4
   2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since 
there's no reduce operator
   ```
   <img src="https://user-images.githubusercontent.com/13237066/115213523-2d945800-a134-11eb-94c3-52095c748283.png" width="300" height="300">
   For example, the Hive SQL explain plan shows 4 tasks. When hive.exec.parallel=true and 
task2/task3 can execute in parallel (canExecuteInParallel), task4 is executed twice:
   
   1. task1 is FINISHED; task2/task3 enter the runnable queue.
   <img src="https://user-images.githubusercontent.com/13237066/115233371-65a69580-a14a-11eb-81fb-5a0c3582e3dc.png" width="400" height="150">
   2. task2/task3 are executed in parallel and end at the same time. Now both 
task2 and task3 are FINISHED.
   <img src="https://user-images.githubusercontent.com/13237066/115233876-06955080-a14b-11eb-9570-7334eff8dcad.png" width="400" height="150">
   3. task2 is removed from the running queue, and task4 enters the runnable queue.
   4. 
   4. 





Issue Time Tracking
---

Worklog Id: (was: 585094)
Remaining Estimate: 0h
Time Spent: 10m

> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue is the same as HIVE-24577




[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=585067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585067
 ]

ASF GitHub Bot logged work on HIVE-25002:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 10:42
Start Date: 19/Apr/21 10:42
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2167:
URL: https://github.com/apache/hive/pull/2167#discussion_r615737849



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##
@@ -211,7 +211,7 @@ private void stopWorkers() {
   }
 
   private List processOneTable(TableName fullTableName)
-  throws MetaException, NoSuchTxnException, NoSuchObjectException {
+  throws MetaException, NoSuchTxnException, NoSuchObjectException, 
TException {

Review comment:
   will check.






Issue Time Tracking
---

Worklog Id: (was: 585067)
Time Spent: 1h 20m  (was: 1h 10m)

> modify condition for target of replication in statsUpdaterThread and 
> PartitionManagementTask
> 
>
> Key: HIVE-25002
> URL: https://issues.apache.org/jira/browse/HIVE-25002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=585063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585063
 ]

ASF GitHub Bot logged work on HIVE-25002:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 10:40
Start Date: 19/Apr/21 10:40
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2167:
URL: https://github.com/apache/hive/pull/2167#discussion_r615736521



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##
@@ -211,7 +211,7 @@ private void stopWorkers() {
   }
 
   private List processOneTable(TableName fullTableName)
-  throws MetaException, NoSuchTxnException, NoSuchObjectException {
+  throws MetaException, NoSuchTxnException, NoSuchObjectException, 
TException {

Review comment:
   Is the TException needed?






Issue Time Tracking
---

Worklog Id: (was: 585063)
Time Spent: 1h 10m  (was: 1h)

> modify condition for target of replication in statsUpdaterThread and 
> PartitionManagementTask
> 
>
> Key: HIVE-25002
> URL: https://issues.apache.org/jira/browse/HIVE-25002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=585061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585061
 ]

ASF GitHub Bot logged work on HIVE-25002:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 10:40
Start Date: 19/Apr/21 10:40
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2167:
URL: https://github.com/apache/hive/pull/2167#discussion_r615736279



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -140,7 +140,7 @@
 
   public static final String RANGER_CONFIGURATION_RESOURCE_NAME = 
"ranger-hive-security.xml";
 
-  public static final String TARGET_OF_REPLICATION = "repl.target.for";
+  public static final String TARGET_OF_REPLICATION = 
ReplConst.TARGET_OF_REPLICATION;

Review comment:
   then use the same constant everywhere






Issue Time Tracking
---

Worklog Id: (was: 585061)
Time Spent: 1h  (was: 50m)

> modify condition for target of replication in statsUpdaterThread and 
> PartitionManagementTask
> 
>
> Key: HIVE-25002
> URL: https://issues.apache.org/jira/browse/HIVE-25002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-25027) Hide Iceberg module behind a profile

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25027:
--
Labels: pull-request-available  (was: )

> Hide Iceberg module behind a profile
> 
>
> Key: HIVE-25027
> URL: https://issues.apache.org/jira/browse/HIVE-25027
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After creating the {{patched-iceberg-core}} and {{patched-iceberg-api}} modules, 
> the Maven build works fine, but IntelliJ needs manual classpath setup for the 
> build in IntelliJ to succeed.
> Most of the community does not use Iceberg, and eventually the "patched" 
> modules will be removed as the Hive-Iceberg integration stabilizes and the 
> Iceberg project releases the changes we need. In the meantime, we hide 
> the whole {{Iceberg}} module behind a profile that is used only on the CI 
> or when the developer explicitly enables it. 
> It can be used like this:
> {code:java}
>  mvn clean install -DskipTests -Piceberg{code}




[jira] [Work logged] (HIVE-25027) Hide Iceberg module behind a profile

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25027?focusedWorklogId=585021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585021
 ]

ASF GitHub Bot logged work on HIVE-25027:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 10:14
Start Date: 19/Apr/21 10:14
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2188:
URL: https://github.com/apache/hive/pull/2188


   ### What changes were proposed in this pull request?
   Hide Iceberg module behind a profile
   
   ### Why are the changes needed?
   After creating the patched-iceberg-core and patched-iceberg-api modules, the 
Maven build works fine, but IntelliJ needs manual classpath setup for the build 
in IntelliJ to succeed.
   
   Most of the community does not use Iceberg, and eventually the "patched" 
modules will be removed as the Hive-Iceberg integration stabilizes and the 
Iceberg project releases the changes we need. In the meantime, we hide the 
whole Iceberg module behind a profile that is used only on the CI or when the 
developer explicitly enables it. 
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Rebuilt the project in maven and in IntelliJ




Issue Time Tracking
---

Worklog Id: (was: 585021)
Remaining Estimate: 0h
Time Spent: 10m

> Hide Iceberg module behind a profile
> 
>
> Key: HIVE-25027
> URL: https://issues.apache.org/jira/browse/HIVE-25027
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After creating the {{patched-iceberg-core}} and {{patched-iceberg-api}} modules, 
> the Maven build works fine, but IntelliJ needs manual classpath setup for the 
> build in IntelliJ to succeed.
> Most of the community does not use Iceberg, and eventually the "patched" 
> modules will be removed as the Hive-Iceberg integration stabilizes and the 
> Iceberg project releases the changes we need. In the meantime, we hide 
> the whole {{Iceberg}} module behind a profile that is used only on the CI 
> or when the developer explicitly enables it. 
> It can be used like this:
> {code:java}
>  mvn clean install -DskipTests -Piceberg{code}




[jira] [Assigned] (HIVE-25027) Hide Iceberg module behind a profile

2021-04-19 Thread Peter Vary (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-25027:
-


> Hide Iceberg module behind a profile
> 
>
> Key: HIVE-25027
> URL: https://issues.apache.org/jira/browse/HIVE-25027
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> After creating the {{patched-iceberg-core}} and {{patched-iceberg-api}} modules, 
> the Maven build works fine, but IntelliJ needs manual classpath setup for the 
> build in IntelliJ to succeed.
> Most of the community does not use Iceberg, and eventually the "patched" 
> modules will be removed as the Hive-Iceberg integration stabilizes and the 
> Iceberg project releases the changes we need. In the meantime, we hide 
> the whole {{Iceberg}} module behind a profile that is used only on the CI 
> or when the developer explicitly enables it. 
> It can be used like this:
> {code:java}
>  mvn clean install -DskipTests -Piceberg{code}




[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584981
 ]

ASF GitHub Bot logged work on HIVE-24851:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 09:01
Start Date: 19/Apr/21 09:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2129:
URL: https://github.com/apache/hive/pull/2129#issuecomment-822301658


   Merged.
   
   Thanks for the fix and the work done to backport the change @losipiuk!




Issue Time Tracking
---

Worklog Id: (was: 584981)
Time Spent: 8h  (was: 7h 50m)

> resources leak on exception in AvroGenericRecordReader constructor
> --
>
> Key: HIVE-24851
> URL: https://issues.apache.org/jira/browse/HIVE-24851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Lukasz Osipiuk
>Assignee: Lukasz Osipiuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3, 3.2.0, 4.0.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> AvroGenericRecordReader constructor creates an instance of FileReader but 
> lacks proper exception handling, and reader is not closed on the failure path.
> This results in leaking of underlying resources (e.g. S3 connections).
>  
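
The failure path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AvroGenericRecordReader code and not the merged fix: GuardedReader, TrackingResource, and the failDuringSetup flag are hypothetical names used only to show the general pattern of closing an already-opened resource when the rest of a constructor throws.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical illustration only; not the AvroGenericRecordReader code or its fix.
class GuardedReader implements Closeable {
    // Stand-in for a resource such as a file reader backed by a remote connection.
    static class TrackingResource implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    private final Closeable source;

    GuardedReader(Closeable source, boolean failDuringSetup) throws IOException {
        this.source = source;
        try {
            setUp(failDuringSetup);
        } catch (IOException | RuntimeException e) {
            // The caller never receives this object, so nobody else can close
            // the resource: release it here before propagating the failure.
            try {
                source.close();
            } catch (IOException closeError) {
                e.addSuppressed(closeError);
            }
            throw e;
        }
    }

    // Simulates the remaining constructor work that may fail.
    private static void setUp(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated setup failure");
        }
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    public static void main(String[] args) {
        TrackingResource resource = new TrackingResource();
        try {
            new GuardedReader(resource, true);
        } catch (IOException expected) {
            System.out.println("closed after failure: " + resource.closed);
        }
    }
}
```

A try-with-resources statement in the caller does not help in this situation, because the constructor fails before the object is assigned, so the cleanup has to live inside the constructor itself.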




[jira] [Updated] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor

2021-04-19 Thread Peter Vary (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-24851:
--
Fix Version/s: 3.1.3

> resources leak on exception in AvroGenericRecordReader constructor
> --
>
> Key: HIVE-24851
> URL: https://issues.apache.org/jira/browse/HIVE-24851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Lukasz Osipiuk
>Assignee: Lukasz Osipiuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3, 3.2.0, 4.0.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> AvroGenericRecordReader constructor creates an instance of FileReader but 
> lacks proper exception handling, and reader is not closed on the failure path.
> This results in leaking of underlying resources (e.g. S3 connections).
>  




[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584976
 ]

ASF GitHub Bot logged work on HIVE-24851:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 08:56
Start Date: 19/Apr/21 08:56
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2129:
URL: https://github.com/apache/hive/pull/2129


   




Issue Time Tracking
---

Worklog Id: (was: 584976)
Time Spent: 7h 50m  (was: 7h 40m)

> resources leak on exception in AvroGenericRecordReader constructor
> --
>
> Key: HIVE-24851
> URL: https://issues.apache.org/jira/browse/HIVE-24851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Lukasz Osipiuk
>Assignee: Lukasz Osipiuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 4.0.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> AvroGenericRecordReader constructor creates an instance of FileReader but 
> lacks proper exception handling, and reader is not closed on the failure path.
> This results in leaking of underlying resources (e.g. S3 connections).
>  




[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584972
 ]

ASF GitHub Bot logged work on HIVE-24851:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 08:47
Start Date: 19/Apr/21 08:47
Worklog Time Spent: 10m 
  Work Description: losipiuk commented on pull request #2129:
URL: https://github.com/apache/hive/pull/2129#issuecomment-822291962


   @pvary mergeable?




Issue Time Tracking
---

Worklog Id: (was: 584972)
Time Spent: 7h 40m  (was: 7.5h)

> resources leak on exception in AvroGenericRecordReader constructor
> --
>
> Key: HIVE-24851
> URL: https://issues.apache.org/jira/browse/HIVE-24851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Lukasz Osipiuk
>Assignee: Lukasz Osipiuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 4.0.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> AvroGenericRecordReader constructor creates an instance of FileReader but 
> lacks proper exception handling, and reader is not closed on the failure path.
> This results in leaking of underlying resources (e.g. S3 connections).
>  




[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584970
 ]

ASF GitHub Bot logged work on HIVE-24851:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 08:46
Start Date: 19/Apr/21 08:46
Worklog Time Spent: 10m 
  Work Description: losipiuk commented on pull request #2129:
URL: https://github.com/apache/hive/pull/2129#issuecomment-822290914


   Based on 3 runs, it looks like all the test failures are flakes. The same tests 
(which, FWIW, seem totally unrelated to the change) fail on one run and pass on 
another.
   * (1) 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2129/1/tests/
 * Only preexisting failures
   * (2) 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2129/2/tests/
 * TestSlotZnode.testConcurrencyNoFallback failed (green on (1) and (3))
 * TestSlotZnode.testConcurrencyAndFallback failed (green on (1) and (3))
   * (3) 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2129/3/tests/
 * TestJdbcDriver2.testSelectExecAsync2 failed (was green on (2))




Issue Time Tracking
---

Worklog Id: (was: 584970)
Time Spent: 7.5h  (was: 7h 20m)

> resources leak on exception in AvroGenericRecordReader constructor
> --
>
> Key: HIVE-24851
> URL: https://issues.apache.org/jira/browse/HIVE-24851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Lukasz Osipiuk
>Assignee: Lukasz Osipiuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 4.0.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> AvroGenericRecordReader constructor creates an instance of FileReader but 
> lacks proper exception handling, and reader is not closed on the failure path.
> This results in leaking of underlying resources (e.g. S3 connections).
>  




[jira] [Work logged] (HIVE-25019) Rename metrics that have spaces in the name

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25019?focusedWorklogId=584949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584949
 ]

ASF GitHub Bot logged work on HIVE-25019:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 08:14
Start Date: 19/Apr/21 08:14
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2183:
URL: https://github.com/apache/hive/pull/2183


   




Issue Time Tracking
---

Worklog Id: (was: 584949)
Time Spent: 20m  (was: 10m)

> Rename metrics that have spaces in the name
> ---
>
> Key: HIVE-25019
> URL: https://issues.apache.org/jira/browse/HIVE-25019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Metrics "num_compactions_ready for cleaning" and  "num_compactions_not 
> initiated" contain spaces.
> They should be renamed to "num_compactions_ready_for_cleaning" and 
> "num_compactions_not_initiated" respectively.




[jira] [Resolved] (HIVE-25019) Rename metrics that have spaces in the name

2021-04-19 Thread Karen Coppage (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25019.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master branch. Thanks for your contribution [~asinkovits]!

> Rename metrics that have spaces in the name
> ---
>
> Key: HIVE-25019
> URL: https://issues.apache.org/jira/browse/HIVE-25019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Metrics "num_compactions_ready for cleaning" and  "num_compactions_not 
> initiated" contain spaces.
> They should be renamed to "num_compactions_ready_for_cleaning" and 
> "num_compactions_not_initiated" respectively.




[jira] [Work logged] (HIVE-25023) Optimize the operation of reading jar stream to avoid stream closed exception

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25023?focusedWorklogId=584948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584948
 ]

ASF GitHub Bot logged work on HIVE-25023:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 08:13
Start Date: 19/Apr/21 08:13
Worklog Time Spent: 10m 
  Work Description: dh20 commented on pull request #2185:
URL: https://github.com/apache/hive/pull/2185#issuecomment-822269434


   @sunchao Hi, could you please review this for me? Thanks!




Issue Time Tracking
---

Worklog Id: (was: 584948)
Time Spent: 20m  (was: 10m)

> Optimize the operation of reading jar stream to avoid stream closed exception
> -
>
> Key: HIVE-25023
> URL: https://issues.apache.org/jira/browse/HIVE-25023
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 3.1.2
>Reporter: hao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Optimize the operation of reading jar stream to avoid stream closed exception




[jira] [Resolved] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangHualei resolved HIVE-25025.
---
Resolution: Fixed

The distcp method behaves differently from FileUtil.copy: distcp ignores the 
parent directory, so it may overwrite the stats files and cause the stats info 
to be lost.

The change modifies the dst value, adding the parent directory to the dst path 
so that distcp produces the same result as FileUtil.copy.
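One possible reading of this fix can be sketched as follows. This is an illustration only, not the code from the attached patch: it uses `java.nio.file.Path` as a stand-in for Hadoop's path type, the helper name `withSourceDirName` is hypothetical, and the example paths are invented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DistCpDstSketch {
    // Hypothetical helper: append the source directory's own name to dst,
    // so a directory-level copy lands in dst/<srcName> rather than directly
    // in dst, where it could overwrite files that already exist there.
    static Path withSourceDirName(Path src, Path dst) {
        Path name = src.getFileName();
        // A root path has no file name; leave dst unchanged in that case.
        return name == null ? dst : dst.resolve(name);
    }
}
```

For example, with a source of `/tmp/staging/-ext-10000` and a destination of `/warehouse/abc_new`, the helper returns `/warehouse/abc_new/-ext-10000`.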

> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: HIVE-25025.patch
>
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangHualei updated HIVE-25025:
--
Status: In Progress  (was: Patch Available)

The distcp method behaves differently from FileUtil.copy: distcp ignores the 
parent directory, so it may overwrite the stats files and cause the stats info 
to be lost.

The change modifies the dst value, adding the parent directory to the dst path 
so that distcp produces the same result as FileUtil.copy.

> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: HIVE-25025.patch
>
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangHualei updated HIVE-25025:
--
Attachment: HIVE-25025.patch
Labels: patch pull-request-available  (was: pull-request-available)
Status: Patch Available  (was: In Progress)

The distcp method behaves differently from FileUtil.copy: distcp ignores the 
parent directory, so it may overwrite the stats files and cause the stats info 
to be lost.

The change modifies the dst value, adding the parent directory to the dst path 
so that distcp produces the same result as FileUtil.copy.

> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>  Labels: pull-request-available, patch
> Attachments: HIVE-25025.patch
>
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Updated] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread hezhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated HIVE-25026:
---
Description: This issue is the same as HIVE-24577

> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>
> This issue is the same as HIVE-24577




[jira] [Assigned] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-19 Thread hezhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang reassigned HIVE-25026:
--


> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>





[jira] [Work logged] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?focusedWorklogId=584939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584939
 ]

ASF GitHub Bot logged work on HIVE-25025:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 07:45
Start Date: 19/Apr/21 07:45
Worklog Time Spent: 10m 
  Work Description: wanghualei opened a new pull request #2187:
URL: https://github.com/apache/hive/pull/2187


   After setting "Run as end user instead of Hive user", when executing an 
insert overwrite, MoveTask uses the distcp method if the source bytes > 
HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be lost.
   
   Example:
   
   set hive.exec.copyfile.maxsize=0;
   set hive.exec.copyfile.maxnumfiles=0;
   
   insert overwrite table abc_new select * from abc;
   
   select count(1) from abc_new;
   
   select * from abc_new;
   
   The count(1) result will then be 0, but select * will display the real data, 
because the stats info is lost.
   
   
   Fix:
   
   The distcp method behaves differently from FileUtil.copy: distcp ignores the 
parent directory, so it may overwrite the stats files and cause the stats info 
to be lost.
   
   The change modifies the dst value, adding the parent directory to the dst 
path so that distcp produces the same result as FileUtil.copy.
   
   




Issue Time Tracking
---

Worklog Id: (was: 584939)
Remaining Estimate: 71h 50m  (was: 72h)
Time Spent: 10m

> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25025:
--
Labels: pull-request-available  (was: )

> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Work started] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25025 started by WangHualei.
-
> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Resolved] (HIVE-25017) Fix response in GetLatestCommittedCompaction

2021-04-19 Thread Karen Coppage (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25017.
--
Resolution: Fixed

Committed to master branch. Thanks for your contribution [~hsnusonic]!

> Fix response in GetLatestCommittedCompaction
> 
>
> Key: HIVE-25017
> URL: https://issues.apache.org/jira/browse/HIVE-25017
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dbname and Tablename are required for CompactionInfoStruct but the response 
> of getLatestCommittedCompactionInfo is not setting them.




[jira] [Updated] (HIVE-25017) Fix response in GetLatestCommittedCompaction

2021-04-19 Thread Karen Coppage (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-25017:
-
Fix Version/s: 4.0.0

> Fix response in GetLatestCommittedCompaction
> 
>
> Key: HIVE-25017
> URL: https://issues.apache.org/jira/browse/HIVE-25017
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dbname and Tablename are required for CompactionInfoStruct but the response 
> of getLatestCommittedCompactionInfo is not setting them.




[jira] [Work logged] (HIVE-25017) Fix response in GetLatestCommittedCompaction

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25017?focusedWorklogId=584930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584930
 ]

ASF GitHub Bot logged work on HIVE-25017:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 07:26
Start Date: 19/Apr/21 07:26
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2181:
URL: https://github.com/apache/hive/pull/2181


   




Issue Time Tracking
---

Worklog Id: (was: 584930)
Time Spent: 20m  (was: 10m)

> Fix response in GetLatestCommittedCompaction
> 
>
> Key: HIVE-25017
> URL: https://issues.apache.org/jira/browse/HIVE-25017
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dbname and Tablename are required for CompactionInfoStruct but the response 
> of getLatestCommittedCompactionInfo is not setting them.




[jira] [Assigned] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangHualei reassigned HIVE-25025:
-


> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
> Environment: example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abd_new select * from abc;
> select count(*) from abd_new ;
> select * from abd_new ;
> The count(*) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 




[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangHualei updated HIVE-25025:
--
Description: 
After setting _Run as end user instead of Hive user_, when executing an insert 
overwrite, MoveTask uses the distcp method if the source bytes > 
HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be lost. 

Example:

set hive.exec.copyfile.maxsize=0;
set hive.exec.copyfile.maxnumfiles=0;

insert overwrite table abc_new select * from abc;

select count(1) from abc_new;

select * from abc_new;

The count(1) result will then be 0, but select * will display the real data, 
because the stats info is lost.

 

  was: After setting _Run as end user instead of Hive user_, when executing an 
insert overwrite, MoveTask uses the distcp method if the source bytes > 
HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
lost. 


> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 
> Example:
> set hive.exec.copyfile.maxsize=0;
> set hive.exec.copyfile.maxnumfiles=0;
> insert overwrite table abc_new select * from abc;
> select count(1) from abc_new;
> select * from abc_new;
> The count(1) result will then be 0, but select * will display the real data, 
> because the stats info is lost.
>  




[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost

2021-04-19 Thread WangHualei (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangHualei updated HIVE-25025:
--
Environment: (was: example:

set hive.exec.copyfile.maxsize=0;
set hive.exec.copyfile.maxnumfiles=0;

insert overwrite table abd_new select * from abc;

select count(*) from abd_new ;

select * from abd_new ;

The count(*) result will then be 0, but select * will display the real data, 
because the stats info is lost.)

> Distcp In MoveTask may cause stats info lost
> 
>
> Key: HIVE-25025
> URL: https://issues.apache.org/jira/browse/HIVE-25025
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: WangHualei
>Assignee: WangHualei
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> After setting _Run as end user instead of Hive user_, when executing an 
> insert overwrite, MoveTask uses the distcp method if the source bytes > 
> HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > 
> HIVE_EXEC_COPYFILE_MAXNUMFILES, which may cause the tmp stats file to be 
> lost. 




[jira] [Work logged] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25009?focusedWorklogId=584925=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584925
 ]

ASF GitHub Bot logged work on HIVE-25009:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 07:04
Start Date: 19/Apr/21 07:04
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #2175:
URL: https://github.com/apache/hive/pull/2175


   




Issue Time Tracking
---

Worklog Id: (was: 584925)
Time Spent: 40m  (was: 0.5h)

> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (HIVE-24577) Task resubmission bug

2021-04-19 Thread hezhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang reassigned HIVE-24577:
--

Assignee: hezhang  (was: guojh)

> Task resubmission bug
> -
>
> Key: HIVE-24577
> URL: https://issues.apache.org/jira/browse/HIVE-24577
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
> Environment: hive-2.3.4
>Reporter: guojh
>Assignee: hezhang
>Priority: Major
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), tasks are submitted to YARN in parallel. If the jobs complete 
> simultaneously, their child tasks may be submitted more than once.
> In our production cluster, we have a query whose stage dependencies are as 
> follows:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1, Stage-10, Stage-14
>   Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5
>   Stage-4
>   Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
>   Stage-3
>   Stage-5
>   Stage-6 depends on stages: Stage-5
>   Stage-18 is a root stage
>   Stage-9 depends on stages: Stage-18
>   Stage-10 depends on stages: Stage-9
>   Stage-19 is a root stage
>   Stage-13 depends on stages: Stage-19
>   Stage-14 depends on stages: Stage-13
> {code}
> There is a certain probability that Stage-10 and Stage-14 will complete at 
> the same time, in which case their child Stage-2 is submitted twice, as the 
> log below shows:
> {code:java}
> 2021-01-03T13:35:32,079  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 6 out of 6
> 2021-01-03T13:35:32,080  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2021-01-03T13:35:32,082  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Launching Job 7 out of 6
> 2021-01-03T13:35:32,083  INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> {code}
>  
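One common way to guard against this kind of double submission is to let each finishing parent atomically "claim" the child stage, so that only the first claim launches it. The sketch below illustrates that idea under stated assumptions; it is not Hive's Driver code, and `OnceOnlyLauncher` and `tryClaim` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class OnceOnlyLauncher {
    // Thread-safe set of stage ids that have already been handed to a launcher.
    private final Set<String> launched = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller that claims the given stage id.
    // Set.add is atomic here, so two parents finishing at the same time
    // cannot both win the claim.
    boolean tryClaim(String stageId) {
        return launched.add(stageId);
    }
}
```

In the scenario above, the threads handling the completion of Stage-10 and Stage-14 would both call `tryClaim("Stage-2")`, and only the one that receives `true` would go on to launch the task.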




[jira] [Resolved] (HIVE-25016) Error while running repl dump with with db regex

2021-04-19 Thread Aasha Medhi (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi resolved HIVE-25016.

Resolution: Fixed

> Error while running repl dump with with db regex
> 
>
> Key: HIVE-25016
> URL: https://issues.apache.org/jira/browse/HIVE-25016
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Doing an incremental dump with a create-database event and dbRegex `*` gives 
> the following exception:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. 
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.parse.EximUtil.updateIfCustomDbLocations(EximUtil.java:388)
> at org.apache.hadoop.hive.ql.parse.EximUtil.createDbExportDump(EximUtil.java:357)
> at org.apache.hadoop.hive.ql.parse.repl.dump.events.CreateDatabaseHandler.handle(CreateDatabaseHandler.java:42)
> at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpEvent(ReplDumpTask.java:827)
> at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:632)
> at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:209)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)




[jira] [Commented] (HIVE-25016) Error while running repl dump with with db regex

2021-04-19 Thread Aasha Medhi (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324733#comment-17324733
 ] 

Aasha Medhi commented on HIVE-25016:


+1 Committed to master. Thank you for the patch [~^sharma] 

> Error while running repl dump with with db regex
> 
>
> Key: HIVE-25016
> URL: https://issues.apache.org/jira/browse/HIVE-25016
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Doing an incremental dump with a create-database event and dbRegex `*` gives 
> the following exception:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. 
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.parse.EximUtil.updateIfCustomDbLocations(EximUtil.java:388)
> at org.apache.hadoop.hive.ql.parse.EximUtil.createDbExportDump(EximUtil.java:357)
> at org.apache.hadoop.hive.ql.parse.repl.dump.events.CreateDatabaseHandler.handle(CreateDatabaseHandler.java:42)
> at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpEvent(ReplDumpTask.java:827)
> at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:632)
> at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:209)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)




[jira] [Commented] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-19 Thread Jira



[ 
https://issues.apache.org/jira/browse/HIVE-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324725#comment-17324725
 ] 

László Pintér commented on HIVE-24928:
--

Thanks [~kgyrtkirk] [~mbod] [~pvary] for the reviews!

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.
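The proposed delegation could look roughly like the sketch below. It is an illustration under stated assumptions rather than the merged change: `StatsProvidingHandler`, `basicStats`, and `BasicStatsSketch` are hypothetical names, not the real HiveStorageHandler API, and the map keys are only illustrative.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical handler interface: returns basic stats if it can provide them.
interface StatsProvidingHandler {
    Optional<Map<String, Long>> basicStats(String tableName);
}

final class BasicStatsSketch {
    // Prefer statistics supplied by the storage handler; fall back to the
    // directory-scan estimate only when the handler cannot provide any.
    static Map<String, Long> collect(String tableName,
                                     StatsProvidingHandler handler,
                                     Map<String, Long> directoryScanEstimate) {
        if (handler != null) {
            Optional<Map<String, Long>> fromHandler = handler.basicStats(tableName);
            if (fromHandler.isPresent()) {
                return fromHandler.get();
            }
        }
        return directoryScanEstimate;
    }
}
```

With this shape, metadata files in the table directory of a non-native table no longer influence the result whenever the handler reports its own numbers.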




[jira] [Resolved] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-19 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-24928.
--
Resolution: Fixed

[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=584904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584904
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 06:22
Start Date: 19/Apr/21 06:22
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2111:
URL: https://github.com/apache/hive/pull/2111


Issue Time Tracking
---

Worklog Id: (was: 584904)
Time Spent: 6h 20m  (was: 6h 10m)
