[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=506040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506040
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 29/Oct/20 04:04
Start Date: 29/Oct/20 04:04
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r513942354



##
File path: ql/src/test/results/clientpositive/llap/sharedwork_semi.q.out
##
@@ -541,7 +541,7 @@ STAGE PLANS:
 Map Operator Tree:
 TableScan
   alias: s
-  filterExpr: (ss_sold_date_sk is not null and 
((ss_sold_date_sk BETWEEN DynamicValue(RS_7_d_d_date_sk_min) AND 
DynamicValue(RS_7_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_7_d_d_date_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN 
DynamicValue(RS_21_d_d_date_sk_min) AND DynamicValue(RS_21_d_d_date_sk_max) and 
in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_21_d_d_date_sk_bloom_filter) (type: boolean)
+  filterExpr: (((ss_sold_date_sk BETWEEN 
DynamicValue(RS_7_d_d_date_sk_min) AND DynamicValue(RS_7_d_d_date_sk_max) and 
in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_7_d_d_date_sk_bloom_filter))) 
or (ss_sold_date_sk BETWEEN DynamicValue(RS_21_d_d_date_sk_min) AND 
DynamicValue(RS_21_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_21_d_d_date_sk_bloom_filter and ss_sold_date_sk is not 
null) (type: boolean)

Review comment:
   we see a case where, when NonBlockingOpDeDupProc merges FIL-FIL, the 
conditionals may be reordered. 
   https://github.com/apache/hive/pull/1308 
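The reordering above can be pictured with a minimal sketch (hypothetical helper, not actual Hive code): when two adjacent Filter operators are merged, their conjuncts are AND-ed together, and the merged order need not match the original plan's order, which is why the `ss_sold_date_sk is not null` guard moves in the q.out diff:

```python
def merge_filter_predicates(base, pulled_up):
    """Conjoin two FIL operators' predicate lists.

    Keeps `base`'s conjuncts first and appends any new conjuncts from
    `pulled_up` - one way a FIL-FIL merge can reorder conditions
    relative to the original plan.
    """
    merged = list(base)
    for p in pulled_up:
        if p not in merged:  # deduplicate shared conjuncts
            merged.append(p)
    return merged

# The upstream FIL checked the null guard first; after the merge it
# ends up last, matching the kind of reordering seen in the diff above.
print(merge_filter_predicates(
    ["bloom_filter_branch_1 or bloom_filter_branch_2"],
    ["ss_sold_date_sk is not null"],
))
```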





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 506040)
Time Spent: 1h 40m  (was: 1.5h)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24325:
---
Status: Patch Available  (was: In Progress)

> Cardinality preserving join optimization fails when column is backtracked to 
> a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This error happens when one of the columns that is used in the output 
> backtracks to a constant. We end up without a mapping for the column, which 
> leads to exception below.
> {code}
> org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no 
> target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, 
> 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]]
> at 
> org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1570)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:549)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> 
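The `Mappings$NoElementException` quoted above amounts to looking up a source column index that the trimmed mapping no longer contains. A minimal sketch of that failure mode (hypothetical Python, not Calcite code - Calcite's `Mappings` API is Java):

```python
class NoElementException(KeyError):
    """Stands in for Calcite's Mappings$NoElementException."""


def get_target(mapping, source):
    """Return the target index for `source`, or raise if the column was
    trimmed away (e.g. because it backtracked to a constant and never
    received a mapping entry)."""
    if source not in mapping:
        raise NoElementException(f"source #{source} has no target in mapping")
    return mapping[source]


# The mapping from the stack trace: sources 5-8, 10 and 14-22 have no target.
mapping = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 9: 5, 11: 6, 12: 7, 13: 8}

print(get_target(mapping, 9))  # a surviving column resolves normally
try:
    get_target(mapping, 8)     # the constant-backtracked column does not
except NoElementException as e:
    print(e)
```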

[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506032
 ]

ASF GitHub Bot logged work on HIVE-24325:
-

Author: ASF GitHub Bot
Created on: 29/Oct/20 03:33
Start Date: 29/Oct/20 03:33
Worklog Time Spent: 10m 
  Work Description: jcamachor opened a new pull request #1622:
URL: https://github.com/apache/hive/pull/1622


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 506032)
Remaining Estimate: 0h
Time Spent: 10m

> Cardinality preserving join optimization fails when column is backtracked to 
> a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This error happens when one of the columns that is used in the output 
> backtracks to a constant. We end up without a mapping for the column, which 
> leads to a Mappings$NoElementException (full stack trace quoted in the first 
> HIVE-24325 message of this thread).
> 

[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506033
 ]

ASF GitHub Bot logged work on HIVE-24325:
-

Author: ASF GitHub Bot
Created on: 29/Oct/20 03:33
Start Date: 29/Oct/20 03:33
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1622:
URL: https://github.com/apache/hive/pull/1622#issuecomment-718337682


   @kasakrisz , could you review? Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 506033)
Time Spent: 20m  (was: 10m)

> Cardinality preserving join optimization fails when column is backtracked to 
> a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This error happens when one of the columns that is used in the output 
> backtracks to a constant. We end up without a mapping for the column, which 
> leads to a Mappings$NoElementException (full stack trace quoted in the first 
> HIVE-24325 message of this thread).
> 

[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24325:
--
Labels: pull-request-available  (was: )

> Cardinality preserving join optimization fails when column is backtracked to 
> a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This error happens when one of the columns that is used in the output 
> backtracks to a constant. We end up without a mapping for the column, which 
> leads to a Mappings$NoElementException (full stack trace quoted in the first 
> HIVE-24325 message of this thread).
>  

[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24325:
---
Description: 
This error happens when one of the columns that is used in the output 
backtracks to a constant. We end up without a mapping for the column, which 
leads to exception below.

{code}
org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no 
target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, 
2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]]
at 
org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1570)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:549)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 

[jira] [Work started] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24325 started by Jesus Camacho Rodriguez.
--
> Cardinality preserving join optimization fails when column is backtracked to 
> a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> More info to come.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24325:
---
Summary: Cardinality preserving join optimization fails when column is 
backtracked to a constant  (was: Cardinality preserving join optimization may 
fail when column is backtracked to a constant)

> Cardinality preserving join optimization fails when column is backtracked to 
> a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> More info to come.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization may fail when column is backtracked to a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24325:
---
Summary: Cardinality preserving join optimization may fail when column is 
backtracked to a constant  (was: Cardinality preserving join optimization may 
fail when column is a constant)

> Cardinality preserving join optimization may fail when column is backtracked 
> to a constant
> --
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> More info to come.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24325) Cardinality preserving join optimization may fail when column is a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24325:
--


> Cardinality preserving join optimization may fail when column is a constant
> ---
>
> Key: HIVE-24325
> URL: https://issues.apache.org/jira/browse/HIVE-24325
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> More info to come.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24253) HMS and HS2 needs to support keystore/truststores types besides JKS by config

2020-10-28 Thread Yongzhi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222646#comment-17222646
 ] 

Yongzhi Chen commented on HIVE-24253:
-

[~kgyrtkirk], we cannot remove the ignore from it: the tests run fine when 
run individually, but when I run the whole suite, there are failures. My fix in 
this Jira just adds a new configurable property; it does not fix existing SSL 
issues in Hive. We may remove the ignore after we find the root cause of the 
issue and fix it. 

> HMS and HS2 needs to support keystore/truststores types besides JKS by config
> -
>
> Key: HIVE-24253
> URL: https://issues.apache.org/jira/browse/HIVE-24253
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should make 
> the keystore type configurable, defaulting to the keystore type specified for 
> the JDK rather than always using JKS. As HIVE-23958 did for Hive, HMS should 
> support setting additional keystore/truststore types used by different 
> applications, for example for FIPS crypto algorithms.
> Also, make the Hive keystore type and algorithm configurable.
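The configuration behaviour described above can be sketched as a simple resolution rule (hypothetical property name and helper, not actual HMS configuration keys): prefer an explicit config value, and fall back to the JDK's own default keystore type instead of a hard-coded "JKS".

```python
# What Java's KeyStore.getDefaultType() typically reports on modern JDKs
# (JDK 9+ default the keystore.type security property to pkcs12); treat
# this value as an assumption for the sketch.
JDK_DEFAULT_KEYSTORE_TYPE = "pkcs12"


def resolve_keystore_type(conf):
    """Return the keystore type to use: the explicitly configured value
    if present, otherwise the JDK default - never a hard-coded JKS."""
    explicit = conf.get("ssl.keystore.type", "").strip()
    return explicit if explicit else JDK_DEFAULT_KEYSTORE_TYPE


print(resolve_keystore_type({}))                              # JDK default
print(resolve_keystore_type({"ssl.keystore.type": "BCFKS"}))  # e.g. a FIPS keystore
```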



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24253) HMS and HS2 needs to support keystore/truststores types besides JKS by config

2020-10-28 Thread Yongzhi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen resolved HIVE-24253.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> HMS and HS2 needs to support keystore/truststores types besides JKS by config
> -
>
> Key: HIVE-24253
> URL: https://issues.apache.org/jira/browse/HIVE-24253
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should make 
> the keystore type configurable, defaulting to the keystore type specified for 
> the JDK rather than always using JKS. As HIVE-23958 did for Hive, HMS should 
> support setting additional keystore/truststore types used by different 
> applications, for example for FIPS crypto algorithms.
> Also, make the Hive keystore type and algorithm configurable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23930) Upgrade to tez 0.10.0

2020-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222486#comment-17222486
 ] 

László Bodor commented on HIVE-23930:
-

pushed to master, thanks for the reviews on the blocker tasks: [~rbalamohan], 
[~ashutoshc], [~harishjp]!

> Upgrade to tez 0.10.0
> -
>
> Key: HIVE-23930
> URL: https://issues.apache.org/jira/browse/HIVE-23930
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Tez 0.10.0 is not yet released, but this ticket tracks the effort and the 
> needed Hive changes.
> Currently, Hive depends on Tez 0.9.1.
> Hadoop dependencies:
> Hive/master: *3.1.0*
> Tez/master: *3.1.3*
> Tez/branch-0.9:  *2.7.2*
> TODOs: 
> - check why HIVE-23689 broke some unit tests intermittently (0.9.2 -> 0.9.3 
> bump), because a 0.10.x upgrade will also contain those Tez changes, which 
> could be related
> - maintain the needed Hive changes (reflecting Tez API changes):
> HIVE-23190: LLAP: modify IndexCache to pass filesystem object to 
> TezSpillRecord



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24324) Remove deprecated API usage from Avro

2020-10-28 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-24324:
---


> Remove deprecated API usage from Avro
> -
>
> Key: HIVE-24324
> URL: https://issues.apache.org/jira/browse/HIVE-24324
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and 
> removed since Avro 1.9. This change replaces its usage with 
> {{getObjectProp}}, which doesn't leak Jackson's JSON node types. This will 
> help downstream apps depend on Hive while using a higher version of Avro, 
> and also help Hive upgrade its own Avro version.
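The shape of the replacement API can be illustrated with a small self-contained stand-in (this is NOT the real Avro class; the real API is {{JsonProperties#getObjectProp}} in Avro 1.8+): the newer accessor returns plain Java objects, so callers need no Jackson types on their classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained stand-in (not the actual Avro classes) illustrating why
// getObjectProp is friendlier: properties come back as plain Java objects
// (String, Integer, Boolean, Map, List) rather than Jackson JsonNode values.
public class PropsHolder {
    private final Map<String, Object> props = new HashMap<>();

    public void addProp(String name, Object value) {
        props.put(name, value);
    }

    // Analogous in shape to JsonProperties#getObjectProp.
    public Object getObjectProp(String name) {
        return props.get(name);
    }
}
```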



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23930) Upgrade to tez 0.10.0

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23930?focusedWorklogId=505917=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505917
 ]

ASF GitHub Bot logged work on HIVE-23930:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 21:16
Start Date: 28/Oct/20 21:16
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1311:
URL: https://github.com/apache/hive/pull/1311#issuecomment-718212897


   this was pushed to master directly, together with HIVE-24108 and HIVE-23190



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505917)
Time Spent: 1.5h  (was: 1h 20m)

> Upgrade to tez 0.10.0
> -
>
> Key: HIVE-23930
> URL: https://issues.apache.org/jira/browse/HIVE-23930
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Tez 0.10.0 is not yet released, but this ticket tracks the effort and the 
> needed Hive changes.
> Currently, Hive depends on Tez 0.9.1.
> Hadoop dependencies:
> Hive/master: *3.1.0*
> Tez/master: *3.1.3*
> Tez/branch-0.9:  *2.7.2*
> TODOs: 
> - check why HIVE-23689 broke some unit tests intermittently (0.9.2 -> 0.9.3 
> bump), because a 0.10.x upgrade will also contain those Tez changes, which 
> could be related
> - maintain the needed Hive changes (reflecting Tez API changes):
> HIVE-23190: LLAP: modify IndexCache to pass filesystem object to 
> TezSpillRecord



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23190) LLAP: modify IndexCache to pass filesystem object to TezSpillRecord

2020-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23190:

Fix Version/s: 4.0.0

> LLAP: modify IndexCache to pass filesystem object to TezSpillRecord
> ---
>
> Key: HIVE-23190
> URL: https://issues.apache.org/jira/browse/HIVE-23190
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23190.01.patch
>
>
> This ticket is about making the changes introduced in TEZ-4145 in Hive's copy 
> of IndexCache class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24108) AddToClassPathAction should use TezClassLoader

2020-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24108:

Fix Version/s: 4.0.0

> AddToClassPathAction should use TezClassLoader
> --
>
> Key: HIVE-24108
> URL: https://issues.apache.org/jira/browse/HIVE-24108
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-24108.01.patch, HIVE-24108.02.patch, 
> hive_log_llap.log
>
>
> TEZ-4228 fixes an issue on the Tez side by using TezClassLoader instead of 
> the system classloader. However, there are some codepaths, e.g. in 
> [^hive_log_llap.log], which show that the system class loader is still used. 
> As thread context classloaders are inherited, the easier solution is to 
> early-initialize TezClassLoader in LlapDaemon and let all threads use it as 
> their context class loader, so this solution is more like TEZ-4223 for LLAP 
> daemons.
> {code}
> 2020-09-02T00:18:20,242 ERROR [TezTR-93696_1_1_1_0_0] tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:427)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:288)
>   ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:100)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>   ... 18 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   ... 21 more
> {code}
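The fix above relies on a standard JVM property that a minimal, self-contained sketch can demonstrate (class names here are illustrative, not the actual LlapDaemon/TezClassLoader code): a context class loader installed on a thread is inherited by any threads it creates afterwards, so installing it early covers all later worker threads.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Demonstrates context-class-loader inheritance: a loader installed on the
// spawning thread before child threads are created is what those children
// observe as their own context class loader.
public class ContextLoaderDemo {
    public static ClassLoader spawnAndReport(ClassLoader custom) throws Exception {
        // "Early-initialize": set the loader before spawning workers.
        Thread.currentThread().setContextClassLoader(custom);
        final ClassLoader[] seen = new ClassLoader[1];
        Thread child = new Thread(() ->
                seen[0] = Thread.currentThread().getContextClassLoader());
        child.start();
        child.join();
        return seen[0]; // the loader the child inherited
    }
}
```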



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23190) LLAP: modify IndexCache to pass filesystem object to TezSpillRecord

2020-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-23190.
-
Resolution: Fixed

> LLAP: modify IndexCache to pass filesystem object to TezSpillRecord
> ---
>
> Key: HIVE-23190
> URL: https://issues.apache.org/jira/browse/HIVE-23190
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23190.01.patch
>
>
> This ticket is about making the changes introduced in TEZ-4145 in Hive's copy 
> of IndexCache class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24108) AddToClassPathAction should use TezClassLoader

2020-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222482#comment-17222482
 ] 

László Bodor commented on HIVE-24108:
-

pushed to master, thanks [~harishjp], [~ashutoshc] for the review!

> AddToClassPathAction should use TezClassLoader
> --
>
> Key: HIVE-24108
> URL: https://issues.apache.org/jira/browse/HIVE-24108
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-24108.01.patch, HIVE-24108.02.patch, 
> hive_log_llap.log
>
>
> TEZ-4228 fixes an issue on the Tez side by using TezClassLoader instead of 
> the system classloader. However, there are some codepaths, e.g. in 
> [^hive_log_llap.log], which show that the system class loader is still used. 
> As thread context classloaders are inherited, the easier solution is to 
> early-initialize TezClassLoader in LlapDaemon and let all threads use it as 
> their context class loader, so this solution is more like TEZ-4223 for LLAP 
> daemons.
> {code}
> 2020-09-02T00:18:20,242 ERROR [TezTR-93696_1_1_1_0_0] tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:427)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:288)
>   ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:100)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>   ... 18 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   ... 21 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23190) LLAP: modify IndexCache to pass filesystem object to TezSpillRecord

2020-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222484#comment-17222484
 ] 

László Bodor commented on HIVE-23190:
-

pushed to master, thanks [~rajesh.balamohan] for the review!

> LLAP: modify IndexCache to pass filesystem object to TezSpillRecord
> ---
>
> Key: HIVE-23190
> URL: https://issues.apache.org/jira/browse/HIVE-23190
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23190.01.patch
>
>
> This ticket is about making the changes introduced in TEZ-4145 in Hive's copy 
> of IndexCache class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23930) Upgrade to tez 0.10.0

2020-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-23930.
-
Resolution: Fixed

> Upgrade to tez 0.10.0
> -
>
> Key: HIVE-23930
> URL: https://issues.apache.org/jira/browse/HIVE-23930
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Tez 0.10.0 is not yet released, but this ticket tracks the effort and the 
> needed Hive changes.
> Currently, Hive depends on Tez 0.9.1.
> Hadoop dependencies:
> Hive/master: *3.1.0*
> Tez/master: *3.1.3*
> Tez/branch-0.9:  *2.7.2*
> TODOs: 
> - check why HIVE-23689 broke some unit tests intermittently (0.9.2 -> 0.9.3 
> bump), because a 0.10.x upgrade will also contain those Tez changes, which 
> could be related
> - maintain the needed Hive changes (reflecting Tez API changes):
> HIVE-23190: LLAP: modify IndexCache to pass filesystem object to 
> TezSpillRecord



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23930) Upgrade to tez 0.10.0

2020-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23930:

Fix Version/s: 4.0.0

> Upgrade to tez 0.10.0
> -
>
> Key: HIVE-23930
> URL: https://issues.apache.org/jira/browse/HIVE-23930
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Tez 0.10.0 is not yet released, but this ticket tracks the effort and the 
> needed Hive changes.
> Currently, Hive depends on Tez 0.9.1.
> Hadoop dependencies:
> Hive/master: *3.1.0*
> Tez/master: *3.1.3*
> Tez/branch-0.9:  *2.7.2*
> TODOs: 
> - check why HIVE-23689 broke some unit tests intermittently (0.9.2 -> 0.9.3 
> bump), because a 0.10.x upgrade will also contain those Tez changes, which 
> could be related
> - maintain the needed Hive changes (reflecting Tez API changes):
> HIVE-23190: LLAP: modify IndexCache to pass filesystem object to 
> TezSpillRecord



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24108) AddToClassPathAction should use TezClassLoader

2020-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-24108.
-
Resolution: Fixed

> AddToClassPathAction should use TezClassLoader
> --
>
> Key: HIVE-24108
> URL: https://issues.apache.org/jira/browse/HIVE-24108
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-24108.01.patch, HIVE-24108.02.patch, 
> hive_log_llap.log
>
>
> TEZ-4228 fixes an issue on the Tez side by using TezClassLoader instead of 
> the system classloader. However, there are some codepaths, e.g. in 
> [^hive_log_llap.log], which show that the system class loader is still used. 
> As thread context classloaders are inherited, the easier solution is to 
> early-initialize TezClassLoader in LlapDaemon and let all threads use it as 
> their context class loader, so this solution is more like TEZ-4223 for LLAP 
> daemons.
> {code}
> 2020-09-02T00:18:20,242 ERROR [TezTR-93696_1_1_1_0_0] tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:427)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:288)
>   ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:100)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>   ... 18 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   ... 21 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=505919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505919
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 21:19
Start Date: 28/Oct/20 21:19
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1241:
URL: https://github.com/apache/hive/pull/1241#issuecomment-718214905


   FYI, HIVE-23930 is resolved; Hive now depends on Tez 0.10.0, which is 
JDK 11 compliant



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505919)
Time Spent: 3h 10m  (was: 3h)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=505862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505862
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 18:48
Start Date: 28/Oct/20 18:48
Worklog Time Spent: 10m 
  Work Description: mustafaiman closed pull request #1577:
URL: https://github.com/apache/hive/pull/1577


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505862)
Time Spent: 2h 20m  (was: 2h 10m)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In cloud environments, scratchdir cleanup at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in 
> HiveServer.
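The idea described above can be sketched with a background single-thread executor (this is a hypothetical illustration, not the actual HiveServer2 implementation): the query path submits the deletion and returns immediately instead of blocking on slow cloud filesystem deletes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Sketch: delete a scratch directory on a background thread so the caller
// (the query completion path) does not wait on filesystem I/O.
public class BackgroundScratchDirCleaner {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    public Future<?> cleanupAsync(Path scratchDir) {
        return pool.submit(() -> {
            try (Stream<Path> paths = Files.walk(scratchDir)) {
                // Reverse depth order: delete children before parents.
                paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.deleteIfExists(p);
                    } catch (IOException ignored) {
                        // best-effort cleanup; a real implementation would log
                    }
                });
            } catch (IOException ignored) {
            }
        });
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

The returned Future is useful in tests or shutdown hooks; normal callers simply fire and forget.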



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505853
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 18:38
Start Date: 28/Oct/20 18:38
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r513671527



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.hive.ql.exec.AppMasterEventOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule.DiGraph;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.DynamicPruningEventDesc;
+
+import com.google.common.collect.Sets;
+
+public class OperatorGraph {

Review comment:
   @jcamachor this is the checker class I was talking about - right now it 
builds on top of the basic `digraph` class I introduced some time ago in 
`PointLookupOptimizer`
   
   

##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.hive.ql.exec.AppMasterEventOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule.DiGraph;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.DynamicPruningEventDesc;
+
+import com.google.common.collect.Sets;
+
+public class OperatorGraph {
+
+  /**
+   * A directed graph extended with support to check dag property.
+   */
+  static class DagGraph extends DiGraph {

Review comment:
   we can definitely roll our own graph representation; however, I sometimes 
feel it would make things easier to have access to basic graph algorithms 
(for example, to do a topological-order walk, etc.). There is a small 
library called [jgrapht](https://jgrapht.org/) (EPL 2.0 license - I think it 
will be okay) which could be utilized for these kinds of things.
   
   @jcamachor what do you think about pulling in the jgrapht lib and removing 
the makeshift digraph classes?
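The kind of utility being discussed - a topological-order walk that doubles as a DAG check - is small enough to sketch with Kahn's algorithm (this is a self-contained illustration, not the Hive `DagGraph`/`DiGraph` code; libraries like jgrapht provide it out of the box):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Kahn's algorithm: repeatedly remove zero-indegree nodes. If any node is
// left unprocessed, the graph contains a cycle, i.e. it is not a DAG.
public class TopoSort {
    public static <T> List<T> topologicalOrder(Map<T, Set<T>> successors) {
        Map<T, Integer> indegree = new HashMap<>();
        successors.keySet().forEach(n -> indegree.putIfAbsent(n, 0));
        for (Set<T> succ : successors.values()) {
            for (T t : succ) {
                indegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<T> ready = new ArrayDeque<>();
        indegree.forEach((n, d) -> { if (d == 0) ready.add(n); });
        List<T> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            T n = ready.poll();
            order.add(n);
            for (T t : successors.getOrDefault(n, Set.of())) {
                if (indegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != indegree.size()) {
            throw new IllegalStateException("graph has a cycle (not a DAG)");
        }
        return order;
    }
}
```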

##
File path: 

[jira] [Commented] (HIVE-24108) AddToClassPathAction should use TezClassLoader

2020-10-28 Thread Ashutosh Chauhan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222330#comment-17222330
 ] 

Ashutosh Chauhan commented on HIVE-24108:
-

+1

> AddToClassPathAction should use TezClassLoader
> --
>
> Key: HIVE-24108
> URL: https://issues.apache.org/jira/browse/HIVE-24108
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-24108.01.patch, HIVE-24108.02.patch, 
> hive_log_llap.log
>
>
> TEZ-4228 fixes an issue on the Tez side by using TezClassLoader instead of 
> the system classloader. However, there are some codepaths, e.g. in 
> [^hive_log_llap.log], which show that the system class loader is still used. 
> As thread context classloaders are inherited, the easier solution is to 
> early-initialize TezClassLoader in LlapDaemon and let all threads use it as 
> their context class loader, so this solution is more like TEZ-4223 for LLAP 
> daemons.
> {code}
> 2020-09-02T00:18:20,242 ERROR [TezTR-93696_1_1_1_0_0] tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:427)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:288)
>   ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:100)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>   ... 18 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   ... 21 more
> {code}
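The inheritance mechanism the description relies on can be sketched as follows. This is an illustrative demo only: a plain `URLClassLoader` stands in for TezClassLoader (which is Tez-internal), and the class and method names are not Hive/Tez code.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch: a loader installed as the context classloader of the
// creating thread is inherited by every thread spawned afterwards, which is
// why early-initializing TezClassLoader in the daemon's main thread suffices.
public class ContextClassLoaderDemo {

    /** Returns true if a newly spawned thread inherits the custom context loader. */
    public static boolean childInheritsContextLoader() throws InterruptedException {
        ClassLoader custom = new URLClassLoader(new URL[0],
                ContextClassLoaderDemo.class.getClassLoader());

        // Install the loader early, before any worker threads are created...
        Thread.currentThread().setContextClassLoader(custom);

        // ...so threads created afterwards inherit it automatically.
        final ClassLoader[] seen = new ClassLoader[1];
        Thread worker = new Thread(
                () -> seen[0] = Thread.currentThread().getContextClassLoader());
        worker.start();
        worker.join();
        return seen[0] == custom;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(childInheritsContextLoader()); // prints "true"
    }
}
```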



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24270) Move scratchdir cleanup to background

2020-10-28 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-24270:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Mustafa!

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In a cloud environment, scratchdir cleaning at the end of the query may take 
> a long time. This causes the client to hang for up to a minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can take place in the background in HiveServer.
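A minimal sketch of how such background cleanup could look; the class and method names here are assumptions for illustration, not Hive's actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical sketch: queue scratch-directory deletion on a background
// thread so the query's close path returns as soon as the results have
// been streamed back.
public class ScratchDirCleaner {

    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "scratchdir-cleanup");
        t.setDaemon(true); // do not keep the server alive just for cleanup
        return t;
    });

    /** Queue the directory for deletion and return immediately. */
    public Future<?> cleanupAsync(Path scratchDir) {
        return pool.submit(() -> deleteRecursively(scratchDir));
    }

    private static void deleteRecursively(Path dir) {
        // Delete children before parents by walking in reverse order.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException ignored) {
                    // best effort: a failed delete must not fail the query
                }
            });
        } catch (IOException ignored) {
        }
    }
}
```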



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24323) JDBC driver fails when using Kerberos due to missing dependencies

2020-10-28 Thread N Campbell (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

N Campbell updated HIVE-24323:
--
Description: 
*The Apache Hive web pages historically implied that only 3 JAR files are 
required*

hadoop-auth
 hadoop-common
 hive-jdbc



*If a connection is attempted using Kerberos authentication, it will fail due 
to several missing dependencies*

hadoop-auth-3.1.1.3.1.5.0-152.jar
 hadoop-common-3.1.1.3.1.5.0-152.jar
 hive-jdbc-3.1.0.3.1.5.0-152-standalone.jar

*Dependencies*
commons-collections-3.2.2.jar
 commons-configuration2.jar
 commons-lang-2.6.jar
 guava-29.0-jre.jar
 log4j-1.2.17.jar
 slf4j-api-1.7.25.jar




*It is unclear if the intent of the standalone JAR is to include these 
dependencies or not.* 
But there does not seem to be any documentation either way.



*It also appears that dependencies are not being shaded, which can result in 
conflicts with guava or wstx jar files in the class path.* 
As noted in ORACLE {color:#00}Doc ID 2650046.1{color}:

{color:#00} 
com.ctc.wstx.io.StreamBootstrapper.getInstance(Ljava/lang/String;Lcom/ctc/wstx/io/SystemId;Ljava/io/InputStream;)Lcom/ctc/wstx/io/StreamBootstrapper;
 ]

java.lang.NoSuchMethodError: 
com.ctc.wstx.io.StreamBootstrapper.getInstance(Ljava/lang/String;Lcom/ctc/wstx/io/SystemId;Ljava/io/InputStream;)Lcom/ctc/wstx/io/StreamBootstrapper;
  at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2918)
  at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2901){color}

  was:
The Apache Hive web pages historically implied that only 3 JAR files are 
required

 hadoop-auth
 hadoop-common
 hive-jdbc

If a connection is attempted using Kerberos authentication, it will fail due to 
several missing dependencies

 hadoop-auth-3.1.1.3.1.5.0-152.jar
 hadoop-common-3.1.1.3.1.5.0-152.jar
 hive-jdbc-3.1.0.3.1.5.0-152-standalone.jar

It is unclear if the intent of the standalone JAR is to include these 
dependencies or not. But there does not seem to be any documentation either way. 

It also appears that dependencies are not being shaded, which can result in 
conflicts with guava or wstx jar files in the class path. Such as noted by 
ORACLE {color:#00}Doc ID 2650046.1{color}

 commons-collections-3.2.2.jar
 commons-configuration2.jar
 commons-lang-2.6.jar
 guava-29.0-jre.jar
 log4j-1.2.17.jar
 slf4j-api-1.7.25.jar


> JDBC driver fails when using Kerberos due to missing dependencies
> -
>
> Key: HIVE-24323
> URL: https://issues.apache.org/jira/browse/HIVE-24323
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.0
>Reporter: N Campbell
>Priority: Major
>
> *The Apache Hive web pages historically implied that only 3 JAR files are 
> required*
> hadoop-auth
>  hadoop-common
>  hive-jdbc
> *If a connection is attempted using Kerberos authentication, it will fail due 
> to several missing dependencies*
> hadoop-auth-3.1.1.3.1.5.0-152.jar
>  hadoop-common-3.1.1.3.1.5.0-152.jar
>  hive-jdbc-3.1.0.3.1.5.0-152-standalone.jar
> *Dependencies*
> commons-collections-3.2.2.jar
>  commons-configuration2.jar
>  commons-lang-2.6.jar
>  guava-29.0-jre.jar
>  log4j-1.2.17.jar
>  slf4j-api-1.7.25.jar
> *It is unclear if the intent of the standalone JAR is to include these 
> dependencies or not.* 
> But there does not seem to be any documentation either way.
> *It also appears that dependencies are not being shaded, which can result in 
> conflicts with guava or wstx jar files in the class path.* 
> As noted in ORACLE {color:#00}Doc ID 2650046.1{color}:
> {color:#00} 
> com.ctc.wstx.io.StreamBootstrapper.getInstance(Ljava/lang/String;Lcom/ctc/wstx/io/SystemId;Ljava/io/InputStream;)Lcom/ctc/wstx/io/StreamBootstrapper;
>  ]
> java.lang.NoSuchMethodError: 
> com.ctc.wstx.io.StreamBootstrapper.getInstance(Ljava/lang/String;Lcom/ctc/wstx/io/SystemId;Ljava/io/InputStream;)Lcom/ctc/wstx/io/StreamBootstrapper;
>   at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2918)
>   at 
> org.apache.hadoop.conf.Configuration.parse(Configuration.java:2901){color}
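A conventional remedy for such conflicts, assuming the standalone jar is built with the maven-shade-plugin, is to relocate the offending packages. A hypothetical configuration fragment follows; the coordinates and relocation patterns are illustrative, not taken from Hive's actual pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Move bundled copies out of the way of the application's own versions -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.shaded.com.google.common</shadedPattern>
      </relocation>
      <relocation>
        <pattern>com.ctc.wstx</pattern>
        <shadedPattern>org.apache.hive.shaded.com.ctc.wstx</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```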



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=505832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505832
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 17:56
Start Date: 28/Oct/20 17:56
Worklog Time Spent: 10m 
  Work Description: kuczoram opened a new pull request #1620:
URL: https://github.com/apache/hive/pull/1620


   …he move step
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505832)
Time Spent: 0.5h  (was: 20m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=505829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505829
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 17:44
Start Date: 28/Oct/20 17:44
Worklog Time Spent: 10m 
  Work Description: kuczoram closed pull request #1557:
URL: https://github.com/apache/hive/pull/1557


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505829)
Time Spent: 20m  (was: 10m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-10-28 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-24322:
-
Description: 
In IMPALA-10247 there was an exception from Hive when trying to load the data:
{noformat}
2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
exec.Task: Job Commit failed with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
 at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
 at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
 at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
 at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
 at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
 at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
 at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:392)
 at 
org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
 ... 29 more
{noformat}
The reason for the exception was that Hive was trying to read an empty manifest 
file. Manifest files are used in case of direct insert to determine which files 
need to be kept and which need to be cleaned up. They are created by the 
tasks and use the task attempt ID as a postfix. In this particular test, 
one of the containers ran out of memory, so Tez decided to kill 
it right after the manifest file got created but before the paths got written 
into the manifest file. This was the manifest file for task attempt 0. Then 
Tez assigned a new container to the task, so a new attempt was made with 
attemptId=1. This one was successful and wrote the manifest file correctly. 
But Hive didn't know about this, since the out-of-memory issue got handled by 
Tez under the hood, so there was no exception in Hive and therefore no clean-up in 
the manifest folder. And when Hive reads the manifest files, it just reads 
every file from the defined folder, so it tried to read the manifest files for 
attempts 0 and 1 as well.
If there are multiple manifest files with the same name but different 
attemptId, Hive should only read the one with the highest attempt ID.
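The proposed selection rule can be sketched as follows; the `<taskId>_<attemptId>.manifest` naming scheme is an assumption for illustration, not necessarily Hive's exact layout:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fix: when several manifest files share the same
// task prefix but differ in the attempt-id postfix (the part after the last
// '_'), keep only the file with the highest attempt id.
public class ManifestSelector {

    public static Collection<String> selectLatestAttempts(List<String> manifestFiles) {
        Map<String, String> latest = new HashMap<>();
        for (String file : manifestFiles) {
            String base = file.substring(0, file.lastIndexOf('_'));
            String current = latest.get(base);
            // Replace the remembered file only if this attempt id is higher.
            if (current == null || attemptId(file) > attemptId(current)) {
                latest.put(base, file);
            }
        }
        return latest.values();
    }

    /** Parses the numeric attempt id between the last '_' and the extension. */
    private static int attemptId(String file) {
        String tail = file.substring(file.lastIndexOf('_') + 1);
        int dot = tail.indexOf('.');
        return Integer.parseInt(dot < 0 ? tail : tail.substring(0, dot));
    }
}
```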

  was:
In [IMPALA-10247|https://issues.apache.org/jira/browse/IMPALA-10247] there was 
an exception from Hive when trying to load the data:
{noformat}
2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
exec.Task: Job Commit failed with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at 

[jira] [Assigned] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-10-28 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora reassigned HIVE-24322:



> In case of direct insert, the attempt ID has to be checked when reading the 
> manifest files
> --
>
> Key: HIVE-24322
> URL: https://issues.apache.org/jira/browse/HIVE-24322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
> Fix For: 4.0.0
>
>
> In [IMPALA-10247|https://issues.apache.org/jira/browse/IMPALA-10247] there 
> was an exception from Hive when trying to load the data:
> {noformat}
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>  ... 29 more
> {noformat}
> The reason for the exception was that Hive was trying to read an empty 
> manifest file. Manifest files are used in case of direct insert to determine 
> which files need to be kept and which need to be cleaned up. They are 
> created by the tasks and use the task attempt ID as a postfix. In this 
> particular test, one of the containers ran out of memory, 
> so Tez decided to kill it right after the manifest file got created but 
> before the paths got written into the manifest file. This was the manifest 
> file for task attempt 0. Then Tez assigned a new container to the task, 
> so a new attempt was made with attemptId=1. This one was successful and wrote 
> the manifest file correctly. But Hive didn't know about this, since the out-of-memory 
> issue got handled by Tez under the hood, so there was no exception 
> in Hive and therefore no clean-up in the manifest folder. And when Hive 
> reads the manifest files, it just reads every file from the defined folder, 
> so it tried to read the manifest files for attempts 0 and 1 as well.
> If there are multiple manifest files with the same name but different 
> attemptId, Hive should only read the one with the highest attempt ID.



--
This 

[jira] [Updated] (HIVE-24321) Implement Default getSerDeStats in AbstractSerDe

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24321:
--
Labels: pull-request-available  (was: )

> Implement Default getSerDeStats in AbstractSerDe
> 
>
> Key: HIVE-24321
> URL: https://issues.apache.org/jira/browse/HIVE-24321
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Seems like very few SerDes implement the getSerDeStats feature.  Add a 
> default implementation and remove all of the superfluous overrides in the 
> implementing classes.
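The refactoring amounts to hoisting a trivial default into the base class; a minimal sketch of the pattern (the class names are placeholders, not Hive's actual SerDe hierarchy):

```java
// Minimal sketch of the idea: provide a default getSerDeStats() in the
// abstract base so SerDes with no statistics to report need not override it.
public abstract class AbstractSerDeSketch {

    /** Placeholder for Hive's SerDeStats; illustrative only. */
    public static class SerDeStatsSketch { }

    /** Default: no stats to report. Subclasses with real stats override this. */
    public SerDeStatsSketch getSerDeStats() {
        return null;
    }

    /** A concrete SerDe now inherits the default instead of overriding it. */
    public static class SimpleSerDe extends AbstractSerDeSketch { }
}
```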



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24321) Implement Default getSerDeStats in AbstractSerDe

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24321?focusedWorklogId=505778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505778
 ]

ASF GitHub Bot logged work on HIVE-24321:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:55
Start Date: 28/Oct/20 15:55
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1619:
URL: https://github.com/apache/hive/pull/1619


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505778)
Remaining Estimate: 0h
Time Spent: 10m

> Implement Default getSerDeStats in AbstractSerDe
> 
>
> Key: HIVE-24321
> URL: https://issues.apache.org/jira/browse/HIVE-24321
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Seems like very few SerDes implement the getSerDeStats feature.  Add a 
> default implementation and remove all of the superfluous overrides in the 
> implementing classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24321) Implement Default getSerDeStats in AbstractSerDe

2020-10-28 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-24321:
-


> Implement Default getSerDeStats in AbstractSerDe
> 
>
> Key: HIVE-24321
> URL: https://issues.apache.org/jira/browse/HIVE-24321
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> Seems like very few SerDes implement the getSerDeStats feature.  Add a 
> default implementation and remove all of the superfluous overrides in the 
> implementing classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24253) HMS and HS2 needs to support keystore/truststores types besides JKS by config

2020-10-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1730#comment-1730
 ] 

Zoltan Haindrich commented on HIVE-24253:
-

[~ychena]   the PR seems to have been merged around a week ago - I think we should 
close this ticket.

But I came here for a different reason: it seems to me that you've made 
some improvements to the `TestSSL` test; do you think we can remove the Ignore 
from it?
(Some test cases from TestSSL were frequent guests in unrelated test runs, so 
it was marked as ignored.)
https://github.com/apache/hive/blob/375433510b73c5a22bde4e13485dfc16eaa24706/itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestSSL.java#L56

> HMS and HS2 needs to support keystore/truststores types besides JKS by config
> -
>
> Key: HIVE-24253
> URL: https://issues.apache.org/jira/browse/HIVE-24253
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should make the 
> keystore type configurable and default to the keystore type specified for the 
> JDK instead of always using JKS. Same as HIVE-23958 for Hive, HMS should support 
> setting additional keystore/truststore types used for different applications, 
> e.g. for FIPS crypto algorithms.
> Also, make the Hive keystore type and algorithm configurable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=505767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505767
 ]

ASF GitHub Bot logged work on HIVE-24316:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:36
Start Date: 28/Oct/20 15:36
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun opened a new pull request #1616:
URL: https://github.com/apache/hive/pull/1616


   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade Apache ORC from 1.5.6 to 1.5.8.
   
   ### Why are the changes needed?
   
   This will bring eleven bug fixes.
   - ORC 1.5.7: https://issues.apache.org/jira/projects/ORC/versions/12345702
   - ORC 1.5.8: https://issues.apache.org/jira/projects/ORC/versions/12346462
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the CI with the existing test cases.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505767)
Time Spent: 0.5h  (was: 20m)

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=505768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505768
 ]

ASF GitHub Bot logged work on HIVE-24316:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:36
Start Date: 28/Oct/20 15:36
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun closed pull request #1616:
URL: https://github.com/apache/hive/pull/1616


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505768)
Time Spent: 40m  (was: 0.5h)

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=505769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505769
 ]

ASF GitHub Bot logged work on HIVE-24316:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:36
Start Date: 28/Oct/20 15:36
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #1616:
URL: https://github.com/apache/hive/pull/1616#issuecomment-718017194


   Thanks, @pgaref . I closed and reopened this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505769)
Time Spent: 50m  (was: 40m)

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24297?focusedWorklogId=505743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505743
 ]

ASF GitHub Bot logged work on HIVE-24297:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 14:43
Start Date: 28/Oct/20 14:43
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #1614:
URL: https://github.com/apache/hive/pull/1614


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505743)
Time Spent: 20m  (was: 10m)

> LLAP buffer collision causes NPE
> 
>
> Key: HIVE-24297
> URL: https://issues.apache.org/jira/browse/HIVE-24297
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-23741 introduced an optimization so that CacheTags are not stored on 
> buffer level, but rather on file level, as one cache tag can only relate to 
> one file. With this change a buffer->filecache reference was introduced so 
> that the buffer's tag can be calculated with an extra indirection i.e. 
> buffer.filecache.tag.
> However during buffer collision in putFileData method, we don't set the 
> filecache reference of the collided (new) buffer: 
> [https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]
> Later this causes NPE when the new (instantly decRef'ed) buffer is evicted:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
> at 
> java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
> ... 16 more {code}
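The fix described above amounts to propagating the file-level cache reference onto the collided buffer. A minimal sketch of that idea, using simplified stand-in classes (these are not Hive's actual types; only the fileCache indirection mirrors the HIVE-23741 design):

```java
// Simplified stand-ins for the LLAP cache types discussed above: HIVE-23741
// moved the CacheTag from buffer level to file level, so a buffer's tag is
// resolved via buffer.fileCache.tag.
class CacheTag {
  final String name;
  CacheTag(String name) { this.name = name; }
}

class FileCache {
  final CacheTag tag;
  FileCache(CacheTag tag) { this.tag = tag; }
}

class LlapBuffer {
  FileCache fileCache; // indirection introduced by HIVE-23741

  // Dereferences fileCache: this is the NPE site at eviction time when the
  // collided buffer never had its fileCache reference set.
  CacheTag getTag() { return fileCache.tag; }
}

public class BufferCollisionSketch {
  // On a putFileData collision the old buffer wins; the fix is to copy the
  // file-level reference onto the new (instantly decRef'ed) buffer as well,
  // so a later eviction can still resolve its tag.
  static LlapBuffer resolveCollision(LlapBuffer oldBuffer, LlapBuffer newBuffer) {
    newBuffer.fileCache = oldBuffer.fileCache;
    return newBuffer;
  }

  public static void main(String[] args) {
    LlapBuffer oldBuffer = new LlapBuffer();
    oldBuffer.fileCache = new FileCache(new CacheTag("db.table"));
    LlapBuffer collided = resolveCollision(oldBuffer, new LlapBuffer());
    System.out.println(collided.getTag().name); // no NPE once the reference is set
  }
}
```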



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23829) Compute Stats Incorrect for Binary Columns

2020-10-28 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23829.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks!

> Compute Stats Incorrect for Binary Columns
> --
>
> Key: HIVE-23829
> URL: https://issues.apache.org/jira/browse/HIVE-23829
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I came across an issue when working on [HIVE-22674].
> The SerDe used for processing binary data tries to auto-detect if the data is 
> in Base-64.  It uses 
> {{org.apache.commons.codec.binary.Base64#isArrayByteBase64}} which has two 
> issues:
> # It's slow since it will check if the array is compatible,... and then 
> process the data (examines the array twice)
> # More importantly, this method _Tests a given byte array to see if it 
> contains only valid characters within the Base64 alphabet. Currently the 
> method treats whitespace as valid._
> https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isArrayByteBase64-byte:A-
> The 
> [qtest|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/ql/src/test/queries/clientpositive/compute_stats_binary.q]
>  for this feature uses full sentences (which includes spaces) 
> [here|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/data/files/binary.txt]
>  and therefore it thinks this data is Base-64 and returns an incorrect 
> estimation for size.
> This should really not auto-detect Base64 data and instead it should be 
> enabled with a table property.
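The whitespace problem described above can be reproduced with JDK classes alone. In the sketch below, looksLikeBase64 is a simplified stand-in for commons-codec's isArrayByteBase64 that reproduces only its documented whitespace-is-valid behaviour (it is not the actual implementation):

```java
import java.util.Base64;

// Plain English sentences contain only letters, digits and whitespace, so a
// validity check that treats whitespace as a valid Base-64 character accepts
// them, and the auto-detector then "decodes" them, producing a wrong byte
// length for the size estimation.
public class Base64Detect {
  // Stand-in for isArrayByteBase64: accepts the Base64 alphabet, padding,
  // and (crucially) any whitespace.
  static boolean looksLikeBase64(byte[] data) {
    for (byte b : data) {
      boolean inAlphabet = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
          || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
      if (!inAlphabet && !Character.isWhitespace(b)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] sentence = "Mary had a little lamb".getBytes();
    System.out.println(looksLikeBase64(sentence)); // true: misdetected as Base-64
    // The MIME decoder skips whitespace, so the "decoded" payload is shorter
    // than the real data - the source of the incorrect size estimate.
    byte[] decoded = Base64.getMimeDecoder().decode(sentence);
    System.out.println(sentence.length + " raw bytes vs " + decoded.length + " after bogus decode");
  }
}
```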



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23829) Compute Stats Incorrect for Binary Columns

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23829?focusedWorklogId=505720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505720
 ]

ASF GitHub Bot logged work on HIVE-23829:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 13:31
Start Date: 28/Oct/20 13:31
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1313:
URL: https://github.com/apache/hive/pull/1313


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505720)
Time Spent: 1h 40m  (was: 1.5h)

> Compute Stats Incorrect for Binary Columns
> --
>
> Key: HIVE-23829
> URL: https://issues.apache.org/jira/browse/HIVE-23829
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I came across an issue when working on [HIVE-22674].
> The SerDe used for processing binary data tries to auto-detect if the data is 
> in Base-64.  It uses 
> {{org.apache.commons.codec.binary.Base64#isArrayByteBase64}} which has two 
> issues:
> # It's slow since it will check if the array is compatible,... and then 
> process the data (examines the array twice)
> # More importantly, this method _Tests a given byte array to see if it 
> contains only valid characters within the Base64 alphabet. Currently the 
> method treats whitespace as valid._
> https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isArrayByteBase64-byte:A-
> The 
> [qtest|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/ql/src/test/queries/clientpositive/compute_stats_binary.q]
>  for this feature uses full sentences (which includes spaces) 
> [here|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/data/files/binary.txt]
>  and therefore it thinks this data is Base-64 and returns an incorrect 
> estimation for size.
> This should really not auto-detect Base64 data and instead it should be 
> enabled with a table property.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23829) Compute Stats Incorrect for Binary Columns

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23829?focusedWorklogId=505721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505721
 ]

ASF GitHub Bot logged work on HIVE-23829:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 13:31
Start Date: 28/Oct/20 13:31
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1313:
URL: https://github.com/apache/hive/pull/1313#issuecomment-717934374


   @HunterL Merged to master.  Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505721)
Time Spent: 1h 50m  (was: 1h 40m)

> Compute Stats Incorrect for Binary Columns
> --
>
> Key: HIVE-23829
> URL: https://issues.apache.org/jira/browse/HIVE-23829
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I came across an issue when working on [HIVE-22674].
> The SerDe used for processing binary data tries to auto-detect if the data is 
> in Base-64.  It uses 
> {{org.apache.commons.codec.binary.Base64#isArrayByteBase64}} which has two 
> issues:
> # It's slow since it will check if the array is compatible,... and then 
> process the data (examines the array twice)
> # More importantly, this method _Tests a given byte array to see if it 
> contains only valid characters within the Base64 alphabet. Currently the 
> method treats whitespace as valid._
> https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isArrayByteBase64-byte:A-
> The 
> [qtest|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/ql/src/test/queries/clientpositive/compute_stats_binary.q]
>  for this feature uses full sentences (which includes spaces) 
> [here|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/data/files/binary.txt]
>  and therefore it thinks this data is Base-64 and returns an incorrect 
> estimation for size.
> This should really not auto-detect Base64 data and instead it should be 
> enabled with a table property.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?focusedWorklogId=505716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505716
 ]

ASF GitHub Bot logged work on HIVE-24288:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 13:04
Start Date: 28/Oct/20 13:04
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1590:
URL: https://github.com/apache/hive/pull/1590#discussion_r513424742



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -254,6 +276,9 @@ CommandProcessorResponse compile(SessionState ss) throws 
CommandProcessorExcepti
 
 if (ss != null){
   ss.add_resource(ResourceType.JAR, testArchive.getAbsolutePath());
+  try {
+testArchive.deleteOnExit();

Review comment:
   after testing further, somehow deleting the jar after add_resource will 
result in CNFE when creating the UDF. So the above should be fine given the 
permissions are rw
   
   java.lang.ClassNotFoundException: Pyth
at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
~[?:1.8.0_231]
at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_231]
at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_231]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_231]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_231]
at 
org.apache.hadoop.hive.ql.ddl.function.create.CreateFunctionOperation.getUdfClass(CreateFunctionOperation.java:96)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.function.create.CreateFunctionOperation.createTemporaryFunction(CreateFunctionOperation.java:73)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.function.create.CreateFunctionOperation.execute(CreateFunctionOperation.java:57)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:326) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:88)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:325)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_231]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
 ~[hadoop-common-3.1.0.jar:?]
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:343)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_231]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_231]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_231]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_231]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
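The behaviour the review comment relies on can be shown with plain JDK calls: java.io.File#deleteOnExit only registers the file for deletion at JVM shutdown, so a jar added via add_resource stays readable by the session classloader for the life of the process, whereas an immediate delete() would remove it right away and later class lookups would fail as in the trace above. A minimal illustration (the file name prefix is hypothetical):

```java
import java.io.File;
import java.io.IOException;

public class DeleteOnExitSketch {
  // Creates a temp "jar", registers it for deletion at JVM exit, and reports
  // whether it is still present afterwards - it is, because deleteOnExit()
  // defers the removal instead of performing it immediately.
  static boolean readableAfterRegistration() {
    try {
      File archive = File.createTempFile("compiled-udf", ".jar");
      archive.deleteOnExit(); // deferred: file removed only at JVM shutdown
      return archive.exists(); // still on disk; a classloader can open it
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(readableAfterRegistration()); // prints "true"
  }
}
```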





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


[jira] [Commented] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222124#comment-17222124
 ] 

Zoltan Haindrich commented on HIVE-24320:
-

attached last 1000 lines of hive log  and full jstack trace 

there seem to be some derby issues prior to this state - not sure if those are
only symptoms of the hang or part of its root cause

issues seem to start with:
{code}
2020-10-28T01:24:33,767  WARN [Heartbeater-3] pool.ProxyConnection: 
HikariPool-3 - Connection org.apache.derby.impl.jdbc.EmbedConnection@1913174287 
(XID = null), (SESSIONID = 68), (DATABASE = 
/home/jenkins/agent/workspace/internal-hive-precommit_PR-2/itests/qtest/target/tmp/junit_metastore_db),
 (DRDAID = null)  marked as broken because of SQLSTATE(08003), ErrorCode(4)
java.sql.SQLNonTransientConnectionException: No current connection.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.Util.noCurrentConnection(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.EmbedConnection.checkIfClosed(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.EmbedConnection.setupContextStack(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.EmbedConnection.rollback(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at 
com.zaxxer.hikari.pool.ProxyConnection.rollback(ProxyConnection.java:362) 
~[HikariCP-2.6.1.jar:?]
at 
com.zaxxer.hikari.pool.HikariProxyConnection.rollback(HikariProxyConnection.java)
 ~[HikariCP-2.6.1.jar:?]
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.rollbackDBConn(TxnHandler.java:3787)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2912) 
~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8440)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_262]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_262]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_262]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at com.sun.proxy.$Proxy58.heartbeat(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3250)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_262]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_262]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_262]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at com.sun.proxy.$Proxy59.heartbeat(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:665) 
~[hive-exec-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1085)
 ~[hive-exec-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at java.security.AccessController.doPrivileged(Native Method) 
[?:1.8.0_262]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_262]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
 [hadoop-common-3.1.1.7.2.3.0-212.jar:?]
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1084)
 [hive-exec-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 

[jira] [Updated] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24320:

Attachment: 3hr.jstack
3hr.hive.log

> TestMiniLlapLocal sometimes hangs because of some derby issues
> --
>
> Key: HIVE-24320
> URL: https://issues.apache.org/jira/browse/HIVE-24320
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: 3hr.hive.log, 3hr.jstack
>
>
> code in question is a slightly modified version of branch-3
> opening ticket to make notes about the investigation
> {code}
> "dcce5fec-2365-4697-8a8f-04a4dfa5d9f5 main" #1 prio=5 os_prio=0 
> tid=0x7fd7c000a800 nid=0x1de23 waiting on condition [0x7fd7c4b7]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc61635f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1981)
> at 
> org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown
>  Source)
> at 
> org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
> at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown 
> Source)
> at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndexMinion(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSubKeyConstraint(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorsScan(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptors(Unknown
>  Source)
> - locked <0xc615c9a8> (a 
> org.apache.derby.iapi.sql.dictionary.ConstraintDescriptorList)
> at 
> org.apache.derby.iapi.sql.dictionary.TableDescriptor.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.bindConstraints(Unknown 
> Source)
> at org.apache.derby.impl.sql.compile.DeleteNode.bindStatement(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeBatchElement(Unknown Source)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeLargeBatch(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at org.apache.derby.impl.jdbc.EmbedStatement.executeBatch(Unknown 
> Source)
> at 
> com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
> at 
> com.zaxxer.hikari.pool.HikariProxyStatement.executeBatch(HikariProxyStatement.java)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.executeQueriesInBatch(TxnDbUtil.java:658)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.updateCommitIdAndCleanUpMetadata(TxnHandler.java:1338)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1236)
> at 
> 

[jira] [Assigned] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24320:
---


> TestMiniLlapLocal sometimes hangs because of some derby issues
> --
>
> Key: HIVE-24320
> URL: https://issues.apache.org/jira/browse/HIVE-24320
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> code in question is a slightly modified version of branch-3
> opening ticket to make notes about the investigation
> {code}
> "dcce5fec-2365-4697-8a8f-04a4dfa5d9f5 main" #1 prio=5 os_prio=0 
> tid=0x7fd7c000a800 nid=0x1de23 waiting on condition [0x7fd7c4b7]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc61635f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1981)
> at 
> org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown
>  Source)
> at 
> org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
> at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown 
> Source)
> at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndexMinion(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSubKeyConstraint(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorsScan(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptors(Unknown
>  Source)
> - locked <0xc615c9a8> (a 
> org.apache.derby.iapi.sql.dictionary.ConstraintDescriptorList)
> at 
> org.apache.derby.iapi.sql.dictionary.TableDescriptor.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.bindConstraints(Unknown 
> Source)
> at org.apache.derby.impl.sql.compile.DeleteNode.bindStatement(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeBatchElement(Unknown Source)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeLargeBatch(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at org.apache.derby.impl.jdbc.EmbedStatement.executeBatch(Unknown 
> Source)
> at 
> com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
> at 
> com.zaxxer.hikari.pool.HikariProxyStatement.executeBatch(HikariProxyStatement.java)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.executeQueriesInBatch(TxnDbUtil.java:658)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.updateCommitIdAndCleanUpMetadata(TxnHandler.java:1338)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1236)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.commit_txn(HiveMetaStore.java:8315)
> at 

[jira] [Updated] (HIVE-24318) When GlobalLimit is efficient, query will run twice with "Retry query with a different approach..."

2020-10-28 Thread libo (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

libo updated HIVE-24318:

Attachment: HIVE-24318.patch
  Assignee: libo
Status: Patch Available  (was: Open)

> When GlobalLimit is efficient, query will run twice with "Retry query with a 
> different approach..."
> ---
>
> Key: HIVE-24318
> URL: https://issues.apache.org/jira/browse/HIVE-24318
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.1
> Environment: Hadoop 2.6.0
> Hive-2.0.1
>Reporter: libo
>Assignee: libo
>Priority: Minor
> Attachments: HIVE-24318.patch
>
>
> hive.limit.optimize.enable=true
> hive.limit.row.max.size=1000
> hive.limit.optimize.fetch.max=1000
> hive.fetch.task.conversion.threshold=256
> hive.fetch.task.conversion=more
>  
> *sql eg:*
> select db_name,concat(tb_name,'test') from (select * from test1.t3 where 
> dt='0909' limit 10)t1;
> (only partitioned table)
> *console information:*
> Retry query with a different approach...
>  
> *exception stack:*
> org.apache.hadoop.hive.ql.CommandNeedRetryException
>  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2022)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:317)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:475)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:855)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:794)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:236)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24307) Beeline with property-file and -e parameter is failing

2020-10-28 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222093#comment-17222093
 ] 

Aasha Medhi commented on HIVE-24307:


+1

> Beeline with property-file and -e parameter is failing
> --
>
> Key: HIVE-24307
> URL: https://issues.apache.org/jira/browse/HIVE-24307
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with :
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=505658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505658
 ]

ASF GitHub Bot logged work on HIVE-24316:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 10:00
Start Date: 28/Oct/20 10:00
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1616:
URL: https://github.com/apache/hive/pull/1616#issuecomment-717826458


   Thanks for the patch @dongjoon-hyun -- can you please reopen the PR, as I 
don't see the pre-commit test results at all (I guess they were never triggered)?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505658)
Time Spent: 20m  (was: 10m)

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-28 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-24316:
-

Assignee: Dongjoon Hyun

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-28 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24316:
--
Description: 
This will bring eleven bug fixes.
 * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
 * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24302) Cleaner shouldn't run if it can't remove obsolete files

2020-10-28 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222072#comment-17222072
 ] 

Karen Coppage commented on HIVE-24302:
--

Duplicates HIVE-24291

> Cleaner shouldn't run if it can't remove obsolete files
> ---
>
> Key: HIVE-24302
> URL: https://issues.apache.org/jira/browse/HIVE-24302
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Example:
>  # open txn 5, leave it open (maybe it's a long-running compaction)
>  # insert into table t in txns 6, 7 with writeids 1, 2
>  # compactor.Worker runs on table t and compacts writeids 1, 2
>  # compactor.Cleaner picks up the compaction queue entry, but doesn't delete 
> any files because the min global open txnid is 5, which cannot see writeIds 
> 1, 2.
>  # Cleaner marks the compactor queue entry as cleaned and removes the entry 
> from the queue.
> delta_1 and delta_2 will remain in the file system until another compaction 
> is run on table t.
> Step 5 should not happen; we should skip calling markCleaned() and leave the 
> entry in the queue in the "ready to clean" state. markCleaned() should be 
> called only after txn 5 is closed and, following that, the cleaner runs 
> successfully.
> This will potentially slow down the cleaner, but on the other hand it won't 
> silently "fail", i.e. fail to do its job.
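The fix proposed above reduces to a visibility check before markCleaned(). A minimal sketch under a simplified model where the compacted writes are identified by the txn ids that produced them (the name canMarkCleaned and both parameters are hypothetical, not actual Hive code):

```java
// Sketch of the "ready to clean" rule described above (hypothetical names):
// an entry may only be marked cleaned once every txn that wrote the compacted
// deltas is visible to the minimum globally open transaction.
public class CleanerSketch {
    static boolean canMarkCleaned(long minGlobalOpenTxnId, long highestCompactedWriteTxnId) {
        // If a txn older than the compacted writes is still open, it may still
        // read delta_1/delta_2, so the entry must stay "ready to clean".
        return minGlobalOpenTxnId > highestCompactedWriteTxnId;
    }

    public static void main(String[] args) {
        // Example from the description: txn 5 open, writes happened in txns 6 and 7.
        System.out.println(canMarkCleaned(5, 7));  // skip markCleaned()
        // After txn 5 closes, the min open txn moves past the compacted writes.
        System.out.println(canMarkCleaned(8, 7));  // safe to clean
    }
}
```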



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24291?focusedWorklogId=505651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505651
 ]

ASF GitHub Bot logged work on HIVE-24291:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 09:44
Start Date: 28/Oct/20 09:44
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1592:
URL: https://github.com/apache/hive/pull/1592#issuecomment-717817745


   @deniskuzZ Can I ask you for a review?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505651)
Time Spent: 0.5h  (was: 20m)

> Compaction Cleaner prematurely cleans up deltas
> ---
>
> Key: HIVE-24291
> URL: https://issues.apache.org/jira/browse/HIVE-24291
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since HIVE-23107 the cleaner can clean up deltas that are still used by 
> running queries.
> Example:
>  * TxnId 1-5 writes to a partition, all commits
>  * Compactor starts with txnId=6
>  * Long running query starts with txnId=7, it sees txnId=6 as open in its 
> snapshot
>  * Compaction commits
>  * Cleaner runs
> Previously the min_history_level table would have prevented the Cleaner from 
> deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
> long-running query may fail if it tries to access the files.
> A solution could be to not run the cleaner while any txn is open that was 
> opened before the compaction was committed (CQ_NEXT_TXN_ID).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24108) AddToClassPathAction should use TezClassLoader

2020-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222066#comment-17222066
 ] 

László Bodor commented on HIVE-24108:
-

thanks [~harishjp]! Could you please review, [~rajesh.balamohan], [~ashutoshc]? 
This is probably the last blocker for the Tez 0.10 upgrade in Hive.

> AddToClassPathAction should use TezClassLoader
> --
>
> Key: HIVE-24108
> URL: https://issues.apache.org/jira/browse/HIVE-24108
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-24108.01.patch, HIVE-24108.02.patch, 
> hive_log_llap.log
>
>
> TEZ-4228 fixes an issue on the Tez side, making Tez use TezClassLoader 
> instead of the system classloader. However, there are some codepaths, e.g. in 
>  [^hive_log_llap.log], which show that the system class loader is still used. As 
> thread context classloaders are inherited, the easier solution is to 
> early-initialize TezClassLoader in LlapDaemon and let all threads use that 
> as the context class loader, so this solution is more like TEZ-4223 for LLAP 
> daemons.
> {code}
> 2020-09-02T00:18:20,242 ERROR [TezTR-93696_1_1_1_0_0] tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:427)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:288)
>   ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:100)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>   ... 18 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   ... 21 more
> {code}
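The inheritance behavior the description relies on, namely that a thread created after the context class loader is set will inherit it, can be demonstrated in isolation. A minimal sketch (ContextLoaderSketch is a hypothetical name; this is not LlapDaemon code):

```java
// Demonstrates that a child thread inherits the parent's context class loader,
// which is why early-initializing the loader in the daemon covers all workers.
public class ContextLoaderSketch {
    public static void main(String[] args) throws Exception {
        // A distinct loader standing in for TezClassLoader in this sketch.
        ClassLoader custom = new ClassLoader(ContextLoaderSketch.class.getClassLoader()) {};
        // Early initialization, analogous to doing this during daemon startup.
        Thread.currentThread().setContextClassLoader(custom);

        Thread worker = new Thread(() ->
            // The worker thread inherits the context class loader set above.
            System.out.println(Thread.currentThread().getContextClassLoader() == custom));
        worker.start();
        worker.join();
    }
}
```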



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24302) Cleaner shouldn't run if it can't remove obsolete files

2020-10-28 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-24302.
--
Resolution: Duplicate

> Cleaner shouldn't run if it can't remove obsolete files
> ---
>
> Key: HIVE-24302
> URL: https://issues.apache.org/jira/browse/HIVE-24302
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Example:
>  # open txn 5, leave it open (maybe it's a long-running compaction)
>  # insert into table t in txns 6, 7 with writeids 1, 2
>  # compactor.Worker runs on table t and compacts writeids 1, 2
>  # compactor.Cleaner picks up the compaction queue entry, but doesn't delete 
> any files because the min global open txnid is 5, which cannot see writeIds 
> 1, 2.
>  # Cleaner marks the compactor queue entry as cleaned and removes the entry 
> from the queue.
> delta_1 and delta_2 will remain in the file system until another compaction 
> is run on table t.
> Step 5 should not happen; we should skip calling markCleaned() and leave the 
> entry in the queue in the "ready to clean" state. markCleaned() should be 
> called only after txn 5 is closed and, following that, the cleaner runs 
> successfully.
> This will potentially slow down the cleaner, but on the other hand it won't 
> silently "fail", i.e. fail to do its job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24108) AddToClassPathAction should use TezClassLoader

2020-10-28 Thread Harish JP (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222055#comment-17222055
 ] 

Harish JP commented on HIVE-24108:
--

Thanks [~abstractdog]. LGTM, +1 (non-binding).

> AddToClassPathAction should use TezClassLoader
> --
>
> Key: HIVE-24108
> URL: https://issues.apache.org/jira/browse/HIVE-24108
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-24108.01.patch, HIVE-24108.02.patch, 
> hive_log_llap.log
>
>
> TEZ-4228 fixes an issue on the Tez side, making Tez use TezClassLoader 
> instead of the system classloader. However, there are some codepaths, e.g. in 
>  [^hive_log_llap.log], which show that the system class loader is still used. As 
> thread context classloaders are inherited, the easier solution is to 
> early-initialize TezClassLoader in LlapDaemon and let all threads use that 
> as the context class loader, so this solution is more like TEZ-4223 for LLAP 
> daemons.
> {code}
> 2020-09-02T00:18:20,242 ERROR [TezTR-93696_1_1_1_0_0] tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:427)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:288)
>   ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:100)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>   ... 18 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.serde2.TestSerDe
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   ... 21 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24173) notification cleanup interval value changes depending upon replication enabled or not.

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24173?focusedWorklogId=505634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505634
 ]

ASF GitHub Bot logged work on HIVE-24173:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 08:43
Start Date: 28/Oct/20 08:43
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1593:
URL: https://github.com/apache/hive/pull/1593#discussion_r513266229



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -522,6 +522,9 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 REPLCMINTERVAL("hive.repl.cm.interval","3600s",
 new TimeValidator(TimeUnit.SECONDS),
 "Inteval for cmroot cleanup thread."),
+REPL_EVENT_DB_LISTENER_TTL("hive.repl.event.db.listener.timetolive", 10, 
new TimeValidator(TimeUnit.DAYS),

Review comment:
   Set the hive.repl.cm.retain to 7 days in Metastore conf also





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505634)
Time Spent: 40m  (was: 0.5h)

> notification cleanup interval value changes depending upon replication 
> enabled or not.
> --
>
> Key: HIVE-24173
> URL: https://issues.apache.org/jira/browse/HIVE-24173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently we use hive.metastore.event.db.listener.timetolive to determine how 
> long the events are stored in the RDBMS backing the HMS. We should have another 
> configuration for the same purpose in the context of replication, so that a 
> longer time can be configured for it; otherwise we default to 1 day.
> hive.repl.cm.enabled can be used to identify whether replication is enabled. 
> If enabled, use the new configuration property to determine the TTL for events 
> in the RDBMS; otherwise use hive.metastore.event.db.listener.timetolive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24294) TezSessionPool sessions can throw AssertionError

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24294?focusedWorklogId=505604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505604
 ]

ASF GitHub Bot logged work on HIVE-24294:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 07:44
Start Date: 28/Oct/20 07:44
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1596:
URL: https://github.com/apache/hive/pull/1596#issuecomment-717759487


   @nareshpr Thanks for the patch. Merged it to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505604)
Time Spent: 40m  (was: 0.5h)

> TezSessionPool sessions can throw AssertionError
> 
>
> Key: HIVE-24294
> URL: https://issues.apache.org/jira/browse/HIVE-24294
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Whenever default TezSessionPool sessions are reopened for some reason, we are 
> setting dagResources to null before close and setting it back in open:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503
> If there is an exception in sessionState.close(), we are not restoring the 
> dagResources but moving the session back to the TezSessionPool. E.g., exception 
> trace when sessionState.close() failed:
> {code:java}
> 2020-10-15T09:20:28,749 INFO  [HiveServer2-Background-Pool: Thread-25451]: 
> client.TezClient (:()) - Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, 
> applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, 
> finalApplicationStatus=SUCCEEDED, 
> trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, 
> diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, 
> sessionTimeoutInterval=60 ms
> Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0   
>  at 
> org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) 
> at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) 
> at org.apache.tez.client.TezClient.stop(TezClient.java:743) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code}
> Because of this, all new queries using this corrupted sessions are failing 
> with below exception
> {code:java}
> Caused by: java.lang.AssertionError: Ensure called on an unitialized (or 
> closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: 
> java.lang.AssertionError: Ensure called on an unitialized (or closed) session 
> 41774265-b7da-4d58-84a8-1bedfd597aec at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code}
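The failure mode described above, where state is nulled before a close() that may throw and the restore sits only on the success path, together with the natural try/finally fix, can be sketched in isolation (all names here are hypothetical; this is not the actual TezSessionPoolManager code):

```java
// Minimal sketch of the bug shape and its fix: state cleared before a
// throwing close() must be restored in a finally block, otherwise the
// pooled session is returned in a corrupted (resource-less) state.
public class ReopenSketch {
    static String dagResources = "resources";

    static void close() { throw new RuntimeException("close failed"); }

    // Buggy shape: the restore only runs when close() succeeds.
    static void reopenBuggy() {
        String saved = dagResources;
        dagResources = null;
        close();                 // throws -> dagResources stays null
        dagResources = saved;
    }

    // Fixed shape: finally guarantees the restore even when close() throws.
    static void reopenFixed() {
        String saved = dagResources;
        dagResources = null;
        try {
            close();
        } finally {
            dagResources = saved;
        }
    }

    public static void main(String[] args) {
        try { reopenBuggy(); } catch (RuntimeException ignored) { }
        System.out.println("after buggy reopen: " + dagResources);
        dagResources = "resources";
        try { reopenFixed(); } catch (RuntimeException ignored) { }
        System.out.println("after fixed reopen: " + dagResources);
    }
}
```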



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24294) TezSessionPool sessions can throw AssertionError

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24294?focusedWorklogId=505603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505603
 ]

ASF GitHub Bot logged work on HIVE-24294:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 07:44
Start Date: 28/Oct/20 07:44
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #1596:
URL: https://github.com/apache/hive/pull/1596


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505603)
Time Spent: 0.5h  (was: 20m)

> TezSessionPool sessions can throw AssertionError
> 
>
> Key: HIVE-24294
> URL: https://issues.apache.org/jira/browse/HIVE-24294
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Whenever default TezSessionPool sessions are reopened for some reason, we are 
> setting dagResources to null before close and setting it back in open:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503
> If there is an exception in sessionState.close(), we are not restoring the 
> dagResources but moving the session back to the TezSessionPool. E.g., exception 
> trace when sessionState.close() failed:
> {code:java}
> 2020-10-15T09:20:28,749 INFO  [HiveServer2-Background-Pool: Thread-25451]: 
> client.TezClient (:()) - Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, 
> applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, 
> finalApplicationStatus=SUCCEEDED, 
> trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, 
> diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, 
> sessionTimeoutInterval=60 ms
> Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0   
>  at 
> org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) 
> at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) 
> at org.apache.tez.client.TezClient.stop(TezClient.java:743) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code}
> Because of this, all new queries using this corrupted sessions are failing 
> with below exception
> {code:java}
> Caused by: java.lang.AssertionError: Ensure called on an unitialized (or 
> closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: 
> java.lang.AssertionError: Ensure called on an unitialized (or closed) session 
> 41774265-b7da-4d58-84a8-1bedfd597aec at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24305) avro decimal schema is not properly populating scale/precision if value is enclosed in quote

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24305?focusedWorklogId=505602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505602
 ]

ASF GitHub Bot logged work on HIVE-24305:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 07:39
Start Date: 28/Oct/20 07:39
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #1601:
URL: https://github.com/apache/hive/pull/1601#discussion_r513232453



##
File path: ql/src/test/queries/clientnegative/avro_decimal.q
##
@@ -12,6 +12,6 @@ OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'numFiles'='1',
-  
'avro.schema.literal'='{\"namespace\":\"com.howdy\",\"name\":\"some_schema\",\"type\":\"record\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"value\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":"5",\"scale\":"2"}}]}'
+  
'avro.schema.literal'='{\"namespace\":\"com.howdy\",\"name\":\"some_schema\",\"type\":\"record\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"value\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":"a",\"scale\":"b"}}]}'

Review comment:
   What happens if the precision or scale numbers are negative? Could you 
please add some q tests to cover that scenario as well?

##
File path: 
serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java
##
@@ -186,6 +188,20 @@ public static TypeInfo generateTypeInfo(Schema schema,
 return typeInfoCache.retrieve(schema, seenSchemas);
   }
 
+  private static int getIntValue(JsonNode jsonNode) {
+int value = 0;
+if (jsonNode instanceof TextNode) {
+  try {

Review comment:
   Nit: Instead of using a try-catch block, you could use 
StringUtils.isNumeric() to determine if a string is a positive decimal number. 
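The reviewer's suggestion amounts to validating the text before parsing it. A self-contained sketch without the commons-lang dependency, where getIntValue is modeled on the diff but simplified to plain Objects instead of Jackson's JsonNode (so the class and signature are assumptions, not the actual patch):

```java
// Parse an Avro "precision"/"scale" attribute that may arrive either as a
// JSON number or as a quoted string such as "6"; fall back to 0 otherwise.
public class AvroScaleParser {
    static int getIntValue(Object node) {
        if (node instanceof Integer) {
            return (Integer) node;           // plain JSON number
        }
        if (node instanceof String) {
            String s = (String) node;
            // Digits-only check, equivalent to StringUtils.isNumeric: this
            // also rejects negative values like "-2", per the review comment.
            if (!s.isEmpty() && s.chars().allMatch(Character::isDigit)) {
                return Integer.parseInt(s);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(getIntValue("6"));   // quoted scale
        System.out.println(getIntValue(24));    // unquoted precision
        System.out.println(getIntValue("b"));   // invalid text falls back to 0
    }
}
```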





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505602)
Time Spent: 20m  (was: 10m)

> avro decimal schema is not properly populating scale/precision if value is 
> enclosed in quote
> 
>
> Key: HIVE-24305
> URL: https://issues.apache.org/jira/browse/HIVE-24305
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> CREATE TABLE test_quoted_scale_precision STORED AS AVRO TBLPROPERTIES 
> ('avro.schema.literal'='{"type":"record","name":"DecimalTest","namespace":"com.example.test","fields":[{"name":"Decimal24_6","type":["null",{"type":"bytes","logicalType":"decimal","precision":24,"scale":"6"}]}]}');
>  
> desc test_quoted_scale_precision;
> // current output
> decimal24_6 decimal(24,0)
> // expected output
> decimal24_6 decimal(24,6){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24317) External Table is not replicated for Cloud store (e.g. Microsoft ADLS Gen2)

2020-10-28 Thread Nikhil Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Gupta reassigned HIVE-24317:
---


> External Table is not replicated for Cloud store (e.g. Microsoft ADLS Gen2)
> ---
>
> Key: HIVE-24317
> URL: https://issues.apache.org/jira/browse/HIVE-24317
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
>
> External Table is not replicated properly because of distcp options. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-28 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24306:
---
Description: 
For data dumped in the staging location, we will run a single distcp at the 
table level for all partitions, as the data is already present in the staging 
location.

For the _files case, where the data is on the source cluster and staging holds 
only the file list, distcp is executed per file. This takes care of the CM 
case, where we need both the full path and the CM-encoded path. If the table 
is dropped, a table-level distcp will fail. 

This patch takes care of the single copy for staging data.
However, to run a single distcp at the table level, the file listing in distcp 
might lead to OOM if the number of files is too high. So it needs to be fixed 
at the distcp level before committing this patch.
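The copy-planning decision described above can be sketched as follows (illustrative Python only; the function and field names are made up and do not match Hive's repl-load code):

```python
def plan_copy_tasks(partitions, data_in_staging):
    """Collapse per-partition/per-file copies into one table-level distcp
    when the dump already staged the data; otherwise fall back to
    per-file copies so the CM-encoded path can be tried for each file."""
    if data_in_staging:
        # Data already sits under the staging table root: one distcp
        # for the whole table covers every partition in the batch.
        return [("distcp", "table_root")]
    # _files case: data still lives on the source cluster, so each
    # file is copied individually (original path plus CM fallback).
    return [("copy_file", f) for p in partitions for f in p["files"]]

print(plan_copy_tasks([{"files": ["f1", "f2"]}], data_in_staging=True))
print(plan_copy_tasks([{"files": ["f1", "f2"]}, {"files": ["f3"]}],
                      data_in_staging=False))
```

The trade-off noted in the description shows up here: the single table-level distcp must enumerate every file under the table root, which is where the potential OOM in distcp's file listing comes from.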

> Launch single copy task for single batch of partitions in repl load for 
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24306.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For data dumped in the staging location, we will run a single distcp at the 
> table level for all partitions, as the data is already present in the 
> staging location.
> For the _files case, where the data is on the source cluster and staging 
> holds only the file list, distcp is executed per file. This takes care of 
> the CM case, where we need both the full path and the CM-encoded path. If 
> the table is dropped, a table-level distcp will fail. 
> This patch takes care of the single copy for staging data.
> However, to run a single distcp at the table level, the file listing in 
> distcp might lead to OOM if the number of files is too high. So it needs to 
> be fixed at the distcp level before committing this patch.





[jira] [Updated] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-28 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24306:
---
Attachment: HIVE-24306.01.patch
Status: Patch Available  (was: In Progress)

> Launch single copy task for single batch of partitions in repl load for 
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24306.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work started] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-28 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24306 started by Aasha Medhi.
--
> Launch single copy task for single batch of partitions in repl load for 
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24306.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24302) Cleaner shouldn't run if it can't remove obsolete files

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24302?focusedWorklogId=505581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505581
 ]

ASF GitHub Bot logged work on HIVE-24302:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 06:53
Start Date: 28/Oct/20 06:53
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1612:
URL: https://github.com/apache/hive/pull/1612#issuecomment-717738838


   This is solved here: https://github.com/apache/hive/pull/1592 





Issue Time Tracking
---

Worklog Id: (was: 505581)
Time Spent: 0.5h  (was: 20m)

> Cleaner shouldn't run if it can't remove obsolete files
> ---
>
> Key: HIVE-24302
> URL: https://issues.apache.org/jira/browse/HIVE-24302
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Example:
>  # open txn 5, leave it open (maybe it's a long-running compaction)
>  # insert into table t in txns 6, 7 with writeids 1, 2
>  # compactor.Worker runs on table t and compacts writeids 1, 2
>  # compactor.Cleaner picks up the compaction queue entry, but doesn't delete 
> any files because the min global open txnid is 5, which cannot see writeIds 
> 1, 2.
>  # Cleaner marks the compactor queue entry as cleaned and removes the entry 
> from the queue.
> delta_1 and delta_2 will remain in the file system until another compaction 
> is run on table t.
> Step 5 should not happen; we should skip calling markCleaned() and leave the 
> entry in the queue in the "ready to clean" state. markCleaned() should be 
> called only after txn 5 is closed and a subsequent cleaner run succeeds.
> This will potentially slow down the cleaner, but on the other hand it won't 
> silently "fail", i.e. skip its work without leaving any trace.
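The proposed guard can be sketched roughly as follows (illustrative Python; the function name and the plain txn-id comparison are simplifications of Hive's actual writeId/txnId bookkeeping):

```python
def should_mark_cleaned(min_open_txn_id, compacted_txn_ids):
    """The Cleaner may mark a compaction-queue entry as cleaned only
    when no open txn predates the txns whose writes were compacted;
    otherwise the obsolete deltas cannot be deleted yet and the entry
    must stay in the "ready to clean" state."""
    return min_open_txn_id > max(compacted_txn_ids)

# Ticket scenario: txn 5 still open, inserts committed in txns 6 and 7.
print(should_mark_cleaned(5, [6, 7]))  # False: keep entry "ready to clean"
print(should_mark_cleaned(8, [6, 7]))  # True: txn 5 closed, safe to clean
```

Re-evaluating this predicate on each cleaner cycle is what makes the entry eventually get cleaned once txn 5 commits or aborts, at the cost of the entry lingering in the queue until then.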


