[jira] [Updated] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-09-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24069:
---
Priority: Major  (was: Minor)

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal other than 0, the Executor skips 
> marking the task return code and calling endTask. This can leave the 
> history log incomplete for such tasks.
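A minimal sketch of the intended control flow, assuming a much simplified executor. `SimpleTask`, `HistoryLog`, `SketchExecutor` and `runTask` are hypothetical stand-ins for Hive's task, HiveHistory and Executor classes, not their real APIs. The point it illustrates is that the return code is recorded and endTask is called in a `finally` block, whatever the exit value.

```java
import java.util.ArrayList;
import java.util.List;

class Hive24069Sketch {
  public static void main(String[] args) {
    HistoryLog log = new HistoryLog();
    SketchExecutor executor = new SketchExecutor(log);
    // A hypothetical task that ends abnormally with exit value 2.
    SimpleTask failing = new SimpleTask() {
      public String id() { return "Stage-1"; }
      public int execute() { return 2; }
    };
    int rc = executor.runTask(failing);
    System.out.println("rc=" + rc + " history=" + log.events);
  }
}

// Hypothetical stand-in for a Hive task: execute() returns an exit value (0 = success).
interface SimpleTask {
  String id();
  int execute();
}

// Hypothetical stand-in for HiveHistory: records task lifecycle events in order.
class HistoryLog {
  final List<String> events = new ArrayList<>();
  void startTask(String id) { events.add("start:" + id); }
  void setTaskReturnCode(String id, int rc) { events.add("rc:" + id + "=" + rc); }
  void endTask(String id) { events.add("end:" + id); }
}

class SketchExecutor {
  private final HistoryLog history;

  SketchExecutor(HistoryLog history) {
    this.history = history;
  }

  // Records the return code and calls endTask in a finally block, so the history
  // entry is completed whether the task succeeds, returns exitVal != 0, or throws.
  int runTask(SimpleTask task) {
    history.startTask(task.id());
    int exitVal = -1; // assumed placeholder for "threw before returning a value"
    try {
      exitVal = task.execute();
      return exitVal;
    } finally {
      history.setTaskReturnCode(task.id(), exitVal);
      history.endTask(task.id());
    }
  }
}
```

How the real Executor reports the failure afterwards (retries, error messages) is outside the scope of this sketch.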



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before geting FunctionInfo

2020-09-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24063:
--

Assignee: Zhihua Deng

> SqlFunctionConverter#getHiveUDF handles cast before geting FunctionInfo
> ---
>
> Key: HIVE-24063
> URL: https://issues.apache.org/jira/browse/HIVE-24063
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the current SqlOperator is a SqlCastFunction, 
> FunctionRegistry.getFunctionInfo returns null. However, when 
> hive.allow.udf.load.on.demand is enabled, HiveServer2 first queries the 
> metastore for the function definition, and an exception stack trace like the 
> following appears in the HiveServer2 log:
> INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
> org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Function @hive#default.cast does not exist)
>  at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  
> So it may be better to handle the explicit cast before getting the 
> FunctionInfo from the Registry. Even when the query contains no cast, this 
> ordering is cheap: handleExplicitCast returns null quickly when op.kind is not 
> SqlKind.CAST.
>  
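A minimal sketch of the proposed ordering, assuming a toy resolver rather than Hive's real classes. `OpKind`, `FunctionResolver`, `lookupFunctionInfo` and the returned strings are hypothetical; only the idea of calling a `handleExplicitCast`-style check before the registry lookup comes from the issue. The counter shows that a cast no longer triggers the (simulated) metastore fallback.

```java
import java.util.HashMap;
import java.util.Map;

class Hive24063Sketch {
  public static void main(String[] args) {
    FunctionResolver resolver = new FunctionResolver();
    System.out.println(resolver.resolve(OpKind.CAST, "cast", "int"));
    System.out.println(resolver.resolve(OpKind.OTHER, "upper", null));
    System.out.println("metastore lookups: " + resolver.metastoreLookups);
  }
}

// Hypothetical, simplified operator kinds (Calcite's SqlKind has many more values).
enum OpKind { CAST, OTHER }

class FunctionResolver {
  // Hypothetical built-in registry; "cast" is deliberately absent, as in the issue.
  private final Map<String, String> builtins = new HashMap<>();
  int metastoreLookups = 0;

  FunctionResolver() {
    builtins.put("upper", "UDF(upper)");
  }

  // Mirrors the idea of handleExplicitCast: returns null quickly for non-CAST operators.
  private String handleExplicitCast(OpKind kind, String targetType) {
    if (kind != OpKind.CAST) {
      return null;
    }
    return "cast-to-" + targetType; // hypothetical name for the resolved cast UDF
  }

  // Stand-in for the registry lookup: a miss is counted as a (simulated) metastore
  // round trip, the expensive and noisy path seen in the stack trace.
  private String lookupFunctionInfo(String name) {
    String info = builtins.get(name);
    if (info == null) {
      metastoreLookups++;
    }
    return info;
  }

  // Proposed order: handle an explicit cast first, then consult the registry.
  String resolve(OpKind kind, String name, String targetType) {
    String castUdf = handleExplicitCast(kind, targetType);
    if (castUdf != null) {
      return castUdf;
    }
    return lookupFunctionInfo(name);
  }
}
```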





[jira] [Updated] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-09-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24044:
---
Priority: Major  (was: Minor)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can also be implemented for 
> temporary tables.
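Because a temporary table's partitions live in the session, the method could in principle be served in memory. A minimal sketch under that assumption: the real `PartitionsByExprRequest` carries a serialized filter expression and ordering spec, which are replaced here by a Java `Predicate` and `Comparator`; `listPartitionNames` below is a hypothetical helper, not the IMetaStoreClient method, and it does not unescape partition values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class Hive24044Sketch {
  public static void main(String[] args) {
    List<String> names = Arrays.asList("ds=2020-09-16/hr=02", "ds=2020-09-15/hr=01", "ds=2020-09-16/hr=01");
    List<String> result = listPartitionNames(names,
        spec -> "2020-09-16".equals(spec.get("ds")), Comparator.naturalOrder(), -1);
    System.out.println(result);
  }

  // Parses a partition name such as "ds=2020-09-16/hr=01" into an ordered key -> value map.
  static Map<String, String> parse(String partName) {
    Map<String, String> spec = new LinkedHashMap<>();
    for (String kv : partName.split("/")) {
      int eq = kv.indexOf('=');
      spec.put(kv.substring(0, eq), kv.substring(eq + 1));
    }
    return spec;
  }

  // Hypothetical helper: filters, sorts and limits a temp table's partition names in memory.
  // maxParts < 0 means "no limit" (an assumption of this sketch).
  static List<String> listPartitionNames(List<String> partNames,
      Predicate<Map<String, String>> filter, Comparator<String> order, int maxParts) {
    List<String> result = partNames.stream()
        .filter(name -> filter.test(parse(name)))
        .sorted(order)
        .collect(Collectors.toList());
    if (maxParts >= 0 && maxParts < result.size()) {
      return new ArrayList<>(result.subList(0, maxParts));
    }
    return result;
  }
}
```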





[jira] [Assigned] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-09-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24069:
--

Assignee: Zhihua Deng

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal other than 0, the Executor skips 
> marking the task return code and calling endTask. This can leave the 
> history log incomplete for such tasks.





[jira] [Assigned] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-09-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24044:
--

Assignee: Zhihua Deng

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can also be implemented for 
> temporary tables.





[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=485478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485478
 ]

ASF GitHub Bot logged work on HIVE-23871:
-

Author: ASF GitHub Bot
Created on: 17/Sep/20 00:46
Start Date: 17/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1273:
URL: https://github.com/apache/hive/pull/1273#issuecomment-693741003


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 485478)
Time Spent: 1h 10m  (was: 1h)

> ObjectStore should properly handle MicroManaged Table properties
> 
>
> Key: HIVE-23871
> URL: https://issues.apache.org/jira/browse/HIVE-23871
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: table1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-23281 optimizes StorageDescriptor conversion in the ObjectStore by 
> skipping particular Table properties such as SkewInfo, bucketCols, ordering, 
> etc.
>  However, it does so for all transactional tables, not only full ACID ones, 
> causing MicroManaged tables to behave abnormally.
>  MicroManaged (insert_only) tables may then be missing required properties, 
> such as the Storage Descriptor params that define how fields are delimited 
> (as in the example below).
> To reproduce the issue:
> {code:java}
> CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
> LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans;
> describe formatted delim_table_trans;
> SELECT * FROM delim_table_trans;
> {code}
> Result:
> {code:java}
> Table Type:           MANAGED_TABLE
> Table Parameters:
>   bucketing_version           2
>   numFiles                    1
>   numRows                     0
>   rawDataSize                 0
>   totalSize                   72
>   transactional               true
>   transactional_properties    insert_only
>  A masked pattern was here 
>
> # Storage Information
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No
> Num Buckets:          -1
> Bucket Columns:       []
> Sort Columns:         []
> PREHOOK: query: SELECT * FROM delim_table_trans
> PREHOOK: type: QUERY
> PREHOOK: Input: default@delim_table_trans
>  A masked pattern was here 
> POSTHOOK: query: SELECT * FROM delim_table_trans
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@delim_table_trans
>  A masked pattern was here 
> NULL  NULL  NULL
> NULL  NULL  NULL
> NULL  NULL  NULL
> NULL  NULL  NULL
> NULL  NULL  NULL
> NULL  NULL  NULL
>  {code}
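The fix described above amounts to narrowing the condition under which the optional StorageDescriptor details are skipped. A minimal sketch of that check, assuming only the two table parameters visible in the output above (`transactional` and `transactional_properties`); the method names are hypothetical and this is not the actual ObjectStore change.

```java
import java.util.HashMap;
import java.util.Map;

class Hive23871Sketch {
  public static void main(String[] args) {
    Map<String, String> insertOnly = new HashMap<>();
    insertOnly.put("transactional", "true");
    insertOnly.put("transactional_properties", "insert_only");
    Map<String, String> fullAcid = new HashMap<>();
    fullAcid.put("transactional", "true");
    System.out.println(canSkipStorageDescriptorDetails(insertOnly)); // false
    System.out.println(canSkipStorageDescriptorDetails(fullAcid));   // true
  }

  static boolean isTransactional(Map<String, String> params) {
    return "true".equalsIgnoreCase(params.get("transactional"));
  }

  static boolean isInsertOnly(Map<String, String> params) {
    return "insert_only".equalsIgnoreCase(params.get("transactional_properties"));
  }

  // Hypothetical predicate: only full ACID tables may skip the optional descriptor details;
  // insert-only (MicroManaged) tables keep them, including their field delimiter settings.
  static boolean canSkipStorageDescriptorDetails(Map<String, String> params) {
    return isTransactional(params) && !isInsertOnly(params);
  }
}
```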





[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485341
 ]

ASF GitHub Bot logged work on HIVE-24154:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 19:56
Start Date: 16/Sep/20 19:56
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1492:
URL: https://github.com/apache/hive/pull/1492#discussion_r489718951



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##
@@ -678,100 +679,135 @@ private RexNode useStructIfNeeded(List columns) {
 }
 
 @Override public RexNode visitCall(RexCall call) {
-  final RexNode node;
-  final List operands;
-  final List newOperands;
-  final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create();
   switch (call.getKind()) {
 case AND:
-  // IN clauses need to be combined by keeping only common elements
-  operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands()));
-  for (int i = 0; i < operands.size(); i++) {
-RexNode operand = operands.get(i);
-if (operand.getKind() == SqlKind.IN) {
-  RexCall inCall = (RexCall) operand;
-  if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) {
-continue;
-  }
-  RexNode ref = inCall.getOperands().get(0);
-  if (inLHSExprToRHSExprs.containsKey(ref)) {
-Set expressions = Sets.newHashSet();
-for (int j = 1; j < inCall.getOperands().size(); j++) {
-  expressions.add(inCall.getOperands().get(j));
-}
-inLHSExprToRHSExprs.get(ref).retainAll(expressions);
-if (!inLHSExprToRHSExprs.containsKey(ref)) {
-  // Note that Multimap does not keep a key if all its values are removed.
-  // Hence, since there are no common expressions and it is within an AND,
-  // we should return false
-  return rexBuilder.makeLiteral(false);
-}
-  } else {
-for (int j = 1; j < inCall.getOperands().size(); j++) {
-  inLHSExprToRHSExprs.put(ref, inCall.getOperands().get(j));
-}
-  }
-  operands.remove(i);
-  --i;
-} else if (operand.getKind() == SqlKind.EQUALS) {
-  Constraint c = Constraint.of(operand);
-  if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) {
-continue;
+  return handleAND(rexBuilder, call);
+case OR:
+  return handleOR(rexBuilder, call);
+default:
+  return super.visitCall(call);
+  }
+}
+
+private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) {
+  // IN clauses need to be combined by keeping only common elements
+  final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create();
+  // We will use this set to keep those expressions that may evaluate
+  // into a null value.
+  final Multimap inLHSExprToRHSNullableExprs = LinkedHashMultimap.create();
+  final List operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands()));
+  for (int i = 0; i < operands.size(); i++) {
+RexNode operand = operands.get(i);
+if (operand.getKind() == SqlKind.IN) {
+  RexCall inCall = (RexCall) operand;
+  if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) {
+continue;
+  }
+  RexNode ref = inCall.getOperands().get(0);
+  if (ref.getType().isNullable()) {
+inLHSExprToRHSNullableExprs.put(ref, ref);
+  }
+  if (inLHSExprToRHSExprs.containsKey(ref)) {
+Set expressions = Sets.newHashSet();
+for (int j = 1; j < inCall.getOperands().size(); j++) {
+  RexNode constNode = inCall.getOperands().get(j);
+  expressions.add(constNode);
+  if (constNode.getType().isNullable()) {
+inLHSExprToRHSNullableExprs.put(ref, constNode);
   }
-  RexNode ref = c.exprNode;
-  if (inLHSExprToRHSExprs.containsKey(ref)) {
-inLHSExprToRHSExprs.get(ref).retainAll(Collections.singleton(c.constNode));
-if (!inLHSExprToRHSExprs.containsKey(ref)) {
-  // Note that Multimap does not keep a key if all its values are removed.
-  // Hence, since there are no common expressions and it is within an AND,
-  // we should return false
-  return rexBuilder.makeLiteral(false);
-}
-  } else {
-inLHSExprToRHSExprs.put(ref, c.co
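The comments in the diff state the core rule: under an AND, IN and EQUALS conjuncts on the same deterministic expression are combined by keeping only their common values, and if nothing is left the whole conjunction is FALSE. A minimal sketch of just that intersection step, using plain strings instead of Calcite's `RexNode`/`RexCall` and a hypothetical `InPredicate` type; it ignores the nullability tracking the new `handleAND` adds, so it does not model SQL NULL semantics.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class Hive24154Sketch {
  // Hypothetical conjunct "column IN (values...)"; an EQUALS is the single-value case.
  static final class InPredicate {
    final String column;
    final Set<String> values;

    InPredicate(String column, String... values) {
      this.column = column;
      this.values = new LinkedHashSet<>(Arrays.asList(values));
    }
  }

  // Intersects the value sets of conjuncts on the same column.
  // Optional.empty() means some column has no common value, i.e. the AND is FALSE.
  static Optional<Map<String, Set<String>>> combineAnd(List<InPredicate> conjuncts) {
    Map<String, Set<String>> byColumn = new LinkedHashMap<>();
    for (InPredicate p : conjuncts) {
      Set<String> existing = byColumn.get(p.column);
      if (existing == null) {
        byColumn.put(p.column, new LinkedHashSet<>(p.values));
      } else {
        existing.retainAll(p.values);
        if (existing.isEmpty()) {
          return Optional.empty();
        }
      }
    }
    return Optional.of(byColumn);
  }

  public static void main(String[] args) {
    // a IN (1, 2, 3) AND a = 2  ->  a IN (2)
    System.out.println(combineAnd(Arrays.asList(
        new InPredicate("a", "1", "2", "3"), new InPredicate("a", "2")))); // Optional[{a=[2]}]
    // a IN (1, 2) AND a = 3  ->  FALSE
    System.out.println(combineAnd(Arrays.asList(
        new InPredicate("a", "1", "2"), new InPredicate("a", "3")))); // Optional.empty
  }
}
```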

[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485246
 ]

ASF GitHub Bot logged work on HIVE-24154:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 16:50
Start Date: 16/Sep/20 16:50
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1492:
URL: https://github.com/apache/hive/pull/1492#discussion_r489581662



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##

[jira] [Updated] (HIVE-24169) HiveServer2 UDF cache

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24169:
--
Labels: pull-request-available  (was: )

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF 
> access in S3 scenarios.
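One way a server-wide cache can avoid repeated localization is to key localized resources by their URI in a map shared by all sessions. A minimal sketch under that assumption; `SharedUdfCache`, the localizer function and the paths are hypothetical and not taken from the pull request. `ConcurrentHashMap.computeIfAbsent` runs the localizer at most once per key, though it blocks other writers to the same bin while a slow download runs, and the sketch has no eviction or staleness handling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class Hive24169Sketch {
  public static void main(String[] args) {
    AtomicInteger downloads = new AtomicInteger();
    // Hypothetical localizer: pretends to copy a jar from S3 to a local path.
    SharedUdfCache cache = new SharedUdfCache(uri -> {
      downloads.incrementAndGet();
      return "/tmp/udf-cache/" + uri.substring(uri.lastIndexOf('/') + 1);
    });
    String jar = "s3a://bucket/udfs/my-udf.jar"; // hypothetical resource URI
    String first = cache.localize(jar);  // first session: downloads the jar
    String second = cache.localize(jar); // another session: served from the shared cache
    System.out.println(first + " " + second + " downloads=" + downloads.get());
  }
}

// Server-wide (not per-session) cache from a resource URI to its localized path.
class SharedUdfCache {
  private final Map<String, String> localPaths = new ConcurrentHashMap<>();
  private final Function<String, String> localizer;

  SharedUdfCache(Function<String, String> localizer) {
    this.localizer = localizer;
  }

  // Localizes each URI at most once, even when several sessions ask concurrently.
  String localize(String resourceUri) {
    return localPaths.computeIfAbsent(resourceUri, localizer);
  }
}
```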





[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=485232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485232
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 16:25
Start Date: 16/Sep/20 16:25
Worklog Time Spent: 10m 
  Work Description: sam-an-cloudera opened a new pull request #1503:
URL: https://github.com/apache/hive/pull/1503


   
   ### What changes were proposed in this pull request?
   This feature adds a HiveServer2-level UDF cache that can be shared across 
sessions.
   
   
   
   ### Why are the changes needed?
   Without this feature, on S3-based systems, the UDF resources used by describe function and select udf() 
are localized again every time, taking 2 minutes to download 300 MB.
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   manual
   





Issue Time Tracking
---

Worklog Id: (was: 485232)
Remaining Estimate: 0h
Time Spent: 10m

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF 
> access in S3 scenarios.





[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485191
 ]

ASF GitHub Bot logged work on HIVE-24154:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 14:53
Start Date: 16/Sep/20 14:53
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1492:
URL: https://github.com/apache/hive/pull/1492#discussion_r489501206



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##

[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485190
 ]

ASF GitHub Bot logged work on HIVE-24154:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 14:53
Start Date: 16/Sep/20 14:53
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1492:
URL: https://github.com/apache/hive/pull/1492#discussion_r489501206



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##

[jira] [Work logged] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24168?focusedWorklogId=485177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485177
 ]

ASF GitHub Bot logged work on HIVE-24168:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 14:16
Start Date: 16/Sep/20 14:16
Worklog Time Spent: 10m 
  Work Description: zchovan commented on a change in pull request #1501:
URL: https://github.com/apache/hive/pull/1501#discussion_r489471299



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -4475,7 +4481,8 @@ public static boolean moveFile(final HiveConf conf, Path srcf, final Path destf,
 destFs.copyFromLocalFile(srcf, destf);
 return true;
   } else {
-if (needToCopy(conf, srcf, destf, srcFs, destFs, configuredOwner, isManaged)) {
+if (needToCopy(conf, srcf, destf, srcFs, destFs, configuredOwner, isManaged, isCompactionTable,
+isMmCompactionTable)) {
   //copy if across file system or encryption zones.
   LOG.debug("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");

Review comment:
   Would it make sense to include the hashes for the fs instances in the 
debug logs? 
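If the suggestion were adopted, one option is to log each FileSystem object's identity hash, which distinguishes separate instances even when they point at the same URI. A minimal sketch with a hypothetical helper `describeCopy` that takes plain `Object`s so it runs without Hadoop on the classpath; using `System.identityHashCode` rather than `hashCode()` is this sketch's choice, not something stated in the review.

```java
class Hive24168LogSketch {
  public static void main(String[] args) {
    Object srcFs = new Object();  // stands in for the source FileSystem instance
    Object destFs = new Object(); // stands in for the destination FileSystem instance
    System.out.println(describeCopy(srcFs, destFs, "/warehouse/src", "/warehouse/dest"));
  }

  // Hypothetical helper: builds a debug message tagging each filesystem object with its
  // identity hash, so a reused (possibly closed) instance can be recognised across log lines.
  static String describeCopy(Object srcFs, Object destFs, String src, String dest) {
    return String.format("Copying source %s to %s (srcFs@%x, destFs@%x)", src, dest,
        System.identityHashCode(srcFs), System.identityHashCode(destFs));
  }
}
```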







Issue Time Tracking
---

Worklog Id: (was: 485177)
Time Spent: 40m  (was: 0.5h)

> Disable hdfsEncryptionShims cache during query-based compaction
> ---
>
> Key: HIVE-24168
> URL: https://issues.apache.org/jira/browse/HIVE-24168
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive keeps a cache of encryption shims in SessionState (the Map hdfsEncryptionShims, whose values are HadoopShims.HdfsEncryptionShim instances). Each encryption shim in 
> the cache stores a FileSystem object.
> After compaction where the session user is not the same user as the owner of 
> the partition/table directory, we close all FileSystem objects associated 
> with the user running the compaction, possibly closing an FS stored in the 
> encryption shim cache. The next time query-based compaction is run on a 
> table/partition owned by the same user, compaction will fail in MoveTask[1] 
> since the FileSystem stored in the cache was closed.
> This change disables the cache during query-based compaction (optionally; 
> default: disabled).
> [1] Error:
> {code:java}
> 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: 
> [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem 
> closed. org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Filesystem closed
>   at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637)
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147)
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477)
>   at org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70)
>   at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116)
>   at org.apache.hadoop.hive.ql.txn.compactor.MmMajorQueryCompactor.runCompaction(MmMajorQueryCompactor.java:72)
>   at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:232)
>   at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:221)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security

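A minimal sketch of the caching problem described in HIVE-24168, assuming a per-session cache keyed by file system URI. The key type, the bypassCache flag and the ShimSketch / ShimCacheSketch classes are hypothetical stand-ins for illustration (ShimSketch plays the role of HadoopShims.HdfsEncryptionShim). A cached shim keeps handing out its FileSystem even after that FileSystem has been closed, while bypassing the cache always yields a fresh one.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an encryption shim: it only tracks whether
// the FileSystem it wraps has been closed.
final class ShimSketch {
    private boolean fsClosed;

    void closeFs() { fsClosed = true; }

    boolean isUsable() { return !fsClosed; }
}

// Sketch of a per-session shim cache that can be bypassed, e.g. while a
// query-based compaction runs.
final class ShimCacheSketch {
    private final Map<URI, ShimSketch> cache = new HashMap<>();
    private final Function<URI, ShimSketch> factory;

    ShimCacheSketch(Function<URI, ShimSketch> factory) { this.factory = factory; }

    ShimSketch get(URI fsUri, boolean bypassCache) {
        if (bypassCache) {
            // Build a fresh shim so a FileSystem closed after an earlier
            // compaction is never handed out again.
            return factory.apply(fsUri);
        }
        return cache.computeIfAbsent(fsUri, factory);
    }
}
```

The trade-off is that bypassing the cache creates a new shim on every lookup, which is presumably why the change limits the bypass to query-based compaction.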
[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485132
 ]

ASF GitHub Bot logged work on HIVE-24154:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 13:05
Start Date: 16/Sep/20 13:05
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1492:
URL: https://github.com/apache/hive/pull/1492#discussion_r489412449



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##
@@ -678,100 +679,135 @@ private RexNode useStructIfNeeded(List columns) {
 }
 
 @Override public RexNode visitCall(RexCall call) {
-  final RexNode node;
-  final List<RexNode> operands;
-  final List<RexNode> newOperands;
-  final Multimap<RexNode, RexNode> inLHSExprToRHSExprs = LinkedHashMultimap.create();
   switch (call.getKind()) {
 case AND:
-  // IN clauses need to be combined by keeping only common elements
-  operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands()));
-  for (int i = 0; i < operands.size(); i++) {
-RexNode operand = operands.get(i);
-if (operand.getKind() == SqlKind.IN) {
-  RexCall inCall = (RexCall) operand;
-  if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) {
-continue;
-  }
-  RexNode ref = inCall.getOperands().get(0);
-  if (inLHSExprToRHSExprs.containsKey(ref)) {
-Set<RexNode> expressions = Sets.newHashSet();
-for (int j = 1; j < inCall.getOperands().size(); j++) {
-  expressions.add(inCall.getOperands().get(j));
-}
-inLHSExprToRHSExprs.get(ref).retainAll(expressions);
-if (!inLHSExprToRHSExprs.containsKey(ref)) {
-  // Note that Multimap does not keep a key if all its values are removed.
-  // Hence, since there are no common expressions and it is within an AND,
-  // we should return false
-  return rexBuilder.makeLiteral(false);
-}
-  } else {
-for (int j = 1; j < inCall.getOperands().size(); j++) {
-  inLHSExprToRHSExprs.put(ref, inCall.getOperands().get(j));
-}
-  }
-  operands.remove(i);
-  --i;
-} else if (operand.getKind() == SqlKind.EQUALS) {
-  Constraint c = Constraint.of(operand);
-  if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) {
-continue;
+  return handleAND(rexBuilder, call);
+case OR:
+  return handleOR(rexBuilder, call);
+default:
+  return super.visitCall(call);
+  }
+}
+
+private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) {
+  // IN clauses need to be combined by keeping only common elements
+  final Multimap<RexNode, RexNode> inLHSExprToRHSExprs = LinkedHashMultimap.create();
+  // We will use this set to keep those expressions that may evaluate
+  // into a null value.
+  final Multimap<RexNode, RexNode> inLHSExprToRHSNullableExprs = LinkedHashMultimap.create();
+  final List<RexNode> operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands()));
+  for (int i = 0; i < operands.size(); i++) {
+RexNode operand = operands.get(i);
+if (operand.getKind() == SqlKind.IN) {
+  RexCall inCall = (RexCall) operand;
+  if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) {
+continue;
+  }
+  RexNode ref = inCall.getOperands().get(0);
+  if (ref.getType().isNullable()) {
+inLHSExprToRHSNullableExprs.put(ref, ref);
+  }
+  if (inLHSExprToRHSExprs.containsKey(ref)) {
+Set<RexNode> expressions = Sets.newHashSet();
+for (int j = 1; j < inCall.getOperands().size(); j++) {
+  RexNode constNode = inCall.getOperands().get(j);
+  expressions.add(constNode);
+  if (constNode.getType().isNullable()) {
+inLHSExprToRHSNullableExprs.put(ref, constNode);
   }
-  RexNode ref = c.exprNode;
-  if (inLHSExprToRHSExprs.containsKey(ref)) {
-inLHSExprToRHSExprs.get(ref).retainAll(Collections.singleton(c.constNode));
-if (!inLHSExprToRHSExprs.containsKey(ref)) {
-  // Note that Multimap does not keep a key if all its values are removed.
-  // Hence, since there are no common expressions and it is within an AND,
-  // we should return false
-  return rexBuilder.makeLiteral(false);
-}
-  } else {
-inLHSExprToRHSExprs.put(ref, c.co

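The AND handling in the hunk above intersects the value lists of all IN (and EQUALS) predicates on the same expression and folds the conjunction to FALSE when nothing is left. A simplified sketch of that idea follows, with Strings standing in for Calcite RexNodes and hypothetical InPred / InIntersectionSketch types. It uses an explicit isEmpty() check where the patch relies on Guava's Multimap dropping a key once all its values are removed, and it ignores SQL NULL semantics, which the patch appears to handle through its separate inLHSExprToRHSNullableExprs map.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// "column IN (values)"; an EQUALS predicate is modelled as a one-element IN list.
final class InPred {
    final String column;
    final Set<String> values;

    InPred(String column, Set<String> values) {
        this.column = column;
        this.values = values;
    }
}

final class InIntersectionSketch {
    private InIntersectionSketch() {}

    // Returns the surviving values per column, or Optional.empty() when some
    // column has no common value, i.e. the whole AND can be folded to FALSE.
    static Optional<Map<String, Set<String>>> combine(List<InPred> conjuncts) {
        Map<String, Set<String>> byColumn = new LinkedHashMap<>();
        for (InPred p : conjuncts) {
            Set<String> current = byColumn.get(p.column);
            if (current == null) {
                byColumn.put(p.column, new LinkedHashSet<>(p.values));
            } else {
                // Keep only the values common to every IN list on this column.
                current.retainAll(p.values);
                if (current.isEmpty()) {
                    return Optional.empty();
                }
            }
        }
        return Optional.of(byColumn);
    }
}
```

For example, x IN (1, 2, 3) AND x IN (2, 3, 4) reduces to x IN (2, 3), while x IN (1, 2) AND x = 3 has no common value and becomes FALSE.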
[jira] [Assigned] (HIVE-24172) Fix TestMmCompactorOnMr

2020-09-16 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24172:


Assignee: Karen Coppage

> Fix TestMmCompactorOnMr
> ---
>
> Key: HIVE-24172
> URL: https://issues.apache.org/jira/browse/HIVE-24172
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Karen Coppage
>Priority: Major
>
> The test is unstable:
> http://ci.hive.apache.org/job/hive-flaky-check/112/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.

2020-09-16 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196928#comment-17196928
 ] 

Aasha Medhi commented on HIVE-24170:


+1

> Add UDF resources explicitly to the classpath while handling drop function 
> event during load.
> -
>
> Key: HIVE-24170
> URL: https://issues.apache.org/jira/browse/HIVE-24170
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24170.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24158) Cleanup isn't complete in OrcFileMergeOperator#closeOp

2020-09-16 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-24158:
-
Fix Version/s: 4.0.0

> Cleanup isn't complete in OrcFileMergeOperator#closeOp
> --
>
> Key: HIVE-24158
> URL: https://issues.apache.org/jira/browse/HIVE-24158
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The field outWriters (a Map) isn't cleared during operation close:
> {code:java}
> if (outWriters != null) {
>   for (Map.Entry outWriterEntry : outWriters.entrySet()) {
>     Writer outWriter = outWriterEntry.getValue();
>     outWriter.close();
>     outWriter = null;
>   }
> }{code}
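The quoted loop closes each Writer but only nulls out the local variable outWriter, so the map keeps referencing every closed Writer. A minimal sketch of a cleanup that also empties the map is shown below. It assumes java.io.Closeable as a stand-in for the ORC Writer type and leaves the map's key type generic because the issue does not show it; WriterCleanupSketch is a hypothetical name, and this is an illustration rather than the committed fix.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;

final class WriterCleanupSketch {
    private WriterCleanupSketch() {}

    // Close every writer, then drop the entries so the operator no longer
    // holds references to closed writers.
    static void closeAll(Map<?, ? extends Closeable> outWriters) throws IOException {
        if (outWriters == null) {
            return;
        }
        IOException first = null;
        for (Closeable writer : outWriters.values()) {
            try {
                writer.close();
            } catch (IOException e) {
                // Keep closing the remaining writers; report the first failure at the end.
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        outWriters.clear();
        if (first != null) {
            throw first;
        }
    }
}
```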





[jira] [Work logged] (HIVE-24162) Query based compaction looses bloom filter

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24162?focusedWorklogId=485100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485100
 ]

ASF GitHub Bot logged work on HIVE-24162:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 11:50
Start Date: 16/Sep/20 11:50
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #1498:
URL: https://github.com/apache/hive/pull/1498


   





Issue Time Tracking
---

Worklog Id: (was: 485100)
Time Spent: 1h  (was: 50m)

> Query based compaction looses bloom filter
> --
>
> Key: HIVE-24162
> URL: https://issues.apache.org/jira/browse/HIVE-24162
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
>   
> {noformat}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `bloomTest`(  |
> |   `msisdn` string, |
> |   `imsi` varchar(20),  |
> |   `imei` bigint,   |
> |   `cell_id` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest'  |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'orc.bloom.filter.columns'='msisdn,cell_id,imsi',  |
> |   'orc.bloom.filter.fpp'='0.02',   |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |   'transient_lastDdlTime'='1597222946')|
> ++
> insert into  bloomTest values ("a", "b", 10, 20);
> insert into  bloomTest values ("aa", "bb", 100, 200);
> insert into  bloomTest values ("aaa", "bbb", 1000, 2000);
> select * from bloomTest;
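> +-------------------+-----------------+-----------------+--------------------+
> | bloomtest.msisdn  | bloomtest.imsi  | bloomtest.imei  | bloomtest.cell_id  |
> +-------------------+-----------------+-----------------+--------------------+
> | a                 | b               | 10              | 20                 |
> | aa                | bb              | 100             | 200                |
> | aaa               | bbb             | 1000            | 2000               |
> +-------------------+-----------------+-----------------+--------------------+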
> {noformat}
>  - Compact the table
> {code:java}
> alter table bloomTest compact 'MAJOR';
> {code}
>  - Wait for the compaction to finish and check the dataset for bloom filters.
>   
>  - The delta files have them, but the base dataset does not.





[jira] [Resolved] (HIVE-24162) Query based compaction looses bloom filter

2020-09-16 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-24162.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks [~pvargacl] for the patch!

> Query based compaction looses bloom filter
> --
>
> Key: HIVE-24162
> URL: https://issues.apache.org/jira/browse/HIVE-24162
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
>   
> {noformat}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `bloomTest`(  |
> |   `msisdn` string, |
> |   `imsi` varchar(20),  |
> |   `imei` bigint,   |
> |   `cell_id` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest'  |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'orc.bloom.filter.columns'='msisdn,cell_id,imsi',  |
> |   'orc.bloom.filter.fpp'='0.02',   |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |   'transient_lastDdlTime'='1597222946')|
> ++
> insert into  bloomTest values ("a", "b", 10, 20);
> insert into  bloomTest values ("aa", "bb", 100, 200);
> insert into  bloomTest values ("aaa", "bbb", 1000, 2000);
> select * from bloomTest;
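> +-------------------+-----------------+-----------------+--------------------+
> | bloomtest.msisdn  | bloomtest.imsi  | bloomtest.imei  | bloomtest.cell_id  |
> +-------------------+-----------------+-----------------+--------------------+
> | a                 | b               | 10              | 20                 |
> | aa                | bb              | 100             | 200                |
> | aaa               | bbb             | 1000            | 2000               |
> +-------------------+-----------------+-----------------+--------------------+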
> {noformat}
>  - Compact the table
> {code:java}
> alter table bloomTest compact 'MAJOR';
> {code}
>  - Wait for the compaction to finish and check the dataset for bloom filters.
>   
>  - The delta files have them, but the base dataset does not.





[jira] [Updated] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.

2020-09-16 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24170:

Attachment: HIVE-24170.01.patch

> Add UDF resources explicitly to the classpath while handling drop function 
> event during load.
> -
>
> Key: HIVE-24170
> URL: https://issues.apache.org/jira/browse/HIVE-24170
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24170.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.

2020-09-16 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24170:

Attachment: (was: HIVE-24170.01.patch)

> Add UDF resources explicitly to the classpath while handling drop function 
> event during load.
> -
>
> Key: HIVE-24170
> URL: https://issues.apache.org/jira/browse/HIVE-24170
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24170.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.

2020-09-16 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24170:

Summary: Add UDF resources explicitly to the classpath while handling drop 
function event during load.  (was: Add UDF resources explicitely to the 
classpath while handling drop function event during load.)

> Add UDF resources explicitly to the classpath while handling drop function 
> event during load.
> -
>
> Key: HIVE-24170
> URL: https://issues.apache.org/jira/browse/HIVE-24170
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24170.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24168?focusedWorklogId=484999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484999
 ]

ASF GitHub Bot logged work on HIVE-24168:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 08:16
Start Date: 16/Sep/20 08:16
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1501:
URL: https://github.com/apache/hive/pull/1501#discussion_r489250118



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -445,14 +446,23 @@ public static boolean isCompactionTable(Properties tblProperties) {
   }
 
   /**
-   * Determine if a table is used during query based compaction.
+   * Determine if a table is used during query based compaction for CRUD tables.
* @param parameters table properties map
* @return true, if the parameters contains {@link AcidUtils#COMPACTOR_TABLE_PROPERTY}
*/
   public static boolean isCompactionTable(Map<String, String> parameters) {
 return Boolean.valueOf(parameters.getOrDefault(COMPACTOR_TABLE_PROPERTY, "false"));
   }
 
+  /**
+   * Determine if a table is used during query based compaction for MM insert-only tables.
+   * @param parameters table properties map
+   * @return true, if the parameters contains {@link AcidUtils#MM_COMPACTOR_TABLE_PROPERTY}
+   */
+  public static boolean isMmCompactionTable(Map<String, String> parameters) {

Review comment:
   isCompactionTable logically would be true for both full acid and mm 
tables, but until now we've only used it to mark tables used for compacting 
full acid tables. AFAIK we don't want to apply the operations we do on full 
acid compaction tables to mm compaction tables.
   I could rename isCompactionTable() to isFullAcidCompactionTable() for easier 
reading, would that do?
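To illustrate the naming discussed in this thread, a sketch with two narrow predicates and one combined predicate. The property keys are hypothetical placeholders (the real constants are AcidUtils.COMPACTOR_TABLE_PROPERTY and AcidUtils.MM_COMPACTOR_TABLE_PROPERTY, whose values are not shown here), and isAnyCompactionTable is likewise a hypothetical name for the broader meaning raised in the review.

```java
import java.util.Map;

final class CompactionTablePredicates {
    // Hypothetical placeholder keys, not the real property names.
    static final String FULL_ACID_KEY = "hypothetical.compactor.full.acid";
    static final String MM_KEY = "hypothetical.compactor.mm";

    private CompactionTablePredicates() {}

    // Narrow predicate: temporary table used to compact a full ACID (CRUD) table.
    static boolean isFullAcidCompactionTable(Map<String, String> parameters) {
        return Boolean.parseBoolean(parameters.getOrDefault(FULL_ACID_KEY, "false"));
    }

    // Narrow predicate: temporary table used to compact an MM (insert-only) table.
    static boolean isMmCompactionTable(Map<String, String> parameters) {
        return Boolean.parseBoolean(parameters.getOrDefault(MM_KEY, "false"));
    }

    // Broad predicate: true for either kind of compaction table.
    static boolean isAnyCompactionTable(Map<String, String> parameters) {
        return isFullAcidCompactionTable(parameters) || isMmCompactionTable(parameters);
    }
}
```

Keeping the narrow predicates separate lets callers that apply full-ACID-only logic stay explicit, while the broad one covers call sites that should treat both kinds alike.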







Issue Time Tracking
---

Worklog Id: (was: 484999)
Time Spent: 0.5h  (was: 20m)

> Disable hdfsEncryptionShims cache during query-based compaction
> ---
>
> Key: HIVE-24168
> URL: https://issues.apache.org/jira/browse/HIVE-24168
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive keeps a cache of encryption shims in SessionState (the Map hdfsEncryptionShims, whose values are HadoopShims.HdfsEncryptionShim instances). Each encryption shim in 
> the cache stores a FileSystem object.
> After compaction where the session user is not the same user as the owner of 
> the partition/table directory, we close all FileSystem objects associated 
> with the user running the compaction, possibly closing an FS stored in the 
> encryption shim cache. The next time query-based compaction is run on a 
> table/partition owned by the same user, compaction will fail in MoveTask[1] 
> since the FileSystem stored in the cache was closed.
> This change disables the cache during query-based compaction (optionally; 
> default: disabled).
> [1] Error:
> {code:java}
> 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: 
> [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem 
> closed. org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Filesystem closed
>   at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637)
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147)
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477)
>   at 
> org.apache.hadoop.hi

[jira] [Resolved] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24084.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus and Vineet for reviewing the changes!

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24160) Scheduled executions must allow state transition EXECUTING->TIMED_OUT

2020-09-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24160.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Krisztian for reviewing the changes!

> Scheduled executions must allow state transition EXECUTING->TIMED_OUT
> -
>
> Key: HIVE-24160
> URL: https://issues.apache.org/jira/browse/HIVE-24160
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
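The issue title describes a state-machine rule: an execution that is already EXECUTING must be allowed to move to TIMED_OUT. A minimal sketch of such a transition table follows, using a hypothetical, simplified ExecState enum; its states and transitions are assumptions and may not match Hive's actual scheduled-execution states.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified states for a scheduled execution.
enum ExecState { INITED, EXECUTING, FINISHED, FAILED, TIMED_OUT }

final class ExecTransitions {
    private static final Map<ExecState, Set<ExecState>> ALLOWED = new EnumMap<>(ExecState.class);

    static {
        ALLOWED.put(ExecState.INITED, EnumSet.of(ExecState.EXECUTING));
        // The rule requested by the issue: a running execution may also time out.
        ALLOWED.put(ExecState.EXECUTING,
            EnumSet.of(ExecState.FINISHED, ExecState.FAILED, ExecState.TIMED_OUT));
        // Terminal states allow no further transitions.
        ALLOWED.put(ExecState.FINISHED, EnumSet.noneOf(ExecState.class));
        ALLOWED.put(ExecState.FAILED, EnumSet.noneOf(ExecState.class));
        ALLOWED.put(ExecState.TIMED_OUT, EnumSet.noneOf(ExecState.class));
    }

    private ExecTransitions() {}

    static boolean canTransition(ExecState from, ExecState to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(ExecState.class)).contains(to);
    }
}
```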






[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=484984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484984
 ]

ASF GitHub Bot logged work on HIVE-24084:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 07:51
Start Date: 16/Sep/20 07:51
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1439:
URL: https://github.com/apache/hive/pull/1439


   





Issue Time Tracking
---

Worklog Id: (was: 484984)
Time Spent: 3h 50m  (was: 3h 40m)

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24160) Scheduled executions must allow state transition EXECUTING->TIMED_OUT

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24160?focusedWorklogId=484980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484980
 ]

ASF GitHub Bot logged work on HIVE-24160:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 07:47
Start Date: 16/Sep/20 07:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1496:
URL: https://github.com/apache/hive/pull/1496


   





Issue Time Tracking
---

Worklog Id: (was: 484980)
Time Spent: 20m  (was: 10m)

> Scheduled executions must allow state transition EXECUTING->TIMED_OUT
> -
>
> Key: HIVE-24160
> URL: https://issues.apache.org/jira/browse/HIVE-24160
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24162) Query based compaction looses bloom filter

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24162?focusedWorklogId=484979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484979
 ]

ASF GitHub Bot logged work on HIVE-24162:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 07:46
Start Date: 16/Sep/20 07:46
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1498:
URL: https://github.com/apache/hive/pull/1498#discussion_r489232187



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilder.java
##
@@ -543,18 +543,26 @@ private void addTblProperties(StringBuilder query, int bucketingVersion) {
 if (crud && minor && isBucketed) {
   tblProperties.put("bucketing_version", String.valueOf(bucketingVersion));
 }
-if (insertOnly && sourceTab != null) { // to avoid NPEs, skip this part if sourceTab is null
-  // Exclude all standard table properties.
-  Set<String> excludes = getHiveMetastoreConstants();
-  excludes.addAll(StatsSetupConst.TABLE_PARAMS_STATS_KEYS);
-  for (Map.Entry<String, String> e : sourceTab.getParameters().entrySet()) {
-if (e.getValue() == null) {
-  continue;
+if (sourceTab != null) { // to avoid NPEs, skip this part if sourceTab is null
+  if (insertOnly) {
+// Exclude all standard table properties.
+Set<String> excludes = getHiveMetastoreConstants();
+excludes.addAll(StatsSetupConst.TABLE_PARAMS_STATS_KEYS);
+for (Map.Entry<String, String> e : sourceTab.getParameters().entrySet()) {
+  if (e.getValue() == null) {
+continue;
+  }
+  if (excludes.contains(e.getKey())) {
+continue;
+  }
+  tblProperties.put(e.getKey(), HiveStringUtils.escapeHiveCommand(e.getValue()));
 }
-if (excludes.contains(e.getKey())) {
-  continue;
+  } else if (crud) {
+for (Map.Entry<String, String> e : sourceTab.getParameters().entrySet()) {
+  if (e.getKey().startsWith("orc.")) {

Review comment:
   Makes sense!
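The crud branch in the hunk above copies source-table properties whose keys start with "orc." (the hunk is cut off right after that prefix check), which is what would carry settings such as orc.bloom.filter.columns and orc.bloom.filter.fpp over to the compaction table. A simplified sketch of that filtering on a plain map, assuming the source table parameters are available as a Map<String, String>; OrcPropertyCopySketch is a hypothetical name, and unlike the patch it does not escape values with HiveStringUtils.escapeHiveCommand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OrcPropertyCopySketch {
    private OrcPropertyCopySketch() {}

    // Collect every "orc.*" property of the source table, skipping null values.
    static Map<String, String> orcProperties(Map<String, String> sourceParams) {
        Map<String, String> out = new LinkedHashMap<>();
        if (sourceParams == null) {
            return out; // mirrors the "skip this part if sourceTab is null" guard
        }
        for (Map.Entry<String, String> e : sourceParams.entrySet()) {
            if (e.getKey().startsWith("orc.") && e.getValue() != null) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

For the bloomTest table from the issue, this would keep the two orc.bloom.filter.* entries and drop transactional, bucketing_version and the other non-ORC properties.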







Issue Time Tracking
---

Worklog Id: (was: 484979)
Time Spent: 50m  (was: 40m)

> Query based compaction looses bloom filter
> --
>
> Key: HIVE-24162
> URL: https://issues.apache.org/jira/browse/HIVE-24162
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
>   
> {noformat}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `bloomTest`(  |
> |   `msisdn` string, |
> |   `imsi` varchar(20),  |
> |   `imei` bigint,   |
> |   `cell_id` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest'  |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'orc.bloom.filter.columns'='msisdn,cell_id,imsi',  |
> |   'orc.bloom.filter.fpp'='0.02',   |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |   'transient_lastDdlTime'='1597222946')|
> ++
> insert into  bloomTest values ("a", "b", 10, 20);
> insert into  bloomTest values ("aa", "bb", 100, 200);
> insert into  bloomTest values ("aaa", "bbb", 1000, 2000);
> select * from bloomTest;
> +---+-+-++
> | bloomtest.msisdn  | bloomtest.imsi  | bloomtest.imei  | bloomtest.cell_id  |
> +---+-+-++
> | a | b   

[jira] [Work logged] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction

2020-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24168?focusedWorklogId=484970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484970
 ]

ASF GitHub Bot logged work on HIVE-24168:
-

Author: ASF GitHub Bot
Created on: 16/Sep/20 07:16
Start Date: 16/Sep/20 07:16
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1501:
URL: https://github.com/apache/hive/pull/1501#discussion_r489214913



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -445,14 +446,23 @@ public static boolean isCompactionTable(Properties tblProperties) {
   }
 
   /**
-   * Determine if a table is used during query based compaction.
+   * Determine if a table is used during query based compaction for CRUD tables.
* @param parameters table properties map
* @return true, if the parameters contains {@link AcidUtils#COMPACTOR_TABLE_PROPERTY}
*/
   public static boolean isCompactionTable(Map<String, String> parameters) {
 return Boolean.valueOf(parameters.getOrDefault(COMPACTOR_TABLE_PROPERTY, "false"));
   }
 
+  /**
+   * Determine if a table is used during query based compaction for MM insert-only tables.
+   * @param parameters table properties map
+   * @return true, if the parameters contains {@link AcidUtils#MM_COMPACTOR_TABLE_PROPERTY}
+   */
+  public static boolean isMmCompactionTable(Map<String, String> parameters) {

Review comment:
   Shouldn't isCompactionTable return true in both cases? Isn't it a problem
in the other places where we use this util that MM compaction tables are missed?







Issue Time Tracking
---

Worklog Id: (was: 484970)
Time Spent: 20m  (was: 10m)

> Disable hdfsEncryptionShims cache during query-based compaction
> ---
>
> Key: HIVE-24168
> URL: https://issues.apache.org/jira/browse/HIVE-24168
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive keeps a cache of encryption shims in SessionState (hdfsEncryptionShims, a
> Map of HadoopShims.HdfsEncryptionShim objects). Each encryption shim in the
> cache stores a FileSystem object.
> After a compaction in which the session user differs from the owner of the
> partition/table directory, we close all FileSystem objects associated with the
> user running the compaction, possibly closing a FileSystem stored in the
> encryption shim cache. The next time query-based compaction runs on a
> table/partition owned by the same user, it fails in MoveTask[1] because the
> FileSystem stored in the cache has been closed.
> This change disables the cache during query-based compaction (optionally;
> default: disabled).
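A minimal, self-contained Java sketch of the failure mode described above and of bypassing the cache. It uses hypothetical stand-ins (`FakeFs`, `FakeShim`, `getShim`, `secondCompaction`) rather than the real Hadoop `FileSystem`, `HadoopShims.HdfsEncryptionShim`, or `SessionState` APIs: a shim cached during one compaction keeps a reference to a FileSystem that is later closed, so the next compaction fails, whereas bypassing the cache builds a shim around the current FileSystem.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ShimCacheSketch {
    // Hypothetical stand-in for a FileSystem handle: once closed, every call fails.
    static final class FakeFs {
        private boolean closed;
        void close() { closed = true; }
        String read(String path) throws IOException {
            if (closed) throw new IOException("Filesystem closed");
            return "read " + path;
        }
    }

    // Hypothetical stand-in for an encryption shim that keeps a FileSystem reference.
    static final class FakeShim {
        final FakeFs fs;
        FakeShim(FakeFs fs) { this.fs = fs; }
    }

    // Per-session cache keyed by a file system URI string (an assumption for this sketch).
    private final Map<String, FakeShim> shims = new HashMap<>();

    // With useCache=false the cache is bypassed and a shim wraps the current FileSystem.
    FakeShim getShim(String fsUri, FakeFs currentFs, boolean useCache) {
        if (!useCache) {
            return new FakeShim(currentFs);
        }
        return shims.computeIfAbsent(fsUri, uri -> new FakeShim(currentFs));
    }

    // Runs two "compactions" in one session and reports what the second one sees.
    static String secondCompaction(boolean useCache) {
        ShimCacheSketch session = new ShimCacheSketch();
        FakeFs firstFs = new FakeFs();
        session.getShim("hdfs://nn", firstFs, useCache);
        firstFs.close();                // cleanup closes the compaction user's FileSystems
        FakeFs secondFs = new FakeFs(); // the next run obtains a fresh FileSystem
        try {
            return session.getShim("hdfs://nn", secondFs, useCache).fs.read("/warehouse/t");
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCompaction(true));  // failed: Filesystem closed
        System.out.println(secondCompaction(false)); // read /warehouse/t
    }
}
```

The actual patch may bypass the cache differently; the sketch only illustrates why a cached reference to a closed FileSystem surfaces as the `Filesystem closed` error below.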
> [1] Error:
> {code:java}
> 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem closed.
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Filesystem closed
>   at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637)
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147)
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477)
>   at org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70)
>   at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116)
>   at 
> org.apache.hadoop.hive.ql.txn.compactor.MmMajo