[jira] [Updated] (HIVE-24069) HiveHistory should log the task that ends abnormally
[ https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-24069:
-------------------------------
    Priority: Major  (was: Minor)

> HiveHistory should log the task that ends abnormally
> ----------------------------------------------------
>
>                 Key: HIVE-24069
>                 URL: https://issues.apache.org/jira/browse/HIVE-24069
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the Executor skips
> marking the task's return code and calling endTask. This can leave the
> history log incomplete for such tasks.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
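The fix described above can be illustrated with a minimal sketch. This is not Hive's actual Executor or HiveHistory code (all names below are hypothetical); it only shows the idea of recording the return code and the end event unconditionally, rather than only on the exitVal == 0 path:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TaskHistorySketch {
    /** Records the return code and TASK_END whether or not exitVal == 0. */
    public static Map<String, String> runTask(String taskId, int exitVal) {
        Map<String, String> history = new LinkedHashMap<>();
        // Before the fix, both entries below were skipped when exitVal != 0,
        // leaving the history log incomplete for failed tasks.
        history.put("TASK_RET_CODE", String.valueOf(exitVal));
        history.put("TASK_END", taskId);
        return history;
    }

    public static void main(String[] args) {
        // A failing task (exitVal = 1) still produces a complete history record.
        System.out.println(runTask("Stage-1", 1));
    }
}
```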
[jira] [Assigned] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
[ https://issues.apache.org/jira/browse/HIVE-24063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng reassigned HIVE-24063:
----------------------------------
    Assignee: Zhihua Deng

> SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
> -------------------------------------------------------------------------
>
>                 Key: HIVE-24063
>                 URL: https://issues.apache.org/jira/browse/HIVE-24063
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Trivial
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the current SqlOperator is a SqlCastFunction,
> FunctionRegistry.getFunctionInfo returns null. But when
> hive.allow.udf.load.on.demand is enabled, HiveServer2 refers to the
> metastore for the function definition, and an exception stack trace can be
> seen in the HiveServer2 log:
>
> INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
> org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Function @hive#default.cast does not exist)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>
> So it may be better to handle an explicit cast before getting the
> FunctionInfo from the Registry. Even if there is no cast in the query, the
> method handleExplicitCast returns null quickly when op.kind is not
> SqlKind.CAST.
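The proposed ordering can be sketched in a few lines. This is a hedged, self-contained illustration (the enum and registry below are stand-ins, not the real Calcite SqlKind or Hive Registry types): handle CAST first, so the registry, and therefore the on-demand metastore lookup, is never consulted for "cast":

```java
import java.util.Set;

public class CastFirstLookup {
    // Stand-in for org.apache.calcite.sql.SqlKind.
    enum SqlKind { CAST, OTHER_FUNCTION }

    // Stand-in for the function registry's known UDF names.
    private static final Set<String> REGISTRY = Set.of("upper", "lower", "concat");

    /** Resolves a function; CAST is translated directly, never looked up. */
    public static String resolve(SqlKind kind, String name) {
        // The cast check is cheap: for queries with no cast at all it is a
        // single enum comparison, mirroring handleExplicitCast returning null
        // quickly when op.kind is not SqlKind.CAST.
        if (kind == SqlKind.CAST) {
            return "CAST";
        }
        if (!REGISTRY.contains(name)) {
            throw new IllegalArgumentException("Unable to look up " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(resolve(SqlKind.CAST, "cast"));            // no registry hit
        System.out.println(resolve(SqlKind.OTHER_FUNCTION, "upper")); // normal lookup
    }
}
```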
[jira] [Updated] (HIVE-24044) Implement listPartitionNames on temporary tables
[ https://issues.apache.org/jira/browse/HIVE-24044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-24044:
-------------------------------
    Priority: Major  (was: Minor)

> Implement listPartitionNames on temporary tables
> -------------------------------------------------
>
>                 Key: HIVE-24044
>                 URL: https://issues.apache.org/jira/browse/HIVE-24044
>             Project: Hive
>          Issue Type: Task
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List listPartitionNames(PartitionsByExprRequest request)
> {code}
> to filter or sort the results. This method can be implemented on temporary
> tables.
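Since a temporary table's partitions live in session memory, an implementation can filter and order the names locally rather than calling the metastore. The sketch below is illustrative only (the prefix filter stands in for the real expression filter in PartitionsByExprRequest, and none of these names are Hive's actual session-client code):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TempTablePartitionNames {
    /** Filters, orders, and limits the in-memory partition names. */
    public static List<String> listPartitionNames(List<String> allNames,
                                                  String filterPrefix,
                                                  boolean ascending,
                                                  short maxParts) {
        Comparator<String> order = ascending
                ? Comparator.naturalOrder()
                : Comparator.<String>naturalOrder().reversed();
        return allNames.stream()
                .filter(n -> n.startsWith(filterPrefix)) // stand-in for the expr filter
                .sorted(order)
                .limit(maxParts < 0 ? Long.MAX_VALUE : maxParts) // -1 means "no limit"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("ds=2020-01-02", "ds=2020-01-01", "ds=2020-01-03");
        // prints [ds=2020-01-01, ds=2020-01-02]
        System.out.println(listPartitionNames(names, "ds=", true, (short) 2));
    }
}
```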
[jira] [Assigned] (HIVE-24069) HiveHistory should log the task that ends abnormally
[ https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng reassigned HIVE-24069:
----------------------------------
    Assignee: Zhihua Deng

> HiveHistory should log the task that ends abnormally
> ----------------------------------------------------
>
>                 Key: HIVE-24069
>                 URL: https://issues.apache.org/jira/browse/HIVE-24069
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the Executor skips
> marking the task's return code and calling endTask. This can leave the
> history log incomplete for such tasks.
[jira] [Assigned] (HIVE-24044) Implement listPartitionNames on temporary tables
[ https://issues.apache.org/jira/browse/HIVE-24044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng reassigned HIVE-24044:
----------------------------------
    Assignee: Zhihua Deng

> Implement listPartitionNames on temporary tables
> -------------------------------------------------
>
>                 Key: HIVE-24044
>                 URL: https://issues.apache.org/jira/browse/HIVE-24044
>             Project: Hive
>          Issue Type: Task
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List listPartitionNames(PartitionsByExprRequest request)
> {code}
> to filter or sort the results. This method can be implemented on temporary
> tables.
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=485478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485478 ]

ASF GitHub Bot logged work on HIVE-23871:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 17/Sep/20 00:46
            Start Date: 17/Sep/20 00:46
    Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1273:
URL: https://github.com/apache/hive/pull/1273#issuecomment-693741003

   This pull request has been automatically marked as stale because it has
   not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in
   need of reviews.

This is an automated message from the Apache Git Service. To respond to the
message, please log on to GitHub and use the URL above to go to the specific
comment. For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 485478)
    Time Spent: 1h 10m  (was: 1h)

> ObjectStore should properly handle MicroManaged Table properties
> -----------------------------------------------------------------
>
>                 Key: HIVE-23871
>                 URL: https://issues.apache.org/jira/browse/HIVE-23871
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>         Attachments: table1
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore
> by skipping particular Table properties like SkewInfo, bucketCols, ordering,
> etc. However, it does that for all transactional tables, not only full-ACID
> ones, causing MicroManaged Tables to behave abnormally.
> MicroManaged (insert_only) tables may miss needed properties such as Storage
> Desc Params, which may define how lines are delimited (as in the example
> below).
> To reproduce the issue:
> {code:java}
> CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
> LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans;
> describe formatted delim_table_trans;
> SELECT * FROM delim_table_trans;
> {code}
> Result:
> {code:java}
> Table Type:          MANAGED_TABLE
> Table Parameters:
>   bucketing_version         2
>   numFiles                  1
>   numRows                   0
>   rawDataSize               0
>   totalSize                 72
>   transactional             true
>   transactional_properties  insert_only
> A masked pattern was here
>
> # Storage Information
> SerDe Library:   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:     org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:      No
> Num Buckets:     -1
> Bucket Columns:  []
> Sort Columns:    []
>
> PREHOOK: query: SELECT * FROM delim_table_trans
> PREHOOK: type: QUERY
> PREHOOK: Input: default@delim_table_trans
> A masked pattern was here
> POSTHOOK: query: SELECT * FROM delim_table_trans
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@delim_table_trans
> A masked pattern was here
> NULL    NULL    NULL
> NULL    NULL    NULL
> NULL    NULL    NULL
> NULL    NULL    NULL
> NULL    NULL    NULL
> NULL    NULL    NULL
> {code}
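The distinction the issue turns on can be expressed as one predicate. The sketch below is illustrative, not the actual ObjectStore code: the skip optimization from HIVE-23281 should apply only to full-ACID tables, while an insert_only (MicroManaged) table must keep its StorageDescriptor extras such as the serde line-delimiter parameters:

```java
import java.util.Map;

public class SdConversionCheck {
    /** True only for full-ACID tables; false for insert_only (MicroManaged). */
    public static boolean maySkipStorageDescriptorExtras(Map<String, String> params) {
        boolean transactional = "true".equalsIgnoreCase(params.get("transactional"));
        boolean insertOnly =
                "insert_only".equalsIgnoreCase(params.get("transactional_properties"));
        // The buggy behavior skipped extras whenever `transactional` was true;
        // the corrected check also requires the table NOT be insert_only.
        return transactional && !insertOnly;
    }

    public static void main(String[] args) {
        // MicroManaged table from the repro above: extras must be kept.
        System.out.println(maySkipStorageDescriptorExtras(
                Map.of("transactional", "true",
                       "transactional_properties", "insert_only"))); // prints false
    }
}
```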
[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485341 ] ASF GitHub Bot logged work on HIVE-24154: - Author: ASF GitHub Bot Created on: 16/Sep/20 19:56 Start Date: 16/Sep/20 19:56 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1492: URL: https://github.com/apache/hive/pull/1492#discussion_r489718951 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java ## @@ -678,100 +679,135 @@ private RexNode useStructIfNeeded(List columns) { } @Override public RexNode visitCall(RexCall call) { - final RexNode node; - final List operands; - final List newOperands; - final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); switch (call.getKind()) { case AND: - // IN clauses need to be combined by keeping only common elements - operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); - for (int i = 0; i < operands.size(); i++) { -RexNode operand = operands.get(i); -if (operand.getKind() == SqlKind.IN) { - RexCall inCall = (RexCall) operand; - if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { -continue; - } - RexNode ref = inCall.getOperands().get(0); - if (inLHSExprToRHSExprs.containsKey(ref)) { -Set expressions = Sets.newHashSet(); -for (int j = 1; j < inCall.getOperands().size(); j++) { - expressions.add(inCall.getOperands().get(j)); -} -inLHSExprToRHSExprs.get(ref).retainAll(expressions); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not keep a key if all its values are removed. 
- // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -for (int j = 1; j < inCall.getOperands().size(); j++) { - inLHSExprToRHSExprs.put(ref, inCall.getOperands().get(j)); -} - } - operands.remove(i); - --i; -} else if (operand.getKind() == SqlKind.EQUALS) { - Constraint c = Constraint.of(operand); - if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { -continue; + return handleAND(rexBuilder, call); +case OR: + return handleOR(rexBuilder, call); +default: + return super.visitCall(call); + } +} + +private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping only common elements + final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. + final Multimap inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + for (int i = 0; i < operands.size(); i++) { +RexNode operand = operands.get(i); +if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { +continue; + } + RexNode ref = inCall.getOperands().get(0); + if (ref.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { +Set expressions = Sets.newHashSet(); +for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(constNode); + if (constNode.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, constNode); } - RexNode ref = c.exprNode; - if (inLHSExprToRHSExprs.containsKey(ref)) { - inLHSExprToRHSExprs.get(ref).retainAll(Collections.singleton(c.constNode)); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not 
keep a key if all its values are removed. - // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -inLHSExprToRHSExprs.put(ref, c.co
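The simplification this rule performs on IN clauses under an AND can be shown with a stdlib-only sketch. The real code operates on Calcite RexNodes with a Guava Multimap (and, per the review above, additionally tracks nullable expressions so the fold to FALSE is not applied unsoundly); this illustration uses plain integer sets to show just the intersection logic:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class InClauseIntersection {
    /**
     * Combines `c IN (...) AND c IN (...)` by intersecting the value lists.
     * An empty Optional means the whole conjunction simplifies to FALSE,
     * mirroring `return rexBuilder.makeLiteral(false)` in the rule.
     */
    public static Optional<Set<Integer>> combineUnderAnd(List<Set<Integer>> inLists) {
        Set<Integer> common = new HashSet<>(inLists.get(0));
        for (Set<Integer> next : inLists.subList(1, inLists.size())) {
            common.retainAll(next); // keep only elements common to both IN lists
        }
        return common.isEmpty() ? Optional.empty() : Optional.of(common);
    }

    public static void main(String[] args) {
        // c IN (1,2,3) AND c IN (2,3,4)  =>  c IN (2,3)
        System.out.println(combineUnderAnd(List.of(Set.of(1, 2, 3), Set.of(2, 3, 4))));
        // c IN (1) AND c IN (2)  =>  FALSE (empty intersection)
        System.out.println(combineUnderAnd(List.of(Set.of(1), Set.of(2))));
    }
}
```

Note that with SQL three-valued logic, `c IN (1) AND c IN (2)` over a nullable `c` evaluates to NULL rather than FALSE when `c` is NULL, which is exactly why the patch introduces the `inLHSExprToRHSNullableExprs` bookkeeping.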
[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485246 ] ASF GitHub Bot logged work on HIVE-24154: - Author: ASF GitHub Bot Created on: 16/Sep/20 16:50 Start Date: 16/Sep/20 16:50 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1492: URL: https://github.com/apache/hive/pull/1492#discussion_r489581662 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java ## @@ -678,100 +679,135 @@ private RexNode useStructIfNeeded(List columns) { } @Override public RexNode visitCall(RexCall call) { - final RexNode node; - final List operands; - final List newOperands; - final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); switch (call.getKind()) { case AND: - // IN clauses need to be combined by keeping only common elements - operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); - for (int i = 0; i < operands.size(); i++) { -RexNode operand = operands.get(i); -if (operand.getKind() == SqlKind.IN) { - RexCall inCall = (RexCall) operand; - if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { -continue; - } - RexNode ref = inCall.getOperands().get(0); - if (inLHSExprToRHSExprs.containsKey(ref)) { -Set expressions = Sets.newHashSet(); -for (int j = 1; j < inCall.getOperands().size(); j++) { - expressions.add(inCall.getOperands().get(j)); -} -inLHSExprToRHSExprs.get(ref).retainAll(expressions); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not keep a key if all its values are removed. 
- // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -for (int j = 1; j < inCall.getOperands().size(); j++) { - inLHSExprToRHSExprs.put(ref, inCall.getOperands().get(j)); -} - } - operands.remove(i); - --i; -} else if (operand.getKind() == SqlKind.EQUALS) { - Constraint c = Constraint.of(operand); - if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { -continue; + return handleAND(rexBuilder, call); +case OR: + return handleOR(rexBuilder, call); +default: + return super.visitCall(call); + } +} + +private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping only common elements + final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. + final Multimap inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + for (int i = 0; i < operands.size(); i++) { +RexNode operand = operands.get(i); +if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { +continue; + } + RexNode ref = inCall.getOperands().get(0); + if (ref.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { +Set expressions = Sets.newHashSet(); +for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(constNode); + if (constNode.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, constNode); } - RexNode ref = c.exprNode; - if (inLHSExprToRHSExprs.containsKey(ref)) { - inLHSExprToRHSExprs.get(ref).retainAll(Collections.singleton(c.constNode)); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not 
keep a key if all its values are removed. - // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -inLHSExprToRHSExprs.put(ref, c.co
[jira] [Updated] (HIVE-24169) HiveServer2 UDF cache
[ https://issues.apache.org/jira/browse/HIVE-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24169:
----------------------------------
    Labels: pull-request-available  (was: )

> HiveServer2 UDF cache
> ---------------------
>
>                 Key: HIVE-24169
>                 URL: https://issues.apache.org/jira/browse/HIVE-24169
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Sam An
>            Assignee: Sam An
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDFs are currently cached per session. This optional feature can help speed
> up UDF access in S3 scenarios.
[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache
[ https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=485232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485232 ]

ASF GitHub Bot logged work on HIVE-24169:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 16/Sep/20 16:25
            Start Date: 16/Sep/20 16:25
    Worklog Time Spent: 10m

Work Description: sam-an-cloudera opened a new pull request #1503:
URL: https://github.com/apache/hive/pull/1503

   ### What changes were proposed in this pull request?
   This feature adds a HiveServer2-level UDF cache that can be shared across
   sessions.

   ### Why are the changes needed?
   Without this feature, on an S3-based system, `describe function` and
   `select udf()` localize the UDF resources each and every time, taking
   about 2 minutes to download 300 MB.

   ### Does this PR introduce _any_ user-facing change?
   No.

   ### How was this patch tested?
   Manual testing.

Issue Time Tracking
-------------------
            Worklog Id: (was: 485232)
    Remaining Estimate: 0h
            Time Spent: 10m

> HiveServer2 UDF cache
> ---------------------
>
>                 Key: HIVE-24169
>                 URL: https://issues.apache.org/jira/browse/HIVE-24169
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Sam An
>            Assignee: Sam An
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDFs are currently cached per session. This optional feature can help speed
> up UDF access in S3 scenarios.
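A server-level cache shared across sessions is typically a concurrent map keyed by the resource identity, populated at most once. The sketch below is a hedged illustration (all names are hypothetical, not the actual HiveServer2 implementation): the first caller pays the slow S3 download, and later sessions reuse the localized path:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class UdfResourceCache {
    // Shared across sessions, hence concurrent; keyed by UDF name here
    // (a real cache would likely key on the resource URI or a checksum).
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger DOWNLOADS = new AtomicInteger();

    /** Returns a local path for the UDF's jar, downloading at most once. */
    public static String localize(String udfName) {
        return CACHE.computeIfAbsent(udfName, name -> {
            DOWNLOADS.incrementAndGet(); // stands in for the ~2-minute S3 fetch
            return "/tmp/hive/udf-cache/" + name + ".jar";
        });
    }

    public static void main(String[] args) {
        localize("my_udf"); // first session pays the download cost
        // a later session hits the cache instead of re-downloading
        System.out.println(localize("my_udf"));
    }
}
```

`computeIfAbsent` guarantees the loader runs at most once per key even under concurrent access, which is why it is a natural fit for this kind of localization cache.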
[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485191 ] ASF GitHub Bot logged work on HIVE-24154: - Author: ASF GitHub Bot Created on: 16/Sep/20 14:53 Start Date: 16/Sep/20 14:53 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1492: URL: https://github.com/apache/hive/pull/1492#discussion_r489501206 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java ## @@ -678,100 +679,135 @@ private RexNode useStructIfNeeded(List columns) { } @Override public RexNode visitCall(RexCall call) { - final RexNode node; - final List operands; - final List newOperands; - final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); switch (call.getKind()) { case AND: - // IN clauses need to be combined by keeping only common elements - operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); - for (int i = 0; i < operands.size(); i++) { -RexNode operand = operands.get(i); -if (operand.getKind() == SqlKind.IN) { - RexCall inCall = (RexCall) operand; - if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { -continue; - } - RexNode ref = inCall.getOperands().get(0); - if (inLHSExprToRHSExprs.containsKey(ref)) { -Set expressions = Sets.newHashSet(); -for (int j = 1; j < inCall.getOperands().size(); j++) { - expressions.add(inCall.getOperands().get(j)); -} -inLHSExprToRHSExprs.get(ref).retainAll(expressions); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not keep a key if all its values are removed. 
- // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -for (int j = 1; j < inCall.getOperands().size(); j++) { - inLHSExprToRHSExprs.put(ref, inCall.getOperands().get(j)); -} - } - operands.remove(i); - --i; -} else if (operand.getKind() == SqlKind.EQUALS) { - Constraint c = Constraint.of(operand); - if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { -continue; + return handleAND(rexBuilder, call); +case OR: + return handleOR(rexBuilder, call); +default: + return super.visitCall(call); + } +} + +private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping only common elements + final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. + final Multimap inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + for (int i = 0; i < operands.size(); i++) { +RexNode operand = operands.get(i); +if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { +continue; + } + RexNode ref = inCall.getOperands().get(0); + if (ref.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { +Set expressions = Sets.newHashSet(); +for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(constNode); + if (constNode.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, constNode); } - RexNode ref = c.exprNode; - if (inLHSExprToRHSExprs.containsKey(ref)) { - inLHSExprToRHSExprs.get(ref).retainAll(Collections.singleton(c.constNode)); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not 
keep a key if all its values are removed. - // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -inLHSExprToRHSExprs.put(ref, c.co
[jira] [Work logged] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction
[ https://issues.apache.org/jira/browse/HIVE-24168?focusedWorklogId=485177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485177 ] ASF GitHub Bot logged work on HIVE-24168: - Author: ASF GitHub Bot Created on: 16/Sep/20 14:16 Start Date: 16/Sep/20 14:16 Worklog Time Spent: 10m Work Description: zchovan commented on a change in pull request #1501: URL: https://github.com/apache/hive/pull/1501#discussion_r489471299 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -4475,7 +4481,8 @@ public static boolean moveFile(final HiveConf conf, Path srcf, final Path destf, destFs.copyFromLocalFile(srcf, destf); return true; } else { -if (needToCopy(conf, srcf, destf, srcFs, destFs, configuredOwner, isManaged)) { +if (needToCopy(conf, srcf, destf, srcFs, destFs, configuredOwner, isManaged, isCompactionTable, +isMmCompactionTable)) { //copy if across file system or encryption zones. LOG.debug("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different."); Review comment: Would it make sense to include the hashes for the fs instances in the debug logs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 485177) Time Spent: 40m (was: 0.5h) > Disable hdfsEncryptionShims cache during query-based compaction > --- > > Key: HIVE-24168 > URL: https://issues.apache.org/jira/browse/HIVE-24168 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Hive keeps a cache of encryption shims in SessionState (Map HadoopShims.HdfsEncryptionShim> hdfsEncryptionShims). Each encryption shim in > the cache stores a FileSystem object. 
> After compaction where the session user is not the same user as the owner of > the partition/table directory, we close all FileSystem objects associated > with the user running the compaction, possibly closing an FS stored in the > encryption shim cache. The next time query-based compaction is run on a > table/partition owned by the same user, compaction will fail in MoveTask[1] > since the FileSystem stored in the cache was closed. > This change disables the cache during query-based compaction (optionally; > default: disabled). > [1] Error: > {code:java} > 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: > [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem > closed. org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: Filesystem closed > at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694) > at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477) > at > org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70) > at > 
org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116) > at > org.apache.hadoop.hive.ql.txn.compactor.MmMajorQueryCompactor.runCompaction(MmMajorQueryCompactor.java:72) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:232) > at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:221) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security
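The failure mode this issue describes can be sketched with a minimal, self-contained model. The class and method names below are hypothetical stand-ins, not Hive's actual `SessionState` or Hadoop's `FileSystem`: a session-scoped cache keeps a filesystem-like handle, a `FileSystem.closeAllForUGI`-style cleanup closes it, and the next cache hit hands out a dead handle.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified model (hypothetical names) of the bug: the cache outlives the
// handle it stores, so the second "compaction" fails like MoveTask does.
public class ShimCacheSketch {
    static class FakeFileSystem implements Closeable {
        private boolean closed = false;
        @Override public void close() { closed = true; }
        void copyFiles() throws IOException {
            if (closed) throw new IOException("Filesystem closed");
        }
    }

    // Runs the scenario and returns the error message the second run sees.
    static String simulate() {
        Map<String, FakeFileSystem> hdfsEncryptionShims = new HashMap<>();
        FakeFileSystem fs = new FakeFileSystem();
        hdfsEncryptionShims.put("hdfs://warehouse", fs); // first compaction populates the cache
        fs.close();                                      // post-compaction cleanup closes the FS
        try {
            hdfsEncryptionShims.get("hdfs://warehouse").copyFiles(); // next run hits the cache
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

Disabling the cache during query-based compaction, as the patch does, sidesteps this by never returning the stale handle.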
[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=485132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485132 ] ASF GitHub Bot logged work on HIVE-24154: - Author: ASF GitHub Bot Created on: 16/Sep/20 13:05 Start Date: 16/Sep/20 13:05 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1492: URL: https://github.com/apache/hive/pull/1492#discussion_r489412449 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java ## @@ -678,100 +679,135 @@ private RexNode useStructIfNeeded(List columns) { } @Override public RexNode visitCall(RexCall call) { - final RexNode node; - final List operands; - final List newOperands; - final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); switch (call.getKind()) { case AND: - // IN clauses need to be combined by keeping only common elements - operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); - for (int i = 0; i < operands.size(); i++) { -RexNode operand = operands.get(i); -if (operand.getKind() == SqlKind.IN) { - RexCall inCall = (RexCall) operand; - if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { -continue; - } - RexNode ref = inCall.getOperands().get(0); - if (inLHSExprToRHSExprs.containsKey(ref)) { -Set expressions = Sets.newHashSet(); -for (int j = 1; j < inCall.getOperands().size(); j++) { - expressions.add(inCall.getOperands().get(j)); -} -inLHSExprToRHSExprs.get(ref).retainAll(expressions); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not keep a key if all its values are removed. 
- // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -for (int j = 1; j < inCall.getOperands().size(); j++) { - inLHSExprToRHSExprs.put(ref, inCall.getOperands().get(j)); -} - } - operands.remove(i); - --i; -} else if (operand.getKind() == SqlKind.EQUALS) { - Constraint c = Constraint.of(operand); - if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { -continue; + return handleAND(rexBuilder, call); +case OR: + return handleOR(rexBuilder, call); +default: + return super.visitCall(call); + } +} + +private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping only common elements + final Multimap inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. + final Multimap inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + for (int i = 0; i < operands.size(); i++) { +RexNode operand = operands.get(i); +if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { +continue; + } + RexNode ref = inCall.getOperands().get(0); + if (ref.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { +Set expressions = Sets.newHashSet(); +for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(constNode); + if (constNode.getType().isNullable()) { +inLHSExprToRHSNullableExprs.put(ref, constNode); } - RexNode ref = c.exprNode; - if (inLHSExprToRHSExprs.containsKey(ref)) { - inLHSExprToRHSExprs.get(ref).retainAll(Collections.singleton(c.constNode)); -if (!inLHSExprToRHSExprs.containsKey(ref)) { - // Note that Multimap does not 
keep a key if all its values are removed. - // Hence, since there are no common expressions and it is within an AND, - // we should return false - return rexBuilder.makeLiteral(false); -} - } else { -inLHSExprToRHSExprs.put(ref, c.co
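The core of what `handleAND` does with IN clauses can be illustrated outside Calcite. This is a toy sketch, not the Hive/Calcite code: under AND, `x IN (a,b,c) AND x IN (b,c,d)` keeps only the common constants, and an empty intersection lets the whole conjunction fold to FALSE (the patch additionally tracks nullable expressions so the fold can stay NULL-correct; that part is omitted here).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of merging IN-clause constant sets over the same column
// under AND. An empty result means the conjunction is unsatisfiable.
public class InClauseIntersection {
    static Set<String> mergeUnderAnd(List<Set<String>> inLists) {
        Iterator<Set<String>> it = inLists.iterator();
        Set<String> common = new LinkedHashSet<>(it.next());
        while (it.hasNext()) {
            common.retainAll(it.next()); // the same retainAll step as in the diff
        }
        return common;
    }

    public static void main(String[] args) {
        // x IN (a,b,c) AND x IN (b,c,d)  ->  x IN (b,c)
        System.out.println(mergeUnderAnd(Arrays.asList(
                new LinkedHashSet<>(Arrays.asList("a", "b", "c")),
                new LinkedHashSet<>(Arrays.asList("b", "c", "d")))));
        // x IN (a) AND x IN (b)  ->  empty intersection, fold to FALSE
        System.out.println(mergeUnderAnd(Arrays.asList(
                new LinkedHashSet<>(Arrays.asList("a")),
                new LinkedHashSet<>(Arrays.asList("b")))).isEmpty());
    }
}
```

The diff's comment about `Multimap` dropping a key once all its values are removed corresponds to the empty-set case here.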
[jira] [Assigned] (HIVE-24172) Fix TestMmCompactorOnMr
[ https://issues.apache.org/jira/browse/HIVE-24172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage reassigned HIVE-24172: Assignee: Karen Coppage > Fix TestMmCompactorOnMr > --- > > Key: HIVE-24172 > URL: https://issues.apache.org/jira/browse/HIVE-24172 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Karen Coppage >Priority: Major > > test is unstable; > http://ci.hive.apache.org/job/hive-flaky-check/112/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.
[ https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196928#comment-17196928 ] Aasha Medhi commented on HIVE-24170: +1 > Add UDF resources explicitly to the classpath while handling drop function > event during load. > - > > Key: HIVE-24170 > URL: https://issues.apache.org/jira/browse/HIVE-24170 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24170.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24158) Cleanup isn't complete in OrcFileMergeOperator#closeOp
[ https://issues.apache.org/jira/browse/HIVE-24158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-24158: - Fix Version/s: 4.0.0
> Cleanup isn't complete in OrcFileMergeOperator#closeOp
> --
>
> Key: HIVE-24158
> URL: https://issues.apache.org/jira/browse/HIVE-24158
> Project: Hive
> Issue Type: Bug
> Reporter: Karen Coppage
> Assignee: Karen Coppage
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Field Map outWriters isn't cleared during operation close:
> {code:java}
> if (outWriters != null) {
>   for (Map.Entry outWriterEntry : outWriters.entrySet()) {
>     Writer outWriter = outWriterEntry.getValue();
>     outWriter.close();
>     outWriter = null;
>   }
> }
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
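The quoted snippet closes each writer but never clears the map, so the operator keeps stale references after `closeOp`. A close-and-clear idiom in the direction the fix suggests can be sketched as follows; the types are simplified stand-ins, not Hive's ORC `Writer`:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: close every writer AND clear the map, so no stale
// references survive operator close.
public class WriterCleanupSketch {
    static int closeAndClear(Map<Integer, Closeable> outWriters) throws IOException {
        int closed = 0;
        if (outWriters != null) {
            for (Closeable w : outWriters.values()) {
                w.close();
                closed++;
            }
            outWriters.clear(); // the step missing from the quoted code
        }
        return closed;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Closeable> writers = new HashMap<>();
        writers.put(0, () -> { }); // Closeable is a functional interface
        writers.put(1, () -> { });
        System.out.println(closeAndClear(writers));
        System.out.println(writers.isEmpty());
    }
}
```

Nulling out the loop variable, as the quoted code does, has no effect on the map's own references; clearing the map does.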
[jira] [Work logged] (HIVE-24162) Query based compaction loses bloom filter
[ https://issues.apache.org/jira/browse/HIVE-24162?focusedWorklogId=485100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-485100 ] ASF GitHub Bot logged work on HIVE-24162: - Author: ASF GitHub Bot Created on: 16/Sep/20 11:50 Start Date: 16/Sep/20 11:50 Worklog Time Spent: 10m Work Description: klcopp merged pull request #1498: URL: https://github.com/apache/hive/pull/1498 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 485100) Time Spent: 1h (was: 50m) > Query based compaction looses bloom filter > -- > > Key: HIVE-24162 > URL: https://issues.apache.org/jira/browse/HIVE-24162 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > *Steps to reproduce:* > > {noformat} > ++ > | createtab_stmt | > ++ > | CREATE TABLE `bloomTest`( | > | `msisdn` string, | > | `imsi` varchar(20), | > | `imei` bigint, | > | `cell_id` bigint)| > | ROW FORMAT SERDE | > | 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' | > | STORED AS INPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' | > | OUTPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' | > | LOCATION | > | > 's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest' > | > | TBLPROPERTIES (| > | 'bucketing_version'='2', | > | 'orc.bloom.filter.columns'='msisdn,cell_id,imsi', | > | 'orc.bloom.filter.fpp'='0.02', | > | 'transactional'='true', | > | 'transactional_properties'='default',| > | 'transient_lastDdlTime'='1597222946')| > ++ > insert into bloomTest values ("a", "b", 10, 20); > insert into bloomTest values ("aa", "bb", 100, 200); > 
insert into bloomTest values ("aaa", "bbb", 1000, 2000); > select * from bloomTest; > +---+-+-++ > | bloomtest.msisdn | bloomtest.imsi | bloomtest.imei | bloomtest.cell_id | > +---+-+-++ > | a | b | 10 | 20 | > | aa| bb | 100 | 200| > | aaa | bbb | 1000| 2000 | > +---+-+-++ > {noformat} > - Compact the table > {code:java} > alter table bloomTest compact 'MAJOR'; > {code} > - Wait for the compaction to be over and check for bloom filters in dataset. > > - delta would have it, but not in the base dataset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24162) Query based compaction loses bloom filter
[ https://issues.apache.org/jira/browse/HIVE-24162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-24162. -- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master. Thanks [~pvargacl] for the patch! > Query based compaction looses bloom filter > -- > > Key: HIVE-24162 > URL: https://issues.apache.org/jira/browse/HIVE-24162 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > *Steps to reproduce:* > > {noformat} > ++ > | createtab_stmt | > ++ > | CREATE TABLE `bloomTest`( | > | `msisdn` string, | > | `imsi` varchar(20), | > | `imei` bigint, | > | `cell_id` bigint)| > | ROW FORMAT SERDE | > | 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' | > | STORED AS INPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' | > | OUTPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' | > | LOCATION | > | > 's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest' > | > | TBLPROPERTIES (| > | 'bucketing_version'='2', | > | 'orc.bloom.filter.columns'='msisdn,cell_id,imsi', | > | 'orc.bloom.filter.fpp'='0.02', | > | 'transactional'='true', | > | 'transactional_properties'='default',| > | 'transient_lastDdlTime'='1597222946')| > ++ > insert into bloomTest values ("a", "b", 10, 20); > insert into bloomTest values ("aa", "bb", 100, 200); > insert into bloomTest values ("aaa", "bbb", 1000, 2000); > select * from bloomTest; > +---+-+-++ > | bloomtest.msisdn | bloomtest.imsi | bloomtest.imei | bloomtest.cell_id | > +---+-+-++ > | a | b | 10 | 20 | > | aa| bb | 100 | 200| > | aaa | bbb | 1000| 2000 | > +---+-+-++ > {noformat} > - Compact the table > {code:java} > alter table bloomTest compact 'MAJOR'; > {code} > - Wait for the compaction to be over and check for bloom filters in dataset. 
> > - delta would have it, but not in the base dataset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.
[ https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-24170: Attachment: HIVE-24170.01.patch > Add UDF resources explicitly to the classpath while handling drop function > event during load. > - > > Key: HIVE-24170 > URL: https://issues.apache.org/jira/browse/HIVE-24170 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24170.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.
[ https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-24170: Attachment: (was: HIVE-24170.01.patch) > Add UDF resources explicitly to the classpath while handling drop function > event during load. > - > > Key: HIVE-24170 > URL: https://issues.apache.org/jira/browse/HIVE-24170 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24170.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24170) Add UDF resources explicitly to the classpath while handling drop function event during load.
[ https://issues.apache.org/jira/browse/HIVE-24170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-24170: Summary: Add UDF resources explicitly to the classpath while handling drop function event during load. (was: Add UDF resources explicitely to the classpath while handling drop function event during load.) > Add UDF resources explicitly to the classpath while handling drop function > event during load. > - > > Key: HIVE-24170 > URL: https://issues.apache.org/jira/browse/HIVE-24170 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24170.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction
[ https://issues.apache.org/jira/browse/HIVE-24168?focusedWorklogId=484999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484999 ] ASF GitHub Bot logged work on HIVE-24168: - Author: ASF GitHub Bot Created on: 16/Sep/20 08:16 Start Date: 16/Sep/20 08:16 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1501: URL: https://github.com/apache/hive/pull/1501#discussion_r489250118 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -445,14 +446,23 @@ public static boolean isCompactionTable(Properties tblProperties) { } /** - * Determine if a table is used during query based compaction. + * Determine if a table is used during query based compaction for CRUD tables. * @param parameters table properties map * @return true, if the parameters contains {@link AcidUtils#COMPACTOR_TABLE_PROPERTY} */ public static boolean isCompactionTable(Map parameters) { return Boolean.valueOf(parameters.getOrDefault(COMPACTOR_TABLE_PROPERTY, "false")); } + /** + * Determine if a table is used during query based compaction for MM insert-only tables. + * @param parameters table properties map + * @return true, if the parameters contains {@link AcidUtils#MM_COMPACTOR_TABLE_PROPERTY} + */ + public static boolean isMmCompactionTable(Map parameters) { Review comment: isCompactionTable logically would be true for both full acid and mm tables, but until now we've only used it to mark tables used for compacting full acid tables. AFAIK we don't want to apply the operations we do on full acid compaction tables to mm compaction tables. I could rename isCompactionTable() to isFullAcidCompactionTable() for easier reading, would that do? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 484999) Time Spent: 0.5h (was: 20m) > Disable hdfsEncryptionShims cache during query-based compaction > --- > > Key: HIVE-24168 > URL: https://issues.apache.org/jira/browse/HIVE-24168 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Hive keeps a cache of encryption shims in SessionState (Map HadoopShims.HdfsEncryptionShim> hdfsEncryptionShims). Each encryption shim in > the cache stores a FileSystem object. > After compaction where the session user is not the same user as the owner of > the partition/table directory, we close all FileSystem objects associated > with the user running the compaction, possibly closing an FS stored in the > encryption shim cache. The next time query-based compaction is run on a > table/partition owned by the same user, compaction will fail in MoveTask[1] > since the FileSystem stored in the cache was closed. > This change disables the cache during query-based compaction (optionally; > default: disabled). > [1] Error: > {code:java} > 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: > [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem > closed. 
org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: Filesystem closed > at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694) > at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477) > at > org.apache.hadoop.hi
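The two property checks the reviewers are discussing reduce to map lookups with a boolean default. Below is a minimal model; the property-key strings are assumptions for illustration only (in Hive they are the `AcidUtils.COMPACTOR_TABLE_PROPERTY` and `MM_COMPACTOR_TABLE_PROPERTY` constants):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the checks in the diff. Key names are placeholders,
// not Hive's actual constant values.
public class CompactionFlagSketch {
    static final String COMPACTOR_TABLE_PROPERTY = "compactiontable";       // assumed name
    static final String MM_COMPACTOR_TABLE_PROPERTY = "mmcompactiontable";  // assumed name

    static boolean isCompactionTable(Map<String, String> parameters) {
        return Boolean.parseBoolean(parameters.getOrDefault(COMPACTOR_TABLE_PROPERTY, "false"));
    }

    static boolean isMmCompactionTable(Map<String, String> parameters) {
        return Boolean.parseBoolean(parameters.getOrDefault(MM_COMPACTOR_TABLE_PROPERTY, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(MM_COMPACTOR_TABLE_PROPERTY, "true");
        // With two separate flags, an MM compaction table is NOT reported as a
        // (full-acid) compaction table, which is the reviewer's concern above.
        System.out.println(isCompactionTable(params));
        System.out.println(isMmCompactionTable(params));
    }
}
```

This is exactly why the reviewer asks whether `isCompactionTable` should cover both kinds, or be renamed to something like `isFullAcidCompactionTable`.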
[jira] [Resolved] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns
[ https://issues.apache.org/jira/browse/HIVE-24084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24084. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Jesus and Vineet for reviewing the changes! > Push Aggregates thru joins in case it re-groups previously unique columns > - > > Key: HIVE-24084 > URL: https://issues.apache.org/jira/browse/HIVE-24084 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24160) Scheduled executions must allow state transition EXECUTING->TIMED_OUT
[ https://issues.apache.org/jira/browse/HIVE-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24160. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Krisztian for reviewing the changes! > Scheduled executions must allow state transition EXECUTING->TIMED_OUT > - > > Key: HIVE-24160 > URL: https://issues.apache.org/jira/browse/HIVE-24160 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns
[ https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=484984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484984 ] ASF GitHub Bot logged work on HIVE-24084: - Author: ASF GitHub Bot Created on: 16/Sep/20 07:51 Start Date: 16/Sep/20 07:51 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1439: URL: https://github.com/apache/hive/pull/1439 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 484984) Time Spent: 3h 50m (was: 3h 40m) > Push Aggregates thru joins in case it re-groups previously unique columns > - > > Key: HIVE-24084 > URL: https://issues.apache.org/jira/browse/HIVE-24084 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24160) Scheduled executions must allow state transition EXECUTING->TIMED_OUT
[ https://issues.apache.org/jira/browse/HIVE-24160?focusedWorklogId=484980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484980 ] ASF GitHub Bot logged work on HIVE-24160: - Author: ASF GitHub Bot Created on: 16/Sep/20 07:47 Start Date: 16/Sep/20 07:47 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1496: URL: https://github.com/apache/hive/pull/1496 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 484980) Time Spent: 20m (was: 10m) > Scheduled executions must allow state transition EXECUTING->TIMED_OUT > - > > Key: HIVE-24160 > URL: https://issues.apache.org/jira/browse/HIVE-24160 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24162) Query based compaction loses bloom filter
[ https://issues.apache.org/jira/browse/HIVE-24162?focusedWorklogId=484979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484979 ] ASF GitHub Bot logged work on HIVE-24162: - Author: ASF GitHub Bot Created on: 16/Sep/20 07:46 Start Date: 16/Sep/20 07:46 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1498: URL: https://github.com/apache/hive/pull/1498#discussion_r489232187 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilder.java ## @@ -543,18 +543,26 @@ private void addTblProperties(StringBuilder query, int bucketingVersion) { if (crud && minor && isBucketed) { tblProperties.put("bucketing_version", String.valueOf(bucketingVersion)); } -if (insertOnly && sourceTab != null) { // to avoid NPEs, skip this part if sourceTab is null - // Exclude all standard table properties. - Set excludes = getHiveMetastoreConstants(); - excludes.addAll(StatsSetupConst.TABLE_PARAMS_STATS_KEYS); - for (Map.Entry e : sourceTab.getParameters().entrySet()) { -if (e.getValue() == null) { - continue; +if (sourceTab != null) { // to avoid NPEs, skip this part if sourceTab is null + if (insertOnly) { +// Exclude all standard table properties. +Set excludes = getHiveMetastoreConstants(); +excludes.addAll(StatsSetupConst.TABLE_PARAMS_STATS_KEYS); +for (Map.Entry e : sourceTab.getParameters().entrySet()) { + if (e.getValue() == null) { +continue; + } + if (excludes.contains(e.getKey())) { +continue; + } + tblProperties.put(e.getKey(), HiveStringUtils.escapeHiveCommand(e.getValue())); } -if (excludes.contains(e.getKey())) { - continue; + } else if (crud) { +for (Map.Entry e : sourceTab.getParameters().entrySet()) { + if (e.getKey().startsWith("orc.")) { Review comment: Makes sense! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 484979) Time Spent: 50m (was: 40m) > Query based compaction looses bloom filter > -- > > Key: HIVE-24162 > URL: https://issues.apache.org/jira/browse/HIVE-24162 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > *Steps to reproduce:* > > {noformat} > ++ > | createtab_stmt | > ++ > | CREATE TABLE `bloomTest`( | > | `msisdn` string, | > | `imsi` varchar(20), | > | `imei` bigint, | > | `cell_id` bigint)| > | ROW FORMAT SERDE | > | 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' | > | STORED AS INPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' | > | OUTPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' | > | LOCATION | > | > 's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest' > | > | TBLPROPERTIES (| > | 'bucketing_version'='2', | > | 'orc.bloom.filter.columns'='msisdn,cell_id,imsi', | > | 'orc.bloom.filter.fpp'='0.02', | > | 'transactional'='true', | > | 'transactional_properties'='default',| > | 'transient_lastDdlTime'='1597222946')| > ++ > insert into bloomTest values ("a", "b", 10, 20); > insert into bloomTest values ("aa", "bb", 100, 200); > insert into bloomTest values ("aaa", "bbb", 1000, 2000); > select * from bloomTest; > +---+-+-++ > | bloomtest.msisdn | bloomtest.imsi | bloomtest.imei | bloomtest.cell_id | > +---+-+-++ > | a | b
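The `orc.`-prefix carry-over discussed in the review above can be shown in isolation. This is a toy version, not `CompactionQueryBuilder` itself: for CRUD compaction, every `orc.`-prefixed source table property (bloom filter columns, fpp, and so on) is copied into the temp compaction table's TBLPROPERTIES, which is what keeps the bloom filters after compaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy version of the property carry-over: keep only "orc."-prefixed keys
// from the source table's parameters.
public class OrcPropertyCarryOver {
    static Map<String, String> orcProperties(Map<String, String> sourceTblProps) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sourceTblProps.entrySet()) {
            if (e.getKey().startsWith("orc.")) { // same prefix check as in the diff
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("orc.bloom.filter.columns", "msisdn,cell_id,imsi");
        props.put("orc.bloom.filter.fpp", "0.02");
        props.put("transactional", "true"); // non-ORC property, not carried over
        System.out.println(orcProperties(props));
    }
}
```

With the table from the repro steps, this would carry `orc.bloom.filter.columns` and `orc.bloom.filter.fpp` into the compaction temp table while leaving `transactional` and friends behind.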
[jira] [Work logged] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction
[ https://issues.apache.org/jira/browse/HIVE-24168?focusedWorklogId=484970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484970 ] ASF GitHub Bot logged work on HIVE-24168: - Author: ASF GitHub Bot Created on: 16/Sep/20 07:16 Start Date: 16/Sep/20 07:16 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1501: URL: https://github.com/apache/hive/pull/1501#discussion_r489214913 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -445,14 +446,23 @@ public static boolean isCompactionTable(Properties tblProperties) { } /** - * Determine if a table is used during query based compaction. + * Determine if a table is used during query based compaction for CRUD tables. * @param parameters table properties map * @return true, if the parameters contains {@link AcidUtils#COMPACTOR_TABLE_PROPERTY} */ public static boolean isCompactionTable(Map parameters) { return Boolean.valueOf(parameters.getOrDefault(COMPACTOR_TABLE_PROPERTY, "false")); } + /** + * Determine if a table is used during query based compaction for MM insert-only tables. + * @param parameters table properties map + * @return true, if the parameters contains {@link AcidUtils#MM_COMPACTOR_TABLE_PROPERTY} + */ + public static boolean isMmCompactionTable(Map parameters) { Review comment: Shouldn't isCompactionTable return true in both cases? Isn't it a problem the other places we use this util, that the mmCompactionTable are missed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 484970) Time Spent: 20m (was: 10m) > Disable hdfsEncryptionShims cache during query-based compaction > --- > > Key: HIVE-24168 > URL: https://issues.apache.org/jira/browse/HIVE-24168 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Hive keeps a cache of encryption shims in SessionState (Map HadoopShims.HdfsEncryptionShim> hdfsEncryptionShims). Each encryption shim in > the cache stores a FileSystem object. > After compaction where the session user is not the same user as the owner of > the partition/table directory, we close all FileSystem objects associated > with the user running the compaction, possibly closing an FS stored in the > encryption shim cache. The next time query-based compaction is run on a > table/partition owned by the same user, compaction will fail in MoveTask[1] > since the FileSystem stored in the cache was closed. > This change disables the cache during query-based compaction (optionally; > default: disabled). > [1] Error: > {code:java} > 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: > [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem > closed. 
org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: Filesystem closed > at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694) > at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477) > at > org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70) > at > org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116) > at > org.apache.hadoop.hive.ql.txn.compactor.MmMajo