[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826953

## File path: ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query94_anti_join.q.out ##

@@ -0,0 +1,94 @@
+PREHOOK: query: explain cbo
+select
+ count(distinct ws_order_number) as `order count`
+ ,sum(ws_ext_ship_cost) as `total shipping cost`
+ ,sum(ws_net_profit) as `total net profit`
+from
+ web_sales ws1
+ ,date_dim
+ ,customer_address
+ ,web_site
+where
+d_date between '1999-5-01' and
+ (cast('1999-5-01' as date) + 60 days)
+and ws1.ws_ship_date_sk = d_date_sk
+and ws1.ws_ship_addr_sk = ca_address_sk
+and ca_state = 'TX'
+and ws1.ws_web_site_sk = web_site_sk
+and web_company_name = 'pri'
+and exists (select *
+from web_sales ws2
+where ws1.ws_order_number = ws2.ws_order_number
+ and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
+and not exists(select *
+ from web_returns wr1
+ where ws1.ws_order_number = wr1.wr_order_number)
+order by count(distinct ws_order_number)
+limit 100
+PREHOOK: type: QUERY
+PREHOOK: Input: default@customer_address
+PREHOOK: Input: default@date_dim
+PREHOOK: Input: default@web_returns
+PREHOOK: Input: default@web_sales
+PREHOOK: Input: default@web_site
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: explain cbo
+select
+ count(distinct ws_order_number) as `order count`
+ ,sum(ws_ext_ship_cost) as `total shipping cost`
+ ,sum(ws_net_profit) as `total net profit`
+from
+ web_sales ws1
+ ,date_dim
+ ,customer_address
+ ,web_site
+where
+d_date between '1999-5-01' and
+ (cast('1999-5-01' as date) + 60 days)
+and ws1.ws_ship_date_sk = d_date_sk
+and ws1.ws_ship_addr_sk = ca_address_sk
+and ca_state = 'TX'
+and ws1.ws_web_site_sk = web_site_sk
+and web_company_name = 'pri'
+and exists (select *
+from web_sales ws2
+where ws1.ws_order_number = ws2.ws_order_number
+ and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
+and not exists(select *
+ from web_returns wr1
+ where ws1.ws_order_number = wr1.wr_order_number)
+order by count(distinct ws_order_number)
+limit 100
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Input: default@date_dim
+POSTHOOK: Input: default@web_returns
+POSTHOOK: Input: default@web_sales
+POSTHOOK: Input: default@web_site
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+CBO PLAN:
+HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], agg#2=[sum($6)])
+ HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], cost=[not available])

Review comment:
done

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826675

## File path: ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out ##

@@ -0,0 +1,99 @@
+PREHOOK: query: explain cbo

Review comment:
done
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826103

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ##

@@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 "Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. \n" +
 "If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the\n" +
 "specified size, the join is directly converted to a mapjoin (there is no conditional task)."),
-
+HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false,

Review comment:
done
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466825617

## File path: ql/src/test/results/clientpositive/llap/subquery_notexists_having.q.out ##

@@ -31,7 +31,8 @@ STAGE PLANS:
 Tez
 A masked pattern was here
 Edges:
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
+Reducer 2 <- Map 1 (SIMPLE_EDGE)

Review comment:
Yes, the join is getting converted to an SMB join, so no reducer is required. In the anti join case it is not getting converted. That is because the left outer join adds an extra group by, which makes the RS nodes on the left and right sides equal, and that is the precondition for converting to an SMB join.
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466822516

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ##

@@ -2129,6 +2133,16 @@ private RelNode applyPreJoinOrderingTransforms(RelNode basePlan, RelMetadataProv
 HiveRemoveSqCountCheck.INSTANCE);
 }

+ // 10. Convert left outer join + null filter on right side table column to anti join. Add this
+ // rule after all the optimization for which calcite support for anti join is missing.
+ // Needs to be done before ProjectRemoveRule as it expect a project over filter.
+ // This is done before join re-ordering as join re-ordering is converting the left outer

Review comment:
As discussed, I have created a Jira: https://issues.apache.org/jira/browse/HIVE-24013
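
The rewrite named in the quoted code comment relies on a left outer join followed by an IS NULL filter on a right-side column returning the same left rows as an anti join. A minimal sketch of that equivalence in plain Java follows. It is not Hive or Calcite code: the class and method names are hypothetical, rows are reduced to a single integer join key, and it assumes the filtered right-side column is the join key itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Hive code base.
final class AntiJoinEquivalence {

    // Left outer join on the key, followed by "right key IS NULL".
    static List<Integer> leftJoinThenNullFilter(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            boolean matched = false;
            for (Integer r : right) {
                // SQL equality: a null on either side never matches.
                if (l != null && l.equals(r)) {
                    matched = true; // joined row (l, r) has r != null, so the filter drops it
                }
            }
            if (!matched) {
                out.add(l); // outer join pads the right side with null, so the filter keeps it
            }
        }
        return out;
    }

    // Anti join: keep a left row only when no right row has an equal key.
    static List<Integer> antiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> rightKeys = new HashSet<>(right);
        rightKeys.remove(null); // null keys can never match
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (l == null || !rightKeys.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }
}
```

Both methods return only left-side values, which mirrors why the conversion is only valid when nothing from the right side is projected.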
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819572

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java ##

@@ -92,7 +104,7 @@ public void onMatch(RelOptRuleCall call) {
 Set rightPushedPredicates = Sets.newHashSet(registry.getPushedPredicates(join, 1));

 boolean genPredOnLeft = join.getJoinType() == JoinRelType.RIGHT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
-boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
+boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin()|| join.getJoinType() == JoinRelType.ANTI;

Review comment:
Yes, that is taken care of:

// For anti join, we should proceed to emit records if the right side is empty or not matching.
if (type == JoinDesc.ANTI_JOIN && !producedRow) {
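
The hunk above generates a not-null predicate on the right side for ANTI but not on the left. A minimal sketch of why that asymmetry is safe, in plain Java rather than Hive code: the class and helper names are hypothetical, and only the flag name producedRow is borrowed from the quoted snippet. Dropping null keys from the right input cannot change an anti join result, while dropping them from the left input loses rows that must be emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hive code base.
final class AntiJoinNotNull {

    // Anti join on a single key: emit a left key when no right key matches it.
    static List<Integer> antiJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            boolean producedRow = false; // becomes true once any right row matches
            for (Integer r : right) {
                if (l != null && r != null && l.equals(r)) {
                    producedRow = true;
                    break;
                }
            }
            // Emit when the right side is empty or nothing matched.
            if (!producedRow) {
                out.add(l);
            }
        }
        return out;
    }

    // Models an IS NOT NULL predicate pushed onto one join input.
    static List<Integer> dropNulls(List<Integer> keys) {
        List<Integer> out = new ArrayList<>();
        for (Integer k : keys) {
            if (k != null) {
                out.add(k);
            }
        }
        return out;
    }
}
```

With left keys (1, null, 2) and right keys (2, null), filtering the right input leaves the result unchanged, whereas filtering the left input would drop the null-keyed left row.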
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819149

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java ##

@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveAntiSemiJoinRule extends RelOptRule {
+  protected static final Logger LOG = LoggerFactory.getLogger(HiveAntiSemiJoinRule.class);
+  public static final HiveAntiSemiJoinRule INSTANCE = new HiveAntiSemiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //    HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveAntiSemiJoinRule() {
+    super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any()))),
+        "HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+    final Project project = call.rel(0);
+    final Filter filter = call.rel(1);
+    final Join join = call.rel(2);
+    perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) {
+    LOG.debug("Start Matching HiveAntiJoinRule");
+
+    //TODO : Need to support this scenario.
+    if (join.getCondition().isAlwaysTrue()) {
+      return;
+    }
+
+    //We support conversion from left outer join only.
+    if (join.getJoinType() != JoinRelType.LEFT) {
+      return;
+    }
+
+    assert (filter != null);
+
+    // If null filter is not present from right side then we can not convert to anti join.
+    List<RexNode> aboveFilters = RelOptUtil.conjunctions(filter.getCondition());
+    Stream<RexNode> nullFilters = aboveFilters.stream().filter(filterNode -> filterNode.getKind() == SqlKind.IS_NULL);
+    boolean hasNullFilter = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, nullFilters.collect(Collectors.toList()));
+    if (!hasNullFilter) {
+      return;
+    }
+
+    // If any projection is there from right side, then we can not convert to anti join.
+    boolean hasProjection = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, project.getProjects());
+    if (hasProjection) {
+      return;
+    }
+
+    LOG.debug("Matched HiveAntiJoinRule");
+
+    // Build anti join with same left, right child and condition as original left outer join.
+    Join anti = HiveAntiJoin.getAntiJoin(join.getLeft().getCluster(), join.getLeft().getTraitSet(),
+        join.getLeft(), join.getRight(), join.getCondition());
+    RelNode newProject = project.copy(project.getTraitSet(), anti, project.getProjects(), project.getRowType());
+    call.transformTo(newProject);

Review comment:
Normal filters are pushed down, so here we only get filters that cannot be pushed down. I have modified the code to handle those filters and added these extra tests to verify.
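
The guard conditions in the quoted version of the rule can be summarised as one predicate. The following is a minimal sketch in plain Java under stated assumptions: the class, enum and method names are hypothetical, expressions are reduced to the column indexes they reference, and the left input's fields are assumed to come first in the join row type (as in Calcite), so an index at or beyond the left field count refers to the right side. It models only the checks visible in the hunk, not the later changes mentioned in the review comment.

```java
import java.util.List;

// Hypothetical illustration only; not the Hive rule itself.
final class AntiJoinRuleSketch {

    enum JoinType { INNER, LEFT, RIGHT, FULL }

    // True if any referenced column index belongs to the right join input.
    static boolean referencesRightSide(List<Integer> columnRefs, int leftFieldCount) {
        for (int ref : columnRefs) {
            if (ref >= leftFieldCount) {
                return true;
            }
        }
        return false;
    }

    static boolean canConvertToAntiJoin(JoinType type,
                                        boolean conditionAlwaysTrue,
                                        int leftFieldCount,
                                        List<Integer> isNullFilterColumns,
                                        List<Integer> projectedColumns) {
        if (conditionAlwaysTrue) {
            return false; // not supported in the quoted version (marked TODO there)
        }
        if (type != JoinType.LEFT) {
            return false; // only left outer joins are converted
        }
        if (!referencesRightSide(isNullFilterColumns, leftFieldCount)) {
            return false; // needs an IS NULL filter on a right-side column
        }
        if (referencesRightSide(projectedColumns, leftFieldCount)) {
            return false; // an anti join produces no right-side columns to project
        }
        return true;
    }
}
```

For the plan shape in the rule's own comment (project of $0 over IS NULL($1) over a left join whose left input has one field), the predicate is satisfied; projecting $1 instead would block the conversion.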
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466818492

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinMultiKeyOperator.java ##

@@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinBytesHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.apache.hadoop.hive.serde2.ByteStream.Output;
+import org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// Multi-Key hash table import.
+// Multi-Key specific imports.
+
+// TODO : Duplicate codes need to merge with semi join.
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on Multi-Key
+ * using hash set.
+ */
+public class VectorMapJoinAntiJoinMultiKeyOperator extends VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+
+  //
+
+  private static final String CLASS_NAME = VectorMapJoinAntiJoinMultiKeyOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+
+  protected String getLoggingPrefix() {
+    return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  //
+
+  // (none)
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+  //---
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinBytesHashSet hashSet;
+
+  //---
+  // Multi-Key specific members.
+  //
+
+  // Object that can take a set of columns in row in a vectorized row batch and serialized it.
+  // Known to not have any nulls.
+  private transient VectorSerializeRow keyVectorSerializeWrite;
+
+  // The BinarySortable serialization of the current key.
+  private transient Output currentKeyOutput;
+
+  // The BinarySortable serialization of the saved key for a possible series of equal keys.
+  private transient Output saveKeyOutput;
+
+  //---
+  // Pass-thru constructors.
+  //
+
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinMultiKeyOperator() {
+    super();
+  }
+
+  public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx) {
+    super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx, OperatorDesc conf,
+      VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException {
+    super(ctx, conf, vContext, vectorDesc);
+  }
+
+  //---
+  // Process Multi-Key Anti Join on a vectorized row batch.
+  //
+
+  @Override
+  protected void commonSetup() throws HiveException {
+    super.commonSetup();
+
+    /*
+     * Initialize Multi-Key members for this specialized class.
+     */
+
+    keyVectorSerializeWrite = new VectorSerializeRow(BinarySortableSerializeWrite.with(
+        this.getConf().getKeyTblDesc().getProperties(), bigTableKeyColumnMap.length));
+    keyVectorSerializeWrite.init(bigTableKeyTypeInfos, bigTableKeyColumnMap);
+
+    currentKeyOutput = new
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466818194

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java ##

@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+    return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+    super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+    super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, OperatorDesc conf,
+      VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException {
+    super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+    super.commonSetup();
+
+    // Initialize Single-Column Long members for this specialized class.
+    singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+    super.hashTableSetup();
+
+    // Get our Single-Column Long hash set information for this specialized class.
+    hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+    useMinMax = hashSet.useMinMax();
+    if (useMinMax) {
+      min = hashSet.min();
+      max = hashSet.max();
+    }
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+    try {
+      // (Currently none)
+      // antiPerBatchSetup(batch);
+
+      // For anti joins, we may apply the filter(s) now.
+      for(VectorExpression ve : bigTableFilterExpressions) {
+        ve.evaluate(batch);
+      }
+
+      final int inputLogicalSize = batch.size;
+      if (inputLogicalSize == 0) {
+        return;
+      }
+
+      // Perform any key expressions. Results will go into scratch columns.
+      if (bigTableKeyExpressions != null) {
+        for (VectorExpression ve : bigTableKeyExpressions) {
+          ve.evaluate(batch);
+        }
+      }
+
+      // The one join column for this specialized class.
+
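
The quoted hunk is cut off before the part of processBatch that uses min, max and useMinMax, so the following is only a sketch of how optional min/max filtering can interact with an anti join, not the operator's actual code. It is plain Java with hypothetical class and method names, and it assumes the range is derived from the small-table keys: a big-table key outside that range cannot be in the hash set, so the lookup can be skipped, and for an anti join such a row qualifies for output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Hive vectorized operator.
final class MinMaxAntiJoinSketch {

    static List<Long> antiJoinWithMinMax(long[] bigTableKeys, Set<Long> smallTableKeys) {
        // Assumption: min/max filtering is available whenever the small table is non-empty.
        boolean useMinMax = !smallTableKeys.isEmpty();
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long k : smallTableKeys) {
            min = Math.min(min, k);
            max = Math.max(max, k);
        }

        List<Long> out = new ArrayList<>();
        for (long key : bigTableKeys) {
            boolean found;
            if (useMinMax && (key < min || key > max)) {
                found = false; // out of range: cannot match, skip the hash lookup
            } else {
                found = smallTableKeys.contains(key);
            }
            if (!found) {
                out.add(key); // anti join keeps rows without a match
            }
        }
        return out;
    }
}
```

The range check is purely an optimisation here: removing it leaves the result unchanged, because every out-of-range key would also fail the set lookup.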
[GitHub] [hive] GuoPhilipse removed a comment on pull request #1363: HIVE-23996: Remove unused line to keep code clean
GuoPhilipse removed a comment on pull request #1363:
URL: https://github.com/apache/hive/pull/1363#issuecomment-669876714

cc @pvary
[GitHub] [hive] jcamachor opened a new pull request #1374: HIVE-24012: Support for rewriting with materialized views containing …
jcamachor opened a new pull request #1374:
URL: https://github.com/apache/hive/pull/1374

…grouping sets

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [hive] sam-an-cloudera opened a new pull request #1373: HIVE-24004: Improve performance for filter hook for superuser path
sam-an-cloudera opened a new pull request #1373:
URL: https://github.com/apache/hive/pull/1373

This is an improvement to the filter hook. For a superuser, if authorization can be skipped, there is no need to create the authorizer. This can save some CPU cycles.
[GitHub] [hive] sam-an-cloudera commented on pull request #1372: HIVE-24004: Improve performance for filter hook for superuser path
sam-an-cloudera commented on pull request #1372:
URL: https://github.com/apache/hive/pull/1372#issuecomment-670265392

There was a problem with Jenkins. I will open another one.
[GitHub] [hive] sam-an-cloudera closed pull request #1372: HIVE-24004: Improve performance for filter hook for superuser path
sam-an-cloudera closed pull request #1372:
URL: https://github.com/apache/hive/pull/1372
[GitHub] [hive] sam-an-cloudera opened a new pull request #1372: HIVE-24002: Improve performance for filter hook for superuser path
sam-an-cloudera opened a new pull request #1372:
URL: https://github.com/apache/hive/pull/1372

This is an improvement to the filter hook. For a superuser, if authorization can be skipped, there is no need to create the authorizer. This can save some CPU cycles.
[GitHub] [hive] deniskuzZ closed pull request #1369: HIVE-24000: Put exclusive MERGE INSERT under the feature flag
deniskuzZ closed pull request #1369:
URL: https://github.com/apache/hive/pull/1369
[GitHub] [hive] deniskuzZ opened a new pull request #1371: HIVE-24000: Put exclusive MERGE INSERT under the feature flag
deniskuzZ opened a new pull request #1371:
URL: https://github.com/apache/hive/pull/1371

(cherry picked from commit 0e4b02af485cb1972ebc4f251d853c710e70164f)

test out fix

### What changes were proposed in this pull request?
Pushed exclusive MERGE INSERT under the feature flag

### Why are the changes needed?
Backward compatibility

### Does this PR introduce _any_ user-facing change?
new feature flag property was introduced 'hive.txn.xlock.mergeinsert'

### How was this patch tested?
TestDbTxnManager2
[GitHub] [hive] vineetgarg02 merged pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 merged pull request #1315:
URL: https://github.com/apache/hive/pull/1315
[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs
kishendas commented on a change in pull request #1355:
URL: https://github.com/apache/hive/pull/1355#discussion_r466627274

## File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestTxnManager.java ##

@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.lockmgr;
+
+import org.apache.hadoop.hive.common.FileUtils;
+import org.apache.hadoop.hive.common.ValidTxnWriteIdList;
+import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
+import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse;
+import org.apache.hadoop.hive.metastore.api.TxnToWriteId;
+import org.apache.hadoop.hive.metastore.api.TxnType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.DriverState;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.hooks.WriteEntity;
+import org.apache.hadoop.hive.ql.metadata.DummyPartition;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Partition;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.util.ReflectionUtils;
+
+import java.util.*;
+
+/**
+ * An implementation of {@link HiveTxnManager} that does not support
+ * transactions.
+ * This class is only used in test.
+ */
+class TestTxnManager extends DummyTxnManager {
+  final static Character SEMICOLON = ':';

Review comment:
:-) It's a COLON.

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##

@@ -5292,6 +5303,16 @@ public synchronized SynchronizedMetaStoreClient getSynchronizedMSC() throws Meta
 return syncMetaStoreClient;
 }

+/**
+ * @return the metastore client for the current thread
+ * @throws MetaException
+ */
+ @LimitedPrivate(value = {"Hive"})

Review comment:
Sure
[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs
kishendas commented on a change in pull request #1355: URL: https://github.com/apache/hive/pull/1355#discussion_r466626987 ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClient.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.metadata; Review comment: Makes sense.
[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs
kishendas commented on a change in pull request #1355: URL: https://github.com/apache/hive/pull/1355#discussion_r466626887 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -474,6 +476,12 @@ public static Hive get() throws HiveException { return get(true); } + public static Hive get(IMetaStoreClient msc) throws HiveException, MetaException { Review comment: Done.
[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs
kishendas commented on a change in pull request #1355: URL: https://github.com/apache/hive/pull/1355#discussion_r466625978 ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClientApiArgumentsChecker.java ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.metadata; + +import com.google.common.collect.Lists; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.common.ValidWriteIdList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.HiveMetaStoreClient; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.TableType; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.ql.Context; +import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc; +import org.apache.hadoop.hive.ql.session.SessionState; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrGreaterThan; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; +import org.apache.thrift.TException; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * TestHiveMetaStoreClientApiArgumentsChecker + * + * This class works with {@link TestHiveMetaStoreClient} in order to verify the arguments that + * are sent from HS2 to HMS APIs. 
+ * + */ +public class TestHiveMetaStoreClientApiArgumentsChecker { + + private Hive hive; + private IMetaStoreClient msc; + private FileSystem fs; + final static String DB_NAME = "db"; + final static String TABLE_NAME = "table"; + private IMetaStoreClient client; + private Table t; + + protected static final String USER_NAME = "user0"; + + @Before + public void setUp() throws Exception { + +client = new TestHiveMetaStoreClient(new HiveConf(Hive.class)); +hive = Hive.get(client); + hive.getConf().set(MetastoreConf.ConfVars.FS_HANDLER_THREADS_COUNT.getVarname(), "15"); + hive.getConf().set(MetastoreConf.ConfVars.MSCK_PATH_VALIDATION.getVarname(), "throw"); +msc = new HiveMetaStoreClient(hive.getConf()); + +hive.getConf().setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER, + "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory"); +HiveConf.setBoolVar(hive.getConf(), HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false); +hive.getConf().set(ValidTxnList.VALID_TXNS_KEY, "1:"); +hive.getConf().set(ValidWriteIdList.VALID_WRITEIDS_KEY, TABLE_NAME + ":1:"); +hive.getConf().setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER, "org.apache.hadoop.hive.ql.lockmgr.TestTxnManager"); +SessionState.start(hive.getConf()); +SessionState.get().initTxnMgr(hive.getConf()); +Context ctx = new Context(hive.getConf()); +SessionState.get().getTxnMgr().openTxn(ctx, USER_NAME); + +t = new Table(); +org.apache.hadoop.hive.metastore.api.Table tTable = new org.apache.hadoop.hive.metastore.api.Table(); +tTable.setId(Long.MAX_VALUE); +t.setTTable(tTable); +Map parameters = new HashMap<>(); +parameters.put(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true"); Review comment: HS2 methods have explicit checks to know whether a given table is transactional or not. So, validWriteIdList and tableId are only set for transactional tables. This is an automated message from the Apache Git Service. 
[GitHub] [hive] vihangk1 commented on a change in pull request #1355: send tableId to get_partition APIs
vihangk1 commented on a change in pull request #1355: URL: https://github.com/apache/hive/pull/1355#discussion_r466598803 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -474,6 +476,12 @@ public static Hive get() throws HiveException { return get(true); } + public static Hive get(IMetaStoreClient msc) throws HiveException, MetaException { Review comment: If this is only used for testing purposes would be good to annotate this method with @VisibleForTesting ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClient.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.metadata; Review comment: A more natural place for this class would be in standalone-metastore module. I don't see anything in this class it to be dependent on hive-exec. ## File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestTxnManager.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.lockmgr; + +import org.apache.hadoop.hive.common.FileUtils; +import org.apache.hadoop.hive.common.ValidTxnWriteIdList; +import org.apache.hadoop.hive.metastore.api.CommitTxnRequest; +import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse; +import org.apache.hadoop.hive.metastore.api.TxnToWriteId; +import org.apache.hadoop.hive.metastore.api.TxnType; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.common.ValidReadTxnList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.Database; +import org.apache.hadoop.hive.ql.Context; +import org.apache.hadoop.hive.ql.ErrorMsg; +import org.apache.hadoop.hive.ql.DriverState; +import org.apache.hadoop.hive.ql.QueryPlan; +import org.apache.hadoop.hive.ql.hooks.ReadEntity; +import org.apache.hadoop.hive.ql.hooks.WriteEntity; +import org.apache.hadoop.hive.ql.metadata.DummyPartition; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.metadata.Partition; +import org.apache.hadoop.hive.ql.metadata.Table; +import org.apache.hadoop.util.ReflectionUtils; + +import java.util.*; + +/** + * An implementation of {@link HiveTxnManager} that does not support + * transactions. + * This class is only used in test. 
+ */ +class TestTxnManager extends DummyTxnManager { + final static Character SEMICOLON = ':'; Review comment: nit: the variable's name and its value don't match. Either change the value to ';' or rename the variable to COLON. ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -5292,6 +5303,16 @@ public synchronized SynchronizedMetaStoreClient getSynchronizedMSC() throws Meta return syncMetaStoreClient; } +/** + * @return the metastore client for the current thread + * @throws MetaException + */ + @LimitedPrivate(value = {"Hive"}) Review comment: I think these annotations are more useful for public APIs. Since Hive.java is not a public API, you can just use the @VisibleForTesting annotation. ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClientApiArgumentsChecker.java ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation
[GitHub] [hive] vihangk1 commented on a change in pull request #1330: HIVE-23890: Create HMS endpoint for querying file lists using FlatBuf…
vihangk1 commented on a change in pull request #1330: URL: https://github.com/apache/hive/pull/1330#discussion_r466575514 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{ 4: optional string errorMessage, } +struct GetFileListRequest { Review comment: This naming of this struct could be more generic (may be a better name could be GetFileMetadataRequest). Also did you consider reusing existing/extending getFileMetadata HMS API? ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String dbname, String name, Table } } +@Override +public GetFileListResponse get_file_list(GetFileListRequest req) throws MetaException { + String catName = req.isSetCatName() ? req.getCatName() : getDefaultCatalog(conf); + String dbName = req.getDbName(); + String tblName = req.getTableName(); + List partitions = req.getPartVals(); + // Will be used later, when cache is introduced + String validWriteIdList = req.getValidWriteIdList(); + + startFunction("get_file_list", ": " + TableName.getQualified(catName, dbName, tblName) + + ", partitions: " + partitions.toString()); + + + GetFileListResponse response = new GetFileListResponse(); + + boolean success = false; + Exception ex = null; + try { +Partition p = getMS().getPartition(catName, dbName, tblName, partitions); +Path path = new Path(p.getSd().getLocation()); + +FileSystem fs = path.getFileSystem(conf); +RemoteIterator itr = fs.listFiles(path, true); +while (itr.hasNext()) { + FileStatus fStatus = itr.next(); + Reader reader = OrcFile.createReader(fStatus.getPath(), OrcFile.readerOptions(fs.getConf())); Review comment: Does this assume that the request is always for a ORC table? 
## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{ 4: optional string errorMessage, } +struct GetFileListRequest { + 1: optional string catName, + 2: optional string dbName, + 3: optional string tableName, + 4: optional list partVals, + 6: optional string validWriteIdList +} + +struct GetFileListResponse { Review comment: I think it will be useful to have a separate struct defined for the FileMetadata which also includes a type field. For instance, I can see this being useful for various engines and the FileMetadata format could be different for each engine. For instance, engine1 may require FileStatus, BlockInformation while engine2 only is interested in FileStatus, FilemodificationTime, while there is another file-metadata type for ACID state which includes some ACID specific information like fileformat that you used below. ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2454,10 +2467,14 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) // partition keys in new_part should be the same as those in old partition. void rename_partition(1:string db_name, 2:string tbl_name, 3:list part_vals, 4:Partition new_part) throws (1:InvalidOperationException o1, 2:MetaException o2) - + RenamePartitionResponse rename_partition_req(1:RenamePartitionRequest req) throws (1:InvalidOperationException o1, 2:MetaException o2) + // Returns a file list using FlatBuffers as serialization + GetFileListResponse get_file_list(1:GetFileListRequest req) +throws(1:MetaException o1) Review comment: Seems like if the partition doesn't exist you might need to throw a NoSuchObjectFoundException as well. 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String dbname, String name, Table } } +@Override +public GetFileListResponse get_file_list(GetFileListRequest req) throws MetaException { + String catName = req.isSetCatName() ? req.getCatName() : getDefaultCatalog(conf); + String dbName = req.getDbName(); + String tblName = req.getTableName(); + List partitions = req.getPartVals(); + // Will be used later, when cache is introduced + String validWriteIdList = req.getValidWriteIdList(); + + startFunction("get_file_list", ": " + TableName.getQualified(catName, dbName, tblName) + + ", partitions: " + partitions.toString()); + + + GetFileListResponse response = new GetFileListResponse(); + + boolean success = false; + Exception ex = null; + try {
[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
aasha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466579718 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -196,12 +203,14 @@ private String checkHiveEntityGuid(AtlasRequestBuilder atlasRequestBuilder, Stri AtlasObjectId objectId = atlasRequestBuilder.getItemToExport(clusterName, srcDb); Set> entries = objectId.getUniqueAttributes().entrySet(); if (entries == null || entries.isEmpty()) { - throw new SemanticException("Could find entries in objectId for:" + clusterName); + throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " + +"entries in objectId for:" + clusterName, "atlas")); Review comment: OK, yes, will do that.
[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
pkumarsinha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466578320 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java ## @@ -86,107 +87,59 @@ private boolean checkIfPathExist(Path sourcePath, UserGroupInformation proxyUser return proxyUser.doAs((PrivilegedExceptionAction) () -> sourcePath.getFileSystem(conf).exists(sourcePath)); } - private int handleException(Exception e, Path sourcePath, Path targetPath, - int currentRetry, UserGroupInformation proxyUser) { -try { - LOG.info("Checking if source path " + sourcePath + " is missing for exception ", e); - if (!checkIfPathExist(sourcePath, proxyUser)) { -LOG.info("Source path is missing. Ignoring exception."); -return 0; - } -} catch (Exception ex) { - LOG.warn("Source path missing check failed. ", ex); -} -// retry logic only for i/o exception -if (!(e instanceof IOException)) { - LOG.error("Unable to copy {} to {}", sourcePath, targetPath, e); - setException(e); - return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode(); -} - -if (currentRetry <= MAX_COPY_RETRY) { - LOG.warn("Unable to copy {} to {}", sourcePath, targetPath, e); -} else { - LOG.error("Unable to copy {} to {} even after retrying for {} time", sourcePath, targetPath, currentRetry, e); - setException(e); - return ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getErrorCode(); -} -int sleepTime = FileUtils.getSleepTime(currentRetry); -LOG.info("Sleep for " + sleepTime + " milliseconds before retry no " + (currentRetry)); -try { - Thread.sleep(sleepTime); -} catch (InterruptedException timerEx) { - LOG.info("Sleep interrupted", timerEx.getMessage()); -} -try { - if (proxyUser == null) { -proxyUser = Utils.getUGI(); - } - FileSystem.closeAllForUGI(proxyUser); -} catch (Exception ex) { - LOG.warn("Unable to closeAllForUGI for user " + proxyUser, ex); -} -return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode(); - } - @Override public int execute() { String distCpDoAsUser = 
conf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER); +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(IOException.class).build(); +try { + return retryable.executeCallable(() -> { +UserGroupInformation proxyUser = null; +Path sourcePath = work.getFullyQualifiedSourcePath(); +Path targetPath = work.getFullyQualifiedTargetPath(); +try { + if (conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) { +sourcePath = reservedRawPath(work.getFullyQualifiedSourcePath().toUri()); +targetPath = reservedRawPath(work.getFullyQualifiedTargetPath().toUri()); + } + UserGroupInformation ugi = Utils.getUGI(); + String currentUser = ugi.getShortUserName(); + if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) { +proxyUser = UserGroupInformation.createProxyUser( + distCpDoAsUser, UserGroupInformation.getLoginUser()); + } -Path sourcePath = work.getFullyQualifiedSourcePath(); -Path targetPath = work.getFullyQualifiedTargetPath(); -if (conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) { - sourcePath = reservedRawPath(work.getFullyQualifiedSourcePath().toUri()); - targetPath = reservedRawPath(work.getFullyQualifiedTargetPath().toUri()); -} -int currentRetry = 0; -int error = 0; -UserGroupInformation proxyUser = null; -while (currentRetry <= MAX_COPY_RETRY) { - try { -UserGroupInformation ugi = Utils.getUGI(); -String currentUser = ugi.getShortUserName(); -if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) { - proxyUser = UserGroupInformation.createProxyUser( - distCpDoAsUser, UserGroupInformation.getLoginUser()); -} - -setTargetPathOwner(targetPath, sourcePath, proxyUser); - -// do we create a new conf and only here provide this additional option so that we get away from -// differences of data in two location for the same directories ? -// basically add distcp.options.delete to hiveconf new object ? 
-FileUtils.distCp( -sourcePath.getFileSystem(conf), // source file system -Collections.singletonList(sourcePath), // list of source paths -targetPath, -false, -proxyUser, -conf, -ShimLoader.getHadoopShims()); -return 0; - } catch (Exception e) { -currentRetry++; -error = handleException(e, sourcePath, targetPath, currentRetry, proxyUser); -if (error == 0) { -
[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
pkumarsinha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466576634 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -196,12 +203,14 @@ private String checkHiveEntityGuid(AtlasRequestBuilder atlasRequestBuilder, Stri AtlasObjectId objectId = atlasRequestBuilder.getItemToExport(clusterName, srcDb); Set> entries = objectId.getUniqueAttributes().entrySet(); if (entries == null || entries.isEmpty()) { - throw new SemanticException("Could find entries in objectId for:" + clusterName); + throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " + +"entries in objectId for:" + clusterName, "atlas")); Review comment: I was referring to the "atlas" part here, and to "ranger" and "hive" in other places. Should we define one constant per service and use that instead? Something like: final String ReplUtils.ATLAS_SVC = "atlas"; ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " + "entries in objectId for:" + clusterName, ReplUtils.ATLAS_SVC);
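The suggestion above (one constant per service name instead of repeated string literals) could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name ServiceNameConstants, the RANGER_SVC and HIVE_SVC constants, and the message template are hypothetical. Only the ATLAS_SVC name and the idea of passing it to the format call come from the comment, and invalidConfigForService is a stand-in for ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format, not Hive's actual implementation.

```java
import java.text.MessageFormat;

public class ServiceNameConstants {
  // Hypothetical constants, one per external service. In the review these
  // would live in ReplUtils; only ATLAS_SVC is named in the comment.
  public static final String ATLAS_SVC = "atlas";
  public static final String RANGER_SVC = "ranger";
  public static final String HIVE_SVC = "hive";

  // Stand-in for ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format(reason, service).
  // The template text is made up for this sketch.
  public static String invalidConfigForService(String reason, String service) {
    return MessageFormat.format("Invalid config for service {1}: {0}", reason, service);
  }

  public static void main(String[] args) {
    // Call sites reference the constant, so a typo in a service name
    // becomes a compile error instead of a silently different message.
    System.out.println(invalidConfigForService("no entries in objectId", ATLAS_SVC));
  }
}
```

With constants, a rename of a service label touches one definition rather than every throw site.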
[GitHub] [hive] jcamachor commented on a change in pull request #1357: HIVE-23963: UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule
jcamachor commented on a change in pull request #1357: URL: https://github.com/apache/hive/pull/1357#discussion_r466575229 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java ## @@ -81,8 +84,16 @@ public RelDistribution apply(TargetMapping mapping) { return this; } List newKeys = new ArrayList<>(keys.size()); + +// Instead of using a HashMap for lookup newKeys.add(mapping.getTargetOpt(key)); should be called but not all the +// mapping supports that. See HIVE-23963. Replace this when this is fixed in calcite. Review comment: @kasakrisz , please create the Calcite JIRA so it is easier to track. Thanks
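The code comment in the diff above describes a workaround: build a HashMap from the mapping's (source, target) pairs and look keys up there, rather than calling getTargetOpt on a mapping that may not support it. A minimal sketch of that idea follows. It does not use Calcite types; the int[][] pair representation, the class name KeyRemapSketch, and the choice to throw on an unmapped key are all assumptions made for illustration, and the real patch may handle missing keys differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyRemapSketch {
  // pairs[i] = {source, target}; a plain stand-in for iterating a mapping.
  public static List<Integer> remapKeys(List<Integer> keys, int[][] pairs) {
    // Build the source -> target lookup once.
    Map<Integer, Integer> sourceToTarget = new HashMap<>();
    for (int[] pair : pairs) {
      sourceToTarget.put(pair[0], pair[1]);
    }
    List<Integer> newKeys = new ArrayList<>(keys.size());
    for (Integer key : keys) {
      Integer target = sourceToTarget.get(key);
      if (target == null) {
        // Assumption: an unmapped key is treated as an error in this sketch.
        throw new IllegalArgumentException("No target for key " + key);
      }
      newKeys.add(target);
    }
    return newKeys;
  }

  public static void main(String[] args) {
    List<Integer> keys = new ArrayList<>();
    keys.add(2);
    keys.add(0);
    System.out.println(remapKeys(keys, new int[][] {{0, 5}, {2, 1}}));
  }
}
```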
[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
aasha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466574105 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws SemanticException, MalformedU private long lastStoredTimeStamp() throws SemanticException { Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), EximUtil.METADATA_NAME); -BufferedReader br = null; +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(IOException.class) + .withFailOnException(FileNotFoundException.class).build(); try { - FileSystem fs = prevMetadataPath.getFileSystem(conf); - br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); - String line = br.readLine(); - if (line == null) { -throw new SemanticException("Could not read lastStoredTimeStamp from atlas metadata file"); - } - String[] lineContents = line.split("\t", 5); - return Long.parseLong(lineContents[1]); -} catch (Exception ex) { - throw new SemanticException(ex); -} finally { - if (br != null) { + return retryable.executeCallable(() -> { +BufferedReader br = null; try { - br.close(); -} catch (IOException e) { - throw new SemanticException(e); + FileSystem fs = prevMetadataPath.getFileSystem(conf); + br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); + String line = br.readLine(); + if (line == null) { +throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE Review comment: Named it as REPL_INVALID_INTERNAL_CONFIG_FOR_SERVICE
[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
pkumarsinha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466574279 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws SemanticException, MalformedU private long lastStoredTimeStamp() throws SemanticException { Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), EximUtil.METADATA_NAME); -BufferedReader br = null; +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(IOException.class) + .withFailOnException(FileNotFoundException.class).build(); try { - FileSystem fs = prevMetadataPath.getFileSystem(conf); - br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); - String line = br.readLine(); - if (line == null) { -throw new SemanticException("Could not read lastStoredTimeStamp from atlas metadata file"); - } - String[] lineContents = line.split("\t", 5); - return Long.parseLong(lineContents[1]); -} catch (Exception ex) { - throw new SemanticException(ex); -} finally { - if (br != null) { + return retryable.executeCallable(() -> { +BufferedReader br = null; try { - br.close(); -} catch (IOException e) { - throw new SemanticException(e); + FileSystem fs = prevMetadataPath.getFileSystem(conf); + br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); + String line = br.readLine(); + if (line == null) { +throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE + .format("Could not read lastStoredTimeStamp from atlas metadata file", "atlas")); + } + String[] lineContents = line.split("\t", 5); + return Long.parseLong(lineContents[1]); +} finally { + if (br != null) { +try { + br.close(); +} catch (IOException e) { + //Do nothing +} + } } - } + }); +} catch (Exception e) { + throw new SemanticException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e); Review 
comment: Yes, here we are catching it as that. I was referring to line 144: if (line == null) { throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE .format("Could not read lastStoredTimeStamp from atlas metadata file", "atlas")); } There we already throw a SemanticException, which we then catch here. Can't we just rethrow the same e in that case?
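The question above is whether an exception that is already a classified SemanticException should pass through unchanged instead of being re-wrapped as a retry-exhausted error. A minimal sketch of that catch ordering is below. It is an assumption-laden illustration: SemanticException here is a local stand-in class, run is a hypothetical helper in place of Retryable.executeCallable, and the "Retry exhausted" text is invented. Whether the real Retryable delivers the original exception type to the outer catch is not confirmed by this thread.

```java
import java.util.concurrent.Callable;

public class RethrowSketch {
  // Local stand-in for Hive's SemanticException so the sketch compiles alone.
  public static class SemanticException extends Exception {
    public SemanticException(String message) { super(message); }
    public SemanticException(String message, Throwable cause) { super(message, cause); }
  }

  // Hypothetical helper: runs the action and classifies failures.
  public static <T> T run(Callable<T> action) throws SemanticException {
    try {
      return action.call();
    } catch (SemanticException e) {
      // Already classified by the code that threw it: propagate unchanged.
      throw e;
    } catch (Exception e) {
      // Anything else is wrapped once, keeping the original as the cause.
      throw new SemanticException("Retry exhausted: " + e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    try {
      run(() -> { throw new SemanticException("invalid config for atlas"); });
    } catch (SemanticException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The more specific catch clause has to come first; otherwise the generic handler would wrap the already-classified error and the original error code would be harder to recover.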
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466570882 ## File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out ## @@ -0,0 +1,1575 @@ +PREHOOK: query: explain extended prepare pcount from select count(*) from src where key > ? +PREHOOK: type: QUERY +PREHOOK: Input: default@src + A masked pattern was here +POSTHOOK: query: explain extended prepare pcount from select count(*) from src where key > ? +POSTHOOK: type: QUERY +POSTHOOK: Input: default@src + A masked pattern was here +OPTIMIZED SQL: SELECT COUNT(*) AS `$f0` +FROM `default`.`src` +WHERE `key` > CAST(? AS STRING) +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 +Tez + A masked pattern was here + Edges: +Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) + A masked pattern was here + Vertices: +Map 1 +Map Operator Tree: +TableScan + alias: src + filterExpr: (key > CAST( Dynamic Parameter index: 1 AS STRING)) (type: boolean) + Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE + GatherStats: false + Filter Operator +isSamplingPred: false +predicate: (key > CAST( Dynamic Parameter index: 1 AS STRING)) (type: boolean) +Statistics: Num rows: 166 Data size: 14442 Basic stats: COMPLETE Column stats: COMPLETE +Select Operator + Statistics: Num rows: 166 Data size: 14442 Basic stats: COMPLETE Column stats: COMPLETE + Group By Operator +aggregations: count() +minReductionHashAggr: 0.99 +mode: hash +outputColumnNames: _col0 +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +Reduce Output Operator + bucketingVersion: 2 + null sort order: + numBuckets: -1 + sort order: + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE + tag: -1 + value expressions: _col0 (type: bigint) + auto parallelism: false +Execution mode: llap +LLAP IO: no inputs +Path -> Alias: + A masked pattern was 
here +Path -> Partition: + A masked pattern was here +Partition + base file name: src + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + properties: +bucket_count -1 +bucketing_version 2 +column.name.delimiter , +columns key,value +columns.types string:string + A masked pattern was here +name default.src +serialization.format 1 +serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + +input format: org.apache.hadoop.mapred.TextInputFormat +output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat +properties: + bucketing_version 2 + column.name.delimiter , + columns key,value + columns.comments 'default','default' + columns.types string:string + A masked pattern was here + name default.src + serialization.format 1 + serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +name: default.src + name: default.src +Truncated Path -> Alias: + /src [src] +Reducer 2 +Execution mode: llap +Needs Tagging: false +Reduce Operator Tree: + Group By Operator +aggregations: count(VALUE._col0) +mode: mergepartial +outputColumnNames: _col0 +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +File Output Operator + bucketingVersion: 2 + compressed: false + GlobalTableId: 0
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466568964 ## File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out ## @@ -0,0 +1,1575 @@ +PREHOOK: query: explain extended prepare pcount from select count(*) from src where key > ? +PREHOOK: type: QUERY +PREHOOK: Input: default@src + A masked pattern was here +POSTHOOK: query: explain extended prepare pcount from select count(*) from src where key > ? +POSTHOOK: type: QUERY +POSTHOOK: Input: default@src + A masked pattern was here +OPTIMIZED SQL: SELECT COUNT(*) AS `$f0` +FROM `default`.`src` +WHERE `key` > CAST(? AS STRING) +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 +Tez + A masked pattern was here + Edges: +Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) + A masked pattern was here + Vertices: +Map 1 +Map Operator Tree: +TableScan + alias: src + filterExpr: (key > CAST( Dynamic Parameter index: 1 AS STRING)) (type: boolean) + Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE + GatherStats: false + Filter Operator +isSamplingPred: false +predicate: (key > CAST( Dynamic Parameter index: 1 AS STRING)) (type: boolean) +Statistics: Num rows: 166 Data size: 14442 Basic stats: COMPLETE Column stats: COMPLETE +Select Operator + Statistics: Num rows: 166 Data size: 14442 Basic stats: COMPLETE Column stats: COMPLETE + Group By Operator +aggregations: count() +minReductionHashAggr: 0.99 +mode: hash +outputColumnNames: _col0 +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +Reduce Output Operator + bucketingVersion: 2 + null sort order: + numBuckets: -1 + sort order: + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE + tag: -1 + value expressions: _col0 (type: bigint) + auto parallelism: false +Execution mode: llap +LLAP IO: no inputs +Path -> Alias: + A masked pattern was 
here +Path -> Partition: + A masked pattern was here +Partition + base file name: src + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + properties: +bucket_count -1 +bucketing_version 2 +column.name.delimiter , +columns key,value +columns.types string:string + A masked pattern was here +name default.src +serialization.format 1 +serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + +input format: org.apache.hadoop.mapred.TextInputFormat +output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat +properties: + bucketing_version 2 + column.name.delimiter , + columns key,value + columns.comments 'default','default' + columns.types string:string + A masked pattern was here + name default.src + serialization.format 1 + serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +name: default.src + name: default.src +Truncated Path -> Alias: + /src [src] +Reducer 2 +Execution mode: llap +Needs Tagging: false +Reduce Operator Tree: + Group By Operator +aggregations: count(VALUE._col0) +mode: mergepartial +outputColumnNames: _col0 +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +File Output Operator + bucketingVersion: 2 + compressed: false + GlobalTableId: 0
[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
aasha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466563797 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -196,12 +203,14 @@ private String checkHiveEntityGuid(AtlasRequestBuilder atlasRequestBuilder, Stri AtlasObjectId objectId = atlasRequestBuilder.getItemToExport(clusterName, srcDb); Set> entries = objectId.getUniqueAttributes().entrySet(); if (entries == null || entries.isEmpty()) { - throw new SemanticException("Could find entries in objectId for:" + clusterName); + throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " + +"entries in objectId for:" + clusterName, "atlas")); Review comment: Format will replace the config name and service name and helps us to reuse the same error code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
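The reuse described in this review comment can be sketched as follows. This is a minimal illustration, assuming the error templates use `java.text.MessageFormat`-style placeholders (`{0}`, `{1}`); the enum `SketchErrorMsg`, its code `40001`, and its message text are hypothetical and are not Hive's actual `ErrorMsg` definitions.

```java
import java.text.MessageFormat;

// Hypothetical stand-in for an error-code enum; this is NOT Hive's ErrorMsg.
enum SketchErrorMsg {
  // One code and one template, shared by every service that reports a bad config.
  INVALID_CONFIG_FOR_SERVICE(40001, "Invalid config error : {0} for {1} service.");

  private final int errorCode;
  private final String template;

  SketchErrorMsg(int errorCode, String template) {
    this.errorCode = errorCode;
    this.template = template;
  }

  int getErrorCode() {
    return errorCode;
  }

  // Fill the {0}, {1}, ... placeholders of the shared template.
  String format(String... args) {
    return MessageFormat.format(template, (Object[]) args);
  }
}

class SketchErrorMsgDemo {
  public static void main(String[] args) {
    // Same error code, different detail and service name.
    System.out.println(SketchErrorMsg.INVALID_CONFIG_FOR_SERVICE
        .format("missing entries for cluster c1", "atlas"));
    System.out.println(SketchErrorMsg.INVALID_CONFIG_FOR_SERVICE
        .format("endpoint not set", "ranger"));
  }
}
```

Because the detail and the service name are arguments rather than part of the enum constant, callers for different services can share one error code.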
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466563430 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/drop/ExecuteStatementAnalyzer.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.ddl.table.drop; + +import org.apache.hadoop.hive.ql.QueryState; +import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.DDLType; +import org.apache.hadoop.hive.ql.exec.ExplainTask; +import org.apache.hadoop.hive.ql.exec.FetchTask; +import org.apache.hadoop.hive.ql.exec.FilterOperator; +import org.apache.hadoop.hive.ql.exec.Operator; +import org.apache.hadoop.hive.ql.exec.OperatorUtils; +import org.apache.hadoop.hive.ql.exec.SelectOperator; +import org.apache.hadoop.hive.ql.exec.SerializationUtilities; +import org.apache.hadoop.hive.ql.exec.Task; +import org.apache.hadoop.hive.ql.exec.Utilities; +import org.apache.hadoop.hive.ql.exec.tez.TezTask; +import org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator; +import org.apache.hadoop.hive.ql.parse.ASTNode; +import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer; +import org.apache.hadoop.hive.ql.parse.HiveParser; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.apache.hadoop.hive.ql.parse.type.ExprNodeDescExprFactory; +import org.apache.hadoop.hive.ql.plan.BaseWork; +import org.apache.hadoop.hive.ql.plan.ExprDynamicParamDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDesc; +import org.apache.hadoop.hive.ql.session.SessionState; +import org.apache.hadoop.hive.serde2.typeinfo.CharTypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; +import org.apache.hadoop.hive.serde2.typeinfo.VarcharTypeInfo; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +/** + * Analyzer for Execute statement. 
+ * This analyzer + * retreives cached {@link BaseSemanticAnalyzer}, + * makes copy of all tasks by serializing/deserializing it, + * bind dynamic parameters inside cached {@link BaseSemanticAnalyzer} using values provided + */ +@DDLType(types = HiveParser.TOK_EXECUTE) +public class ExecuteStatementAnalyzer extends BaseSemanticAnalyzer { + + public ExecuteStatementAnalyzer(QueryState queryState) throws SemanticException { +super(queryState); + } + + /** + * This class encapsulate all {@link Task} required to be copied. + * This is required because {@link FetchTask} list of {@link Task} may hold reference to same + * objects (e.g. list of result files) and are required to be serialized/de-serialized together. + */ + private class PlanCopy { +FetchTask fetchTask; +List> tasks; + +PlanCopy(FetchTask fetchTask, List> tasks) { + this.fetchTask = fetchTask; + this.tasks = tasks; +} + +FetchTask getFetchTask() { + return fetchTask; +} + +List> getTasks() { + return tasks; +} + } + + private String getQueryName(ASTNode root) { +ASTNode queryNameAST = (ASTNode)(root.getChild(1)); +return queryNameAST.getText(); + } + + /** + * Utility method to create copy of provided object using kyro serialization/de-serialization. + */ + private T makeCopy(final Object task, Class objClass) { +ByteArrayOutputStream baos = new ByteArrayOutputStream(); +SerializationUtilities.serializePlan(task, baos); + +return SerializationUtilities.deserializePlan( +new ByteArrayInputStream(baos.toByteArray()), objClass); + } + + /** + * Given a {@link BaseSemanticAnalyzer} (cached) this method make copies of all tasks + * (including {@link FetchTask}) and update the existing {@link ExecuteStatementAnalyzer} + */ + private void createTaskCopy(final BaseSemanticAnalyzer cachedPlan) { Review comment: Follow-up: https://issues.apache.org/jira/browse/HIVE-24005
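The copy-by-serialization idea in `makeCopy` and `PlanCopy` can be illustrated with a small, self-contained sketch. The diff above uses Hive's `SerializationUtilities` (Kryo); the sketch below swaps in standard `java.io` object serialization so that it runs without Hive on the classpath. The classes `DeepCopySketch`, `Node`, and `Bundle` are hypothetical stand-ins, not Hive types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Hypothetical illustration; uses java.io serialization instead of Kryo.
final class DeepCopySketch {
  private DeepCopySketch() { }

  // Stands in for a task that points at a list of result files.
  static final class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> resultFiles;

    Node(List<String> resultFiles) {
      this.resultFiles = resultFiles;
    }
  }

  // Groups objects that may share references, so they are copied in one pass.
  static final class Bundle implements Serializable {
    private static final long serialVersionUID = 1L;
    final Node fetch;
    final Node task;

    Bundle(Node fetch, Node task) {
      this.fetch = fetch;
      this.task = task;
    }
  }

  // Deep-copies an object graph by writing it to bytes and reading it back.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T makeCopy(T original) {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
        out.writeObject(original);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Could not copy object", e);
    }
  }
}
```

Copying the `Bundle` as a whole, rather than each `Node` separately, is what keeps a list shared by both nodes shared in the copy; that is the reason the Javadoc gives for wrapping the fetch task and the task list in one `PlanCopy`.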
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466561873 ## File path: parser/src/java/org/apache/hadoop/hive/ql/parse/PrepareStatementParser.g ## @@ -0,0 +1,66 @@ +/** + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+*/ +parser grammar PrepareStatementParser; + +options +{ +output=AST; +ASTLabelType=ASTNode; +backtrack=false; +k=3; +} + +@members { + @Override + public Object recoverFromMismatchedSet(IntStream input, + RecognitionException re, BitSet follow) throws RecognitionException { +throw re; + } + @Override + public void displayRecognitionError(String[] tokenNames, + RecognitionException e) { +gParent.errors.add(new ParseError(gParent, e, tokenNames)); + } +} + +@rulecatch { +catch (RecognitionException e) { + throw e; +} +} + +//--- Rules for parsing Prepare statement- +prepareStatement +@init { gParent.pushMsg("prepare statement ", state); } +@after { gParent.popMsg(state); } +: KW_PREPARE identifier KW_FROM queryStatementExpression +-> ^(TOK_PREPARE queryStatementExpression identifier) +; + +executeStatement +@init { gParent.pushMsg("execute statement ", state); } +@after { gParent.popMsg(state); } +: KW_EXECUTE identifier KW_USING executeParamList +-> ^(TOK_EXECUTE executeParamList identifier) +; + +executeParamList +@init { gParent.pushMsg("execute param list", state); } +@after { gParent.popMsg(state); } +: constant (COMMA constant)* Review comment: https://issues.apache.org/jira/browse/HIVE-24002
[GitHub] [hive] ashish-kumar-sharma opened a new pull request #1370: HIVE-23887: Reset Columns stats in Export Statement
ashish-kumar-sharma opened a new pull request #1370: URL: https://github.com/apache/hive/pull/1370 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
aasha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466546396 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java ## @@ -86,107 +87,59 @@ private boolean checkIfPathExist(Path sourcePath, UserGroupInformation proxyUser return proxyUser.doAs((PrivilegedExceptionAction) () -> sourcePath.getFileSystem(conf).exists(sourcePath)); } - private int handleException(Exception e, Path sourcePath, Path targetPath, - int currentRetry, UserGroupInformation proxyUser) { -try { - LOG.info("Checking if source path " + sourcePath + " is missing for exception ", e); - if (!checkIfPathExist(sourcePath, proxyUser)) { -LOG.info("Source path is missing. Ignoring exception."); -return 0; - } -} catch (Exception ex) { - LOG.warn("Source path missing check failed. ", ex); -} -// retry logic only for i/o exception -if (!(e instanceof IOException)) { - LOG.error("Unable to copy {} to {}", sourcePath, targetPath, e); - setException(e); - return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode(); -} - -if (currentRetry <= MAX_COPY_RETRY) { - LOG.warn("Unable to copy {} to {}", sourcePath, targetPath, e); -} else { - LOG.error("Unable to copy {} to {} even after retrying for {} time", sourcePath, targetPath, currentRetry, e); - setException(e); - return ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getErrorCode(); -} -int sleepTime = FileUtils.getSleepTime(currentRetry); -LOG.info("Sleep for " + sleepTime + " milliseconds before retry no " + (currentRetry)); -try { - Thread.sleep(sleepTime); -} catch (InterruptedException timerEx) { - LOG.info("Sleep interrupted", timerEx.getMessage()); -} -try { - if (proxyUser == null) { -proxyUser = Utils.getUGI(); - } - FileSystem.closeAllForUGI(proxyUser); -} catch (Exception ex) { - LOG.warn("Unable to closeAllForUGI for user " + proxyUser, ex); -} -return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode(); - } - @Override public int execute() { String distCpDoAsUser = 
conf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER); +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(IOException.class).build(); +try { + return retryable.executeCallable(() -> { +UserGroupInformation proxyUser = null; +Path sourcePath = work.getFullyQualifiedSourcePath(); +Path targetPath = work.getFullyQualifiedTargetPath(); +try { + if (conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) { +sourcePath = reservedRawPath(work.getFullyQualifiedSourcePath().toUri()); +targetPath = reservedRawPath(work.getFullyQualifiedTargetPath().toUri()); + } + UserGroupInformation ugi = Utils.getUGI(); + String currentUser = ugi.getShortUserName(); + if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) { +proxyUser = UserGroupInformation.createProxyUser( + distCpDoAsUser, UserGroupInformation.getLoginUser()); + } -Path sourcePath = work.getFullyQualifiedSourcePath(); -Path targetPath = work.getFullyQualifiedTargetPath(); -if (conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) { - sourcePath = reservedRawPath(work.getFullyQualifiedSourcePath().toUri()); - targetPath = reservedRawPath(work.getFullyQualifiedTargetPath().toUri()); -} -int currentRetry = 0; -int error = 0; -UserGroupInformation proxyUser = null; -while (currentRetry <= MAX_COPY_RETRY) { - try { -UserGroupInformation ugi = Utils.getUGI(); -String currentUser = ugi.getShortUserName(); -if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) { - proxyUser = UserGroupInformation.createProxyUser( - distCpDoAsUser, UserGroupInformation.getLoginUser()); -} - -setTargetPathOwner(targetPath, sourcePath, proxyUser); - -// do we create a new conf and only here provide this additional option so that we get away from -// differences of data in two location for the same directories ? -// basically add distcp.options.delete to hiveconf new object ? 
-FileUtils.distCp( -sourcePath.getFileSystem(conf), // source file system -Collections.singletonList(sourcePath), // list of source paths -targetPath, -false, -proxyUser, -conf, -ShimLoader.getHadoopShims()); -return 0; - } catch (Exception e) { -currentRetry++; -error = handleException(e, sourcePath, targetPath, currentRetry, proxyUser); -if (error == 0) { -
[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
aasha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466544742 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws SemanticException, MalformedU private long lastStoredTimeStamp() throws SemanticException { Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), EximUtil.METADATA_NAME); -BufferedReader br = null; +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(IOException.class) + .withFailOnException(FileNotFoundException.class).build(); try { - FileSystem fs = prevMetadataPath.getFileSystem(conf); - br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); - String line = br.readLine(); - if (line == null) { -throw new SemanticException("Could not read lastStoredTimeStamp from atlas metadata file"); - } - String[] lineContents = line.split("\t", 5); - return Long.parseLong(lineContents[1]); -} catch (Exception ex) { - throw new SemanticException(ex); -} finally { - if (br != null) { + return retryable.executeCallable(() -> { +BufferedReader br = null; try { - br.close(); -} catch (IOException e) { - throw new SemanticException(e); + FileSystem fs = prevMetadataPath.getFileSystem(conf); + br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); + String line = br.readLine(); + if (line == null) { +throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE + .format("Could not read lastStoredTimeStamp from atlas metadata file", "atlas")); + } + String[] lineContents = line.split("\t", 5); + return Long.parseLong(lineContents[1]); +} finally { + if (br != null) { +try { + br.close(); +} catch (IOException e) { + //Do nothing +} + } } - } + }); +} catch (Exception e) { + throw new SemanticException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e); Review 
comment: Yes, but this is of type `Exception`.
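The retry policy configured in the diff above (retry on `IOException`, fail immediately on `FileNotFoundException`) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hive's `Retryable`: the class `RetrySketch`, the method `callWithRetry`, and the `maxAttempts` and `delayMillis` parameters are hypothetical, and a fixed delay stands in for whatever backoff the real class reads from `HiveConf`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; this is NOT Hive's Retryable class.
final class RetrySketch {
  private RetrySketch() { }

  // Runs the action, retrying on IOException but never on FileNotFoundException.
  static <T> T callWithRetry(Callable<T> action, int maxAttempts, long delayMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (FileNotFoundException e) {
        // A missing file will not appear on retry, so fail fast.
        throw e;
      } catch (IOException e) {
        if (attempt >= maxAttempts) {
          // Retries exhausted: surface the last failure to the caller.
          throw e;
        }
        Thread.sleep(delayMillis);
      }
    }
  }
}
```

`FileNotFoundException` is a subclass of `IOException`, so its `catch` clause has to come first. Any other exception type propagates without a retry, which is why the caller in the diff still needs a broad `catch (Exception e)` around the call.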
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466534477 ## File path: ql/src/test/queries/clientpositive/prepare_plan.q ## @@ -0,0 +1,113 @@ +--! qt:dataset:src +--! qt:dataset:alltypesorc + +set hive.explain.user=false; +set hive.vectorized.execution.enabled=false; + +explain extended prepare pcount from select count(*) from src where key > ?; +prepare pcount from select count(*) from src where key > ?; +execute pcount using 200; + +-- single param +explain extended prepare p1 from select * from src where key > ? order by key limit 10; +prepare p1 from select * from src where key > ? order by key limit 10; + +execute p1 using 200; + +-- same query, different param +execute p1 using 0; + +-- same query, negative param +--TODO: fails (constant in grammar do not support negatives) +-- execute p1 using -1; Review comment: https://issues.apache.org/jira/browse/HIVE-24002
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466533583 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -1619,6 +1620,12 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis colName = enfd.getFieldName(); colType = enfd.getTypeString(); countDistincts = numRows; +} else if (end instanceof ExprDynamicParamDesc) { + //skip collecting stats for parameters Review comment: https://issues.apache.org/jira/browse/HIVE-24003
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466533366 ## File path: ql/src/test/results/clientpositive/llap/udf_greatest.q.out ## @@ -63,7 +63,7 @@ STAGE PLANS: alias: src Row Limit Per Split: 1 Select Operator -expressions: 'c' (type: string), 'a' (type: string), 'AaA' (type: string), 'AAA' (type: string), '13' (type: string), '2' (type: string), '03' (type: string), '1' (type: string), null (type: double), null (type: double), null (type: double), null (type: double), null (type: double), null (type: double) Review comment: The patch includes a small change that updates the type inference rule for void/null. Before the change, the null expressions in this case were inferred as `Double`. With the change they are inferred as `String`, because the rest of the expressions within this UDF call (`GREATEST('a', 'b', null)`) are interpreted as strings.
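The behavior described in this comment can be sketched as a toy rule: a null argument carries no type of its own and takes the type of the other arguments. The sketch is only an illustration of that idea; `NullTypeInferenceSketch`, `ArgType`, and `inferReturnType` are hypothetical names, the code is not taken from the patch, and it deliberately leaves out promotion between different non-null types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "null takes the type of its siblings"; not Hive code.
final class NullTypeInferenceSketch {
  private NullTypeInferenceSketch() { }

  // VOID models the type of an untyped null literal.
  enum ArgType { VOID, STRING, DOUBLE }

  // Returns the type shared by the non-null arguments, or VOID if all are null.
  static ArgType inferReturnType(List<ArgType> argTypes) {
    ArgType result = ArgType.VOID;
    for (ArgType type : argTypes) {
      if (type == ArgType.VOID) {
        continue; // a null literal does not constrain the result type
      }
      if (result == ArgType.VOID) {
        result = type;
      } else if (result != type) {
        // Promotion between different non-null types is out of scope here.
        throw new IllegalArgumentException("Mixed argument types: " + argTypes);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Models GREATEST('a', 'b', null): two strings and an untyped null.
    System.out.println(inferReturnType(
        Arrays.asList(ArgType.STRING, ArgType.STRING, ArgType.VOID)));
  }
}
```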
[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause
vineetgarg02 commented on a change in pull request #1315: URL: https://github.com/apache/hive/pull/1315#discussion_r466531974 ## File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out ## @@ -0,0 +1,2512 @@ +PREHOOK: query: explain extended prepare pcount from select count(*) from src where key > ? +PREHOOK: type: QUERY +PREHOOK: Input: default@src + A masked pattern was here +POSTHOOK: query: explain extended prepare pcount from select count(*) from src where key > ? +POSTHOOK: type: QUERY +POSTHOOK: Input: default@src + A masked pattern was here +OPTIMIZED SQL: SELECT COUNT(*) AS `$f0` +FROM `default`.`src` +WHERE `key` > CAST(? AS STRING) +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 +Tez + A masked pattern was here + Edges: +Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) + A masked pattern was here + Vertices: +Map 1 +Map Operator Tree: +TableScan + alias: src + filterExpr: (key > CAST( $1 AS STRING)) (type: boolean) + Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE + GatherStats: false + Filter Operator +isSamplingPred: false +predicate: (key > CAST( $1 AS STRING)) (type: boolean) +Statistics: Num rows: 166 Data size: 14442 Basic stats: COMPLETE Column stats: COMPLETE +Select Operator + Statistics: Num rows: 166 Data size: 14442 Basic stats: COMPLETE Column stats: COMPLETE + Group By Operator +aggregations: count() +minReductionHashAggr: 0.99 +mode: hash +outputColumnNames: _col0 +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +Reduce Output Operator + bucketingVersion: 2 + null sort order: + numBuckets: -1 + sort order: + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE + tag: -1 + value expressions: _col0 (type: bigint) + auto parallelism: false +Execution mode: llap +LLAP IO: all inputs +Path -> Alias: + A masked pattern was here +Path -> Partition: + A masked pattern was 
here +Partition + base file name: src + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + properties: +bucket_count -1 +bucketing_version 2 +column.name.delimiter , +columns key,value +columns.types string:string + A masked pattern was here +name default.src +serialization.format 1 +serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + +input format: org.apache.hadoop.mapred.TextInputFormat +output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat +properties: + bucketing_version 2 + column.name.delimiter , + columns key,value + columns.comments 'default','default' + columns.types string:string + A masked pattern was here + name default.src + serialization.format 1 + serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +name: default.src + name: default.src +Truncated Path -> Alias: + /src [src] +Reducer 2 +Execution mode: llap +Needs Tagging: false +Reduce Operator Tree: + Group By Operator +aggregations: count(VALUE._col0) +mode: mergepartial +outputColumnNames: _col0 +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +File Output Operator + bucketingVersion: 2 + compressed: false + GlobalTableId: 0 + A masked pattern was here +
[GitHub] [hive] viirya commented on pull request #1365: HIVE-23998: Upgrade guava to 27 for Hive 2.3 branch
viirya commented on pull request #1365: URL: https://github.com/apache/hive/pull/1365#issuecomment-670017177 @sunchao Ok, I see. I will do. Thanks.
[GitHub] [hive] deniskuzZ opened a new pull request #1369: HIVE-24000: Put exclusive MERGE INSERT under the feature flag
deniskuzZ opened a new pull request #1369: URL: https://github.com/apache/hive/pull/1369 (cherry picked from commit 0e4b02af485cb1972ebc4f251d853c710e70164f) ### What changes were proposed in this pull request? Put exclusive MERGE INSERT under a feature flag. ### Why are the changes needed? Backward compatibility. ### Does this PR introduce _any_ user-facing change? A new feature flag property, 'hive.txn.xlock.mergeinsert', was introduced. ### How was this patch tested? TestDbTxnManager2
[GitHub] [hive] deniskuzZ closed pull request #1366: HIVE-24000: Put exclusive MERGE INSERT under the feature flag
deniskuzZ closed pull request #1366: URL: https://github.com/apache/hive/pull/1366
[GitHub] [hive] klcopp opened a new pull request #1368: HIVE-24001: Don't cache MapWork in tez/ObjectCache during query-based compaction
klcopp opened a new pull request #1368: URL: https://github.com/apache/hive/pull/1368
[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication
pkumarsinha commented on a change in pull request #1358: URL: https://github.com/apache/hive/pull/1358#discussion_r466420110 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws SemanticException, MalformedU private long lastStoredTimeStamp() throws SemanticException { Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), EximUtil.METADATA_NAME); -BufferedReader br = null; +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(IOException.class) + .withFailOnException(FileNotFoundException.class).build(); try { - FileSystem fs = prevMetadataPath.getFileSystem(conf); - br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); - String line = br.readLine(); - if (line == null) { -throw new SemanticException("Could not read lastStoredTimeStamp from atlas metadata file"); - } - String[] lineContents = line.split("\t", 5); - return Long.parseLong(lineContents[1]); -} catch (Exception ex) { - throw new SemanticException(ex); -} finally { - if (br != null) { + return retryable.executeCallable(() -> { +BufferedReader br = null; try { - br.close(); -} catch (IOException e) { - throw new SemanticException(e); + FileSystem fs = prevMetadataPath.getFileSystem(conf); + br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset())); + String line = br.readLine(); + if (line == null) { +throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE Review comment: lastStoredTimeStamp is maintained by hive itself. Should we have better error message category for this? ## File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ## @@ -505,18 +505,9 @@ " queue: {1}. 
Please fix and try again.", true), SPARK_RUNTIME_OOM(20015, "Spark job failed because of out of memory."), - //if the error message is changed for REPL_EVENTS_MISSING_IN_METASTORE, then need modification in getNextNotification - //method in HiveMetaStoreClient - REPL_EVENTS_MISSING_IN_METASTORE(20016, "Notification events are missing in the meta store."), - REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID(20017, "Load path {0} not valid as target database is bootstrapped " + - "from some other path : {1}."), - REPL_FILE_MISSING_FROM_SRC_AND_CM_PATH(20018, "File is missing from both source and cm path."), - REPL_LOAD_PATH_NOT_FOUND(20019, "Load path does not exist."), - REPL_DATABASE_IS_NOT_SOURCE_OF_REPLICATION(20020, - "Source of replication (repl.source.for) is not set in the database properties."), - REPL_INVALID_DB_OR_TABLE_PATTERN(20021, - "Invalid pattern for the DB or table name in the replication policy. " - + "It should be a valid regex enclosed within single or double quotes."), + REPL_FILE_MISSING_FROM_SRC_AND_CM_PATH(20016, "File is missing from both source and cm path."), + REPL_EXTERNAL_SERVICE_CONNECTION_ERROR(20017, "Failed to connect to {0} service. Error code {1}.", +true), Review comment: nit: Can accommodate in same line. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java ## @@ -42,11 +43,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import java.io.BufferedReader; -import java.io.IOException; -import java.io.InputStream; -import java.io.InputStreamReader; -import java.io.Serializable; +import java.io.*; Review comment: Should we revert this? 
## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/atlas/AtlasRestClientImpl.java ## @@ -125,17 +127,15 @@ private AtlasImportResult getDefaultAtlasImportResult(AtlasImportRequest request return new AtlasImportResult(request, "", "", "", 0L); } - public AtlasServer getServer(String endpoint) throws SemanticException { + public AtlasServer getServer(String endpoint, HiveConf conf) throws SemanticException { +Retryable retryable = Retryable.builder() + .withHiveConf(conf) + .withRetryOnException(Exception.class).build(); Review comment: Should we not retry on just AtlasServiceException and catch finally only that exception as that's what getServer says to throw? ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java ## @@ -43,15 +44,15 @@ */ public class DirCopyTask extends Task implements Serializable { private static final Logger LOG = LoggerFactory.getLogger(DirCopyTask.class); - private static final int MAX_COPY_RETRY = 5; private boolean createAndSetPathOwner(Path destPath, Path sourcePath) throws IOException { FileSystem targetFs =
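The retry policy discussed in these review comments (retry on one exception type, fail immediately on another) can be sketched in plain Java. This is only an illustration and not Hive's `Retryable` class: `RetrySketch`, `callWithRetry`, and `maxAttempts` are hypothetical names, and the way `Retryable` reads its limits from `HiveConf` is not shown in the diff, so it is not modeled here. Delays between attempts are omitted as well.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetrySketch {

  // Runs the task up to maxAttempts times. Exceptions matching failOn, or not
  // matching retryOn, are rethrown immediately; others trigger another attempt.
  static <T> T callWithRetry(Callable<T> task, int maxAttempts,
      Class<? extends Exception> retryOn,
      Class<? extends Exception> failOn) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        // failOn is checked first because it may be a subtype of retryOn
        // (FileNotFoundException extends IOException).
        if (failOn.isInstance(e) || !retryOn.isInstance(e)) {
          throw e;
        }
        last = e;
      }
    }
    throw last; // every attempt failed with a retryable exception
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = callWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new IOException("transient failure");
      }
      return "ok";
    }, 5, IOException.class, FileNotFoundException.class);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Retrying on `Exception.class`, as in the `getServer` change, would make every failure retryable in this sketch, which is the concern the reviewer raises when suggesting a narrower exception type.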
[GitHub] [hive] sunchao commented on pull request #1365: HIVE-23998: Upgrade guava to 27 for Hive 2.3 branch
sunchao commented on pull request #1365: URL: https://github.com/apache/hive/pull/1365#issuecomment-669959284 @viirya It may be because the PR action only supports master and branch-2 right now, not branch-2.3. I suggest you submit a patch to the JIRA ticket, similar to https://issues.apache.org/jira/browse/HIVE-22249.
[GitHub] [hive] aasha opened a new pull request #1367: HIVE-23993 Handle irrecoverable errors
aasha opened a new pull request #1367: URL: https://github.com/apache/hive/pull/1367 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466421484 ## File path: ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out ## @@ -79,27 +79,25 @@ STAGE PLANS: Tez A masked pattern was here Edges: Review comment: The plan in `dynamic_semijoin_reduction_2.q` has three single-column semijoin reducers that get merged into one multi-column reducer, which makes the plan more compact. Apart from that, you are right that the multi-column transformation can lead to further optimization opportunities. An example can be seen in query24.q.out (check commit https://github.com/apache/hive/pull/1325/commits/c9f9112d0802906dce7442f3d4c01535a584af11). There the `SharedWorkOptimizer` kicks in and merges two semijoin reducer branches on the same scan operator.
[GitHub] [hive] deniskuzZ opened a new pull request #1366: HIVE-24000: Put exclusive MERGE INSERT under the feature flag
deniskuzZ opened a new pull request #1366: URL: https://github.com/apache/hive/pull/1366 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466414395 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java ## @@ -0,0 +1,399 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.optimizer; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.exec.ColumnInfo; +import org.apache.hadoop.hive.ql.exec.FilterOperator; +import org.apache.hadoop.hive.ql.exec.GroupByOperator; +import org.apache.hadoop.hive.ql.exec.Operator; +import org.apache.hadoop.hive.ql.exec.OperatorFactory; +import org.apache.hadoop.hive.ql.exec.OperatorUtils; +import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator; +import org.apache.hadoop.hive.ql.exec.RowSchema; +import org.apache.hadoop.hive.ql.exec.SelectOperator; +import org.apache.hadoop.hive.ql.exec.TableScanOperator; +import org.apache.hadoop.hive.ql.exec.Utilities; +import org.apache.hadoop.hive.ql.io.AcidUtils; +import org.apache.hadoop.hive.ql.parse.GenTezUtils; +import org.apache.hadoop.hive.ql.parse.ParseContext; +import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo; +import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo; +import org.apache.hadoop.hive.ql.plan.AggregationDesc; +import org.apache.hadoop.hive.ql.plan.DynamicValue; +import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc; +import org.apache.hadoop.hive.ql.plan.FilterDesc; +import org.apache.hadoop.hive.ql.plan.GroupByDesc; +import org.apache.hadoop.hive.ql.plan.PlanUtils; +import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc; +import org.apache.hadoop.hive.ql.plan.SelectDesc; +import org.apache.hadoop.hive.ql.plan.TableDesc; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator; +import 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd; +import org.apache.hadoop.hive.ql.util.NullOrdering; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; + +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Comparator; +import java.util.Deque; +import java.util.EnumSet; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.SortedMap; +import java.util.TreeMap; + +public class SemiJoinReductionMerge extends Transform { + + public ParseContext transform(ParseContext parseContext) throws SemanticException { +Map map = parseContext.getRsToSemiJoinBranchInfo(); +if (map.isEmpty()) { + return parseContext; +} +HiveConf hiveConf = parseContext.getConf(); + +// Order does not really matter but it is necessary to keep plans stable +SortedMap> sameTableSJ = +new TreeMap<>(Comparator.comparing(SJSourceTarget::toString)); +for (Map.Entry smjEntry : map.entrySet()) { + TableScanOperator ts = smjEntry.getValue().getTsOp(); + // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2 + SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0); + assert selOp != null; + assert selOp.getParentOperators().size() == 1; + Operator source = selOp.getParentOperators().get(0); + SJSourceTarget sjKey = new SJSourceTarget(source, ts); + List ops = sameTableSJ.computeIfAbsent(sjKey, tableScanOperator -> new
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466410669 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java ## @@ -0,0 +1,399 @@
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466404775 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -2054,7 +2067,8 @@ private void markSemiJoinForDPP(OptimizeTezProcContext procCtx) // Lookup nDVs on TS side. RuntimeValuesInfo rti = procCtx.parseContext .getRsToRuntimeValuesInfoMap().get(rs); - ExprNodeDesc tsExpr = rti.getTsColExpr(); + // TODO Adapt for multi column semi-joins. Review comment: I meant to handle this as part of HIVE-23934. I added the reference to the comment.
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466404256 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1887,13 +1898,14 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx) // Check the ndv/rows from the SEL vs the destination tablescan the semijoin opt is going to. TableScanOperator ts = sjInfo.getTsOp(); RuntimeValuesInfo rti = procCtx.parseContext.getRsToRuntimeValuesInfoMap().get(rs); -ExprNodeDesc tsExpr = rti.getTsColExpr(); -// In the SEL operator of the semijoin branch, there should be only one column in the operator -ExprNodeDesc selExpr = sel.getConf().getColList().get(0); +List targetColumns = rti.getTargetColumns(); +// In multi column semijoin branches the last column of the SEL operator is hash(c1, c2, ..., cn) +// so we shouldn't consider it. +List sourceColumns = sel.getConf().getColList().subList(0, targetColumns.size()); Review comment: Done
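The `subList` call in the quoted diff can be illustrated with a small stand-alone sketch. It assumes, as the diff comment states, that a multi-column semijoin branch's SEL operator lists the key columns first and `hash(c1, c2, ..., cn)` last. `SourceColumnsSketch` and the string column names are hypothetical stand-ins for Hive's `ExprNodeDesc` lists.

```java
import java.util.Arrays;
import java.util.List;

final class SourceColumnsSketch {

  // Keeps only the first targetCount columns, dropping any trailing hash column.
  // In the single-column case the list has one entry and is returned unchanged.
  static <T> List<T> sourceColumns(List<T> selColumns, int targetCount) {
    return selColumns.subList(0, targetCount);
  }

  public static void main(String[] args) {
    List<String> sel = Arrays.asList("c1", "c2", "hash(c1, c2)");
    System.out.println(sourceColumns(sel, 2)); // prints [c1, c2]
  }
}
```

Sizing the slice by the number of target columns means the same expression covers both single-column and multi-column branches.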
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466368067 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1737,35 +1737,46 @@ private static double getBloomFilterBenefit( } } -// Selectivity: key cardinality of semijoin / domain cardinality -// Benefit (rows filtered from ts): (1 - selectivity) * # ts rows -double selectivity = selKeyCardinality / (double) keyDomainCardinality; -selectivity = Math.min(selectivity, 1); -benefit = tsRows * (1 - selectivity); - if (LOG.isDebugEnabled()) { - LOG.debug("BloomFilter benefit for " + selCol + " to " + tsCol - + ", selKeyCardinality=" + selKeyCardinality - + ", tsKeyCardinality=" + tsKeyCardinality - + ", tsRows=" + tsRows - + ", keyDomainCardinality=" + keyDomainCardinality); - LOG.debug("SemiJoin key selectivity=" + selectivity - + ", benefit=" + benefit); + LOG.debug("BloomFilter selectivity for " + selCol + " to " + tsCol + ", selKeyCardinality=" + selKeyCardinality + + ", tsKeyCardinality=" + tsKeyCardinality + ", keyDomainCardinality=" + keyDomainCardinality); } +// Selectivity: key cardinality of semijoin / domain cardinality +return selKeyCardinality / (double) keyDomainCardinality; + } -return benefit; + private static double getBloomFilterBenefit( + SelectOperator sel, List selExpr, + Statistics filStats, List tsExpr) { +if (sel.getStatistics() == null || filStats == null) { + LOG.debug("No stats available to compute BloomFilter benefit"); + return -1; +} +double selectivity = 0.0; +for (int i = 0; i < tsExpr.size(); i++) { + selectivity = Math.max(selectivity, getBloomFilterSelectivity(sel, selExpr.get(i), filStats, tsExpr.get(i))); Review comment: You are right, I was the one confused. I applied the change along with some doc.
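The two formulas in the removed comments of the diff above (selectivity = semijoin key cardinality / domain cardinality, benefit = (1 - selectivity) * table scan rows) can be written out as a short sketch. `BloomFilterBenefitSketch` and its method names are hypothetical, and the sketch covers a single column only: how the per-column selectivities are combined for multi-column semijoins after the review is not visible in this message, so it is left out.

```java
final class BloomFilterBenefitSketch {

  // Selectivity: key cardinality of the semijoin side divided by the key
  // domain cardinality, capped at 1 as in the removed code.
  static double selectivity(long selKeyCardinality, long keyDomainCardinality) {
    if (keyDomainCardinality <= 0) {
      throw new IllegalArgumentException("keyDomainCardinality must be positive");
    }
    return Math.min(selKeyCardinality / (double) keyDomainCardinality, 1.0);
  }

  // Benefit: estimated number of table scan rows filtered out.
  static double benefit(long tsRows, double selectivity) {
    return tsRows * (1 - selectivity);
  }

  public static void main(String[] args) {
    double s = selectivity(250, 1000);    // 0.25
    System.out.println(benefit(1000, s)); // prints 750.0
  }
}
```

A selectivity close to 1 yields a benefit close to 0, which is the case where a semijoin branch filters almost nothing.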
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466367481 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java ## @@ -0,0 +1,399 @@
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466366197 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java ## @@ -0,0 +1,399 @@
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466365453 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java ## @@ -0,0 +1,399 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.optimizer; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.exec.ColumnInfo; +import org.apache.hadoop.hive.ql.exec.FilterOperator; +import org.apache.hadoop.hive.ql.exec.GroupByOperator; +import org.apache.hadoop.hive.ql.exec.Operator; +import org.apache.hadoop.hive.ql.exec.OperatorFactory; +import org.apache.hadoop.hive.ql.exec.OperatorUtils; +import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator; +import org.apache.hadoop.hive.ql.exec.RowSchema; +import org.apache.hadoop.hive.ql.exec.SelectOperator; +import org.apache.hadoop.hive.ql.exec.TableScanOperator; +import org.apache.hadoop.hive.ql.exec.Utilities; +import org.apache.hadoop.hive.ql.io.AcidUtils; +import org.apache.hadoop.hive.ql.parse.GenTezUtils; +import org.apache.hadoop.hive.ql.parse.ParseContext; +import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo; +import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo; +import org.apache.hadoop.hive.ql.plan.AggregationDesc; +import org.apache.hadoop.hive.ql.plan.DynamicValue; +import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc; +import org.apache.hadoop.hive.ql.plan.FilterDesc; +import org.apache.hadoop.hive.ql.plan.GroupByDesc; +import org.apache.hadoop.hive.ql.plan.PlanUtils; +import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc; +import org.apache.hadoop.hive.ql.plan.SelectDesc; +import org.apache.hadoop.hive.ql.plan.TableDesc; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator; +import 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd; +import org.apache.hadoop.hive.ql.util.NullOrdering; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; + +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Comparator; +import java.util.Deque; +import java.util.EnumSet; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.SortedMap; +import java.util.TreeMap; + +public class SemiJoinReductionMerge extends Transform { + + public ParseContext transform(ParseContext parseContext) throws SemanticException { +Map map = parseContext.getRsToSemiJoinBranchInfo(); +if (map.isEmpty()) { + return parseContext; +} +HiveConf hiveConf = parseContext.getConf(); + +// Order does not really matter but it is necessary to keep plans stable +SortedMap> sameTableSJ = +new TreeMap<>(Comparator.comparing(SJSourceTarget::toString)); +for (Map.Entry smjEntry : map.entrySet()) { + TableScanOperator ts = smjEntry.getValue().getTsOp(); + // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2 + SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0); + assert selOp != null; Review comment: Done. Didn't add `checkNotNull` since NPE will be thrown anyways and it is rather informative as well. This is an automated message from the Apache Git Service. To respond
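The quoted diff groups semijoin branches under a source/target key and keeps the groups in a `TreeMap` ordered by the key's string form, so that plan output stays stable. A minimal sketch of that grouping idea follows, assuming plain strings in place of Hive operators; `StableGroupingSketch`, `SourceTarget`, and `group` are hypothetical names and not part of the Hive code base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class StableGroupingSketch {
  /** Hypothetical stand-in for the source/target key used in the diff. */
  static final class SourceTarget {
    final String source;
    final String target;

    SourceTarget(String source, String target) {
      this.source = source;
      this.target = target;
    }

    @Override
    public String toString() {
      return source + "->" + target;
    }
  }

  /** Groups branch names by key; groups iterate in the order of the key's string form. */
  static SortedMap<SourceTarget, List<String>> group(Map<String, SourceTarget> branches) {
    // The comparator decides both ordering and key equality inside the TreeMap,
    // so two keys with the same string form land in the same group.
    SortedMap<SourceTarget, List<String>> grouped =
        new TreeMap<>(Comparator.comparing(SourceTarget::toString));
    for (Map.Entry<String, SourceTarget> entry : branches.entrySet()) {
      grouped.computeIfAbsent(entry.getValue(), key -> new ArrayList<>()).add(entry.getKey());
    }
    return grouped;
  }
}
```

Because the comparator is the only notion of equality the map uses, the sketch does not need `equals` or `hashCode` on the key class; whether the real key class relies on the same property is not visible in this thread.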
[GitHub] [hive] GuoPhilipse commented on pull request #1363: HIVE-23996: Remove unused line to keep code clean
GuoPhilipse commented on pull request #1363: URL: https://github.com/apache/hive/pull/1363#issuecomment-669876714 cc @pvary This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets
zabetak commented on a change in pull request #1325: URL: https://github.com/apache/hive/pull/1325#discussion_r466347783 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java ## @@ -53,6 +53,34 @@ private static final Logger LOG = LoggerFactory.getLogger(OperatorUtils.class); + /** + * Return the ancestor of the specified operator at the provided path or null if the path is invalid. + * + * The method is equivalent to following code: + * {@code + * op.getParentOperators().get(path[0]) + * .getParentOperators().get(path[1]) + * ... + * .getParentOperators().get(path[n]) + * } + * with additional checks about the validity of the provided path and the type of the ancestor. + * + * @param op the operator for which we + * @param clazz the class of the ancestor operator + * @param path the path leading to the desired ancestor + * @param the type of the ancestor + * @return the ancestor of the specified operator at the provided path or null if the path is invalid. + */ + public static T ancestor(Operator op, Class clazz, int... path) { +Operator target = op; +for (int i = 0; i < path.length; i++) { + if (target.getParentOperators() == null || path[i] > target.getParentOperators().size()) Review comment: Done. I also configured IntelliJ to force their usage in single line statements so hopefully they should never appear. ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java ## @@ -0,0 +1,399 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.exec.ColumnInfo; +import org.apache.hadoop.hive.ql.exec.FilterOperator; +import org.apache.hadoop.hive.ql.exec.GroupByOperator; +import org.apache.hadoop.hive.ql.exec.Operator; +import org.apache.hadoop.hive.ql.exec.OperatorFactory; +import org.apache.hadoop.hive.ql.exec.OperatorUtils; +import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator; +import org.apache.hadoop.hive.ql.exec.RowSchema; +import org.apache.hadoop.hive.ql.exec.SelectOperator; +import org.apache.hadoop.hive.ql.exec.TableScanOperator; +import org.apache.hadoop.hive.ql.exec.Utilities; +import org.apache.hadoop.hive.ql.io.AcidUtils; +import org.apache.hadoop.hive.ql.parse.GenTezUtils; +import org.apache.hadoop.hive.ql.parse.ParseContext; +import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo; +import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo; +import org.apache.hadoop.hive.ql.plan.AggregationDesc; +import org.apache.hadoop.hive.ql.plan.DynamicValue; +import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc; +import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc; +import org.apache.hadoop.hive.ql.plan.FilterDesc; +import 
org.apache.hadoop.hive.ql.plan.GroupByDesc; +import org.apache.hadoop.hive.ql.plan.PlanUtils; +import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc; +import org.apache.hadoop.hive.ql.plan.SelectDesc; +import org.apache.hadoop.hive.ql.plan.TableDesc; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd; +import org.apache.hadoop.hive.ql.util.NullOrdering; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; + +import java.util.ArrayDeque; +import java.util.ArrayList; +import
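The Javadoc quoted above describes `ancestor` as a chain of `getParentOperators().get(path[i])` calls with validity and type checks. The following is a minimal sketch of that traversal on a hypothetical `Node` type rather than Hive's `Operator`; `AncestorSketch`, `Node`, and `SelectNode` are made-up names. The sketch rejects an index with `>=`, since an index equal to the list size is already out of range for `List.get`; whether the `>` in the quoted diff is intentional is not clear from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AncestorSketch {
  /** Hypothetical stand-in for an operator: a name plus an ordered list of parents. */
  static class Node {
    final String name;
    final List<Node> parents = new ArrayList<>();

    Node(String name, Node... parents) {
      this.name = name;
      this.parents.addAll(Arrays.asList(parents));
    }
  }

  /** Hypothetical subtype, used only to exercise the type check. */
  static class SelectNode extends Node {
    SelectNode(String name, Node... parents) {
      super(name, parents);
    }
  }

  /** Follows path[0], path[1], ... through the parent lists; null if a step or the final type is invalid. */
  static <T> T ancestor(Node op, Class<T> clazz, int... path) {
    Node target = op;
    for (int index : path) {
      // Valid indexes run from 0 to size() - 1, so size() itself must be rejected.
      if (index < 0 || index >= target.parents.size()) {
        return null;
      }
      target = target.parents.get(index);
    }
    return clazz.isInstance(target) ? clazz.cast(target) : null;
  }

  public static void main(String[] args) {
    // Chain shaped like the -SEL-GB1-RS1-GB2-RS2 branch mentioned in the diff.
    Node sel = new SelectNode("SEL");
    Node rs2 = new Node("RS2", new Node("GB2", new Node("RS1", new Node("GB1", sel))));
    SelectNode found = ancestor(rs2, SelectNode.class, 0, 0, 0, 0);
    System.out.println(found == null ? "none" : found.name);
  }
}
```

Returning `null` for an invalid path keeps the caller in charge of how to react, which matches the "or null if the path is invalid" wording of the quoted Javadoc.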
[GitHub] [hive] zabetak commented on a change in pull request #1357: HIVE-23963: UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule
zabetak commented on a change in pull request #1357: URL: https://github.com/apache/hive/pull/1357#discussion_r466296959 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java ## @@ -81,8 +84,16 @@ public RelDistribution apply(TargetMapping mapping) { return this; } List newKeys = new ArrayList<>(keys.size()); + +// Instead of using a HashMap for lookup newKeys.add(mapping.getTargetOpt(key)); should be called but not all the +// mapping supports that. See HIVE-23963. Replace this when this is fixed in calcite. Review comment: If it is meant to be fixed in Calcite, then we should create a JIRA and add an entry in `org.apache.hadoop.hive.ql.optimizer.calcite.Bug`. We could even skip the JIRA creation and move this comment into `Bug` as `CALCITE-X_fixed` or something similar.
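The code comment in the quoted diff mentions using a `HashMap` for the key lookup instead of asking the mapping for each target directly. A minimal sketch of that kind of workaround follows, assuming plain `int` pairs in place of Calcite mapping entries; `KeyRemapSketch` and `remap` are hypothetical names, and the handling of an unmapped key is an assumption rather than what the patch does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyRemapSketch {
  /**
   * Rewrites each key through the given (source, target) pairs.
   * The pairs stand in for whatever entries a mapping exposes when iterated.
   */
  static List<Integer> remap(List<Integer> keys, int[][] sourceTargetPairs) {
    // Build the lookup table once, then translate every key through it.
    Map<Integer, Integer> lookup = new HashMap<>();
    for (int[] pair : sourceTargetPairs) {
      lookup.put(pair[0], pair[1]);
    }
    List<Integer> newKeys = new ArrayList<>(keys.size());
    for (Integer key : keys) {
      Integer target = lookup.get(key);
      if (target == null) {
        // Assumption: an unmapped key is treated as an error in this sketch.
        throw new IllegalArgumentException("No target for key " + key);
      }
      newKeys.add(target);
    }
    return newKeys;
  }
}
```

The sketch only relies on being able to iterate the pairs, which is the reason the quoted comment gives for preferring a lookup table over a per-key query.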
[GitHub] [hive] dengzhhu653 commented on pull request #1205: HIVE-23800: Add hooks when HiveServer2 stops due to OutOfMemoryError
dengzhhu653 commented on pull request #1205: URL: https://github.com/apache/hive/pull/1205#issuecomment-669832758
> sorry @dengzhhu653 lately I was a little bit flooded with all kind of stuff...and right now I'm on holiday - will get back to your patch next week!
Sorry for disturbing you. Have a nice holiday!
[GitHub] [hive] kgyrtkirk commented on pull request #1205: HIVE-23800: Add hooks when HiveServer2 stops due to OutOfMemoryError
kgyrtkirk commented on pull request #1205: URL: https://github.com/apache/hive/pull/1205#issuecomment-669804983 sorry @dengzhhu653 lately I was a little bit flooded with all kind of stuff...and right now I'm on holiday - will get back to your patch next week!
[GitHub] [hive] kishendas commented on pull request #1355: send tableId to get_partition APIs
kishendas commented on pull request #1355: URL: https://github.com/apache/hive/pull/1355#issuecomment-669758040
> Can you add (or modify) some tests to make sure these APIs don't regress in future.
Sure, I have added tests to make sure both validWriteIdList and tableId are sent from HS2 for the newly added HMS get_* APIs that take validWriteIdList and tableId in the input.