[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466826953 ## File path: ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query94_anti_join.q.out ## @@ -0,0 +1,94 @@ +PREHOOK: query: explain cbo +select + count(distinct ws_order_number) as `order count` + ,sum(ws_ext_ship_cost) as `total shipping cost` + ,sum(ws_net_profit) as `total net profit` +from + web_sales ws1 + ,date_dim + ,customer_address + ,web_site +where +d_date between '1999-5-01' and + (cast('1999-5-01' as date) + 60 days) +and ws1.ws_ship_date_sk = d_date_sk +and ws1.ws_ship_addr_sk = ca_address_sk +and ca_state = 'TX' +and ws1.ws_web_site_sk = web_site_sk +and web_company_name = 'pri' +and exists (select * +from web_sales ws2 +where ws1.ws_order_number = ws2.ws_order_number + and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) +and not exists(select * + from web_returns wr1 + where ws1.ws_order_number = wr1.wr_order_number) +order by count(distinct ws_order_number) +limit 100 +PREHOOK: type: QUERY +PREHOOK: Input: default@customer_address +PREHOOK: Input: default@date_dim +PREHOOK: Input: default@web_returns +PREHOOK: Input: default@web_sales +PREHOOK: Input: default@web_site +PREHOOK: Output: hdfs://### HDFS PATH ### +POSTHOOK: query: explain cbo +select + count(distinct ws_order_number) as `order count` + ,sum(ws_ext_ship_cost) as `total shipping cost` + ,sum(ws_net_profit) as `total net profit` +from + web_sales ws1 + ,date_dim + ,customer_address + ,web_site +where +d_date between '1999-5-01' and + (cast('1999-5-01' as date) + 60 days) +and ws1.ws_ship_date_sk = d_date_sk +and ws1.ws_ship_addr_sk = ca_address_sk +and ca_state = 'TX' +and ws1.ws_web_site_sk = web_site_sk +and web_company_name = 'pri' +and exists (select * +from web_sales ws2 +where ws1.ws_order_number = ws2.ws_order_number + and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) +and not exists(select * + from web_returns wr1 + where ws1.ws_order_number = wr1.wr_order_number) +order by count(distinct ws_order_number) +limit 100 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@customer_address +POSTHOOK: Input: default@date_dim +POSTHOOK: Input: default@web_returns +POSTHOOK: Input: default@web_sales +POSTHOOK: Input: default@web_site +POSTHOOK: Output: hdfs://### HDFS PATH ### +CBO PLAN: +HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], agg#2=[sum($6)]) + HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], cost=[not available]) Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466826675 ## File path: ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out ## @@ -0,0 +1,99 @@ +PREHOOK: query: explain cbo Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466826103 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. \n" + "If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the\n" + "specified size, the join is directly converted to a mapjoin (there is no conditional task)."), - +HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false, Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466825617 ## File path: ql/src/test/results/clientpositive/llap/subquery_notexists_having.q.out ## @@ -31,7 +31,8 @@ STAGE PLANS: Tez A masked pattern was here Edges: -Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE) +Reducer 2 <- Map 1 (SIMPLE_EDGE) Review comment: yes ..the join is getting converted to SMB join ..and so no reducer is required. In case of anti join its not getting converted. That is because left outer is adding an extra group by which is making the RS node on left and right side equal, the pre-condition for converting to SMB join. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466822516 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -2129,6 +2133,16 @@ private RelNode applyPreJoinOrderingTransforms(RelNode basePlan, RelMetadataProv HiveRemoveSqCountCheck.INSTANCE); } + // 10. Convert left outer join + null filter on right side table column to anti join. Add this + // rule after all the optimization for which calcite support for anti join is missing. + // Needs to be done before ProjectRemoveRule as it expect a project over filter. + // This is done before join re-ordering as join re-ordering is converting the left outer Review comment: As discussed, i have created a Jira https://issues.apache.org/jira/browse/HIVE-24013 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466819572 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java ## @@ -92,7 +104,7 @@ public void onMatch(RelOptRuleCall call) { Set rightPushedPredicates = Sets.newHashSet(registry.getPushedPredicates(join, 1)); boolean genPredOnLeft = join.getJoinType() == JoinRelType.RIGHT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin(); -boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin(); +boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin()|| join.getJoinType() == JoinRelType.ANTI; Review comment: yes ..that is taken care of. // For anti join, we should proceed to emit records if the right side is empty or not matching. if (type == JoinDesc.ANTI_JOIN && !producedRow) { This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466819149 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil; +import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveAntiSemiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveAntiSemiJoinRule.class); + public static final HiveAntiSemiJoinRule INSTANCE = new HiveAntiSemiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveAntiSemiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. + public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) { +LOG.debug("Start Matching HiveAntiJoinRule"); + +//TODO : Need to support this scenario. +if (join.getCondition().isAlwaysTrue()) { + return; +} + +//We support conversion from left outer join only. +if (join.getJoinType() != JoinRelType.LEFT) { + return; +} + +assert (filter != null); + +// If null filter is not present from right side then we can not convert to anti join. +List aboveFilters = RelOptUtil.conjunctions(filter.getCondition()); +Stream nullFilters = aboveFilters.stream().filter(filterNode -> filterNode.getKind() == SqlKind.IS_NULL); +boolean hasNullFilter = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, nullFilters.collect(Collectors.toList())); +if (!hasNullFilter) { + return; +} + +// If any projection is there from right side, then we can not convert to anti join. +boolean hasProjection = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, project.getProjects()); +if (hasProjection) { + return; +} + +LOG.debug("Matched HiveAntiJoinRule"); + +// Build anti join with same left, right child and condition as original left outer join. +Join anti = HiveAntiJoin.getAntiJoin(join.getLeft().getCluster(), join.getLeft().getTraitSet(), +join.getLeft(), join.getRight(), join.getCondition()); +RelNode newProject = project.copy(project.getTraitSet(), anti, project.getProjects(), project.getRowType()); +call.transformTo(newProject); Review comment: for normal filter, its being pushed down. here we get filters which can not be pushed down. I have modified the code to handle those filters. And added these extra tests to verify. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466818492 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinMultiKeyOperator.java ## @@ -0,0 +1,400 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinBytesHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.apache.hadoop.hive.serde2.ByteStream.Output; +import org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// Multi-Key hash table import. +// Multi-Key specific imports. + +// TODO : Duplicate codes need to merge with semi join. +/* + * Specialized class for doing a vectorized map join that is an anti join on Multi-Key + * using hash set. + */ +public class VectorMapJoinAntiJoinMultiKeyOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + + // + + private static final String CLASS_NAME = VectorMapJoinAntiJoinMultiKeyOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // + + // (none) + + // The above members are initialized by the constructor and must not be + // transient. + //--- + + // The hash map for this specialized class. + private transient VectorMapJoinBytesHashSet hashSet; + + //--- + // Multi-Key specific members. + // + + // Object that can take a set of columns in row in a vectorized row batch and serialized it. + // Known to not have any nulls. + private transient VectorSerializeRow keyVectorSerializeWrite; + + // The BinarySortable serialization of the current key. + private transient Output currentKeyOutput; + + // The BinarySortable serialization of the saved key for a possible series of equal keys. + private transient Output saveKeyOutput; + + //--- + // Pass-thru constructors. + // + + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinMultiKeyOperator() { +super(); + } + + public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + //--- + // Process Multi-Key Anti Join on a vectorized row batch. + // + + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +/* + * Initialize Multi-Key members for this specialized class. + */ + +keyVectorSerializeWrite = new VectorSerializeRow(BinarySortableSerializeWrite.with( +this.getConf().getKeyTblDesc().getProperties(), bigTableKeyColumnMap.length)); +keyVectorSerializeWrite.init(bigTableKeyTypeInfos, bigTableKeyColumnMap); + +currentKeyOutput = new
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r466818194 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java ## @@ -0,0 +1,315 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// TODO : Duplicate codes need to merge with semi join. +// Single-Column Long hash table import. +// Single-Column Long specific imports. + +/* + * Specialized class for doing a vectorized map join that is an anti join on a Single-Column Long + * using a hash set. + */ +public class VectorMapJoinAntiJoinLongOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + private static final String CLASS_NAME = VectorMapJoinAntiJoinLongOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // The above members are initialized by the constructor and must not be + // transient. + + // The hash map for this specialized class. + private transient VectorMapJoinLongHashSet hashSet; + + // Single-Column Long specific members. + // For integers, we have optional min/max filtering. + private transient boolean useMinMax; + private transient long min; + private transient long max; + + // The column number for this one column join specialization. + private transient int singleJoinColumn; + + // Pass-thru constructors. + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinLongOperator() { +super(); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + // Process Single-Column Long Anti Join on a vectorized row batch. + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +// Initialize Single-Column Long members for this specialized class. +singleJoinColumn = bigTableKeyColumnMap[0]; + } + + @Override + public void hashTableSetup() throws HiveException { +super.hashTableSetup(); + +// Get our Single-Column Long hash set information for this specialized class. +hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable; +useMinMax = hashSet.useMinMax(); +if (useMinMax) { + min = hashSet.min(); + max = hashSet.max(); +} + } + + @Override + public void processBatch(VectorizedRowBatch batch) throws HiveException { + +try { + // (Currently none) + // antiPerBatchSetup(batch); + + // For anti joins, we may apply the filter(s) now. + for(VectorExpression ve : bigTableFilterExpressions) { +ve.evaluate(batch); + } + + final int inputLogicalSize = batch.size; + if (inputLogicalSize == 0) { +return; + } + + // Perform any key expressions. Results will go into scratch columns. + if (bigTableKeyExpressions != null) { +for (VectorExpression ve : bigTableKeyExpressions) { + ve.evaluate(batch); +} + } + + // The one join column for this specialized class. +
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r465297563 ## File path: ql/src/test/queries/clientpositive/subquery_in_having.q ## @@ -140,6 +140,22 @@ CREATE TABLE src_null_n4 (key STRING COMMENT 'default', value STRING COMMENT 'de LOAD DATA LOCAL INPATH "../../data/files/kv1.txt" INTO TABLE src_null_n4; INSERT INTO src_null_n4 values('5444', null); +explain +select key, value, count(*) Review comment: By default anti join conversion is set to true. I have added few test cases with anti join set to false. ## File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java ## @@ -203,6 +203,7 @@ private boolean filterExists(ReduceSinkOperator target, ExprNodeDesc replaced) { vector.add(right, left); break; case JoinDesc.LEFT_OUTER_JOIN: +case JoinDesc.ANTI_JOIN: //TODO : need to test Review comment: removed the comment. ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java ## @@ -183,6 +189,7 @@ public void onMatch(RelOptRuleCall call) { switch (joinType) { case SEMI: case INNER: +case ANTI: Review comment: done ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java ## @@ -92,7 +104,7 @@ public void onMatch(RelOptRuleCall call) { Set rightPushedPredicates = Sets.newHashSet(registry.getPushedPredicates(join, 1)); boolean genPredOnLeft = join.getJoinType() == JoinRelType.RIGHT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin(); -boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin(); +boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin()|| join.getJoinType() == JoinRelType.ANTI; Review comment: Yes ..if right side is null then it emits all the right side records ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil; +import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveAntiSemiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveAntiSemiJoinRule.class); + public static final HiveAntiSemiJoinRule INSTANCE = new HiveAntiSemiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveAntiSemiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. + public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460606016 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinGenerateResultOperator.java ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSet; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSetResult; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashTableResult; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; + +// TODO : This class is duplicate of semi join. Need to do a refactoring to merge it with semi join. +/** + * This class has methods for generating vectorized join results for Anti joins. + * The big difference between inner joins and anti joins is existence testing. + * Inner joins use a hash map to lookup the 1 or more small table values. + * Anti joins are a specialized join for outputting big table rows whose key exists Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460605730 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ## @@ -523,11 +533,19 @@ private boolean createForwardJoinObject(boolean[] skip) throws HiveException { forward = true; } } +return forward; + } + + // returns whether a record was forwarded + private boolean createForwardJoinObject(boolean[] skip, boolean antiJoin) throws HiveException { +boolean forward = fillFwdCache(skip); if (forward) { if (needsPostEvaluation) { forward = !JoinUtil.isFiltered(forwardCache, residualJoinFilters, residualJoinFiltersOIs); } - if (forward) { + + // For anti join, check all right side and if nothing is matched then only forward. Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460605836 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinGenerateResultOperator.java ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSet; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSetResult; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashTableResult; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; + +// TODO : This class is duplicate of semi join. Need to do a refactoring to merge it with semi join. +/** + * This class has methods for generating vectorized join results for Anti joins. + * The big difference between inner joins and anti joins is existence testing. + * Inner joins use a hash map to lookup the 1 or more small table values. + * Anti joins are a specialized join for outputting big table rows whose key exists + * in the small table. + * + * No small table values are needed for anti since they would be empty. So, + * we use a hash set as the hash table. Hash sets just report whether a key exists. This + * is a big performance optimization. + */ +public abstract class VectorMapJoinAntiJoinGenerateResultOperator +extends VectorMapJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + private static final Logger LOG = LoggerFactory.getLogger(VectorMapJoinAntiJoinGenerateResultOperator.class.getName()); + + // Anti join specific members. + + // An array of hash set results so we can do lookups on the whole batch before output result + // generation. + protected transient VectorMapJoinHashSetResult hashSetResults[]; + + // Pre-allocated member for storing the (physical) batch index of matching row (single- or + // multi-small-table-valued) indexes during a process call. + protected transient int[] allMatchs; + + // Pre-allocated member for storing the (physical) batch index of rows that need to be spilled. + protected transient int[] spills; + + // Pre-allocated member for storing index into the hashSetResults for each spilled row. + protected transient int[] spillHashMapResultIndices; + + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinGenerateResultOperator() { +super(); + } + + public VectorMapJoinAntiJoinGenerateResultOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinGenerateResultOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + /* + * Setup our anti join specific members. + */ + protected void commonSetup() throws HiveException { +super.commonSetup(); + +// Anti join specific. +VectorMapJoinHashSet baseHashSet = (VectorMapJoinHashSet) vectorMapJoinHashTable; + +hashSetResults = new VectorMapJoinHashSetResult[VectorizedRowBatch.DEFAULT_SIZE]; +for (int i = 0; i < hashSetResults.length; i++) { + hashSetResults[i] = baseHashSet.createHashSetResult(); +} + +allMatchs = new int[VectorizedRowBatch.DEFAULT_SIZE]; + +spills = new int[VectorizedRowBatch.DEFAULT_SIZE]; +spillHashMapResultIndices = new int[VectorizedRowBatch.DEFAULT_SIZE]; + } + + //--- + + /* + * Anti join (hash
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460605498 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ## @@ -638,6 +657,12 @@ private void genObject(int aliasNum, boolean allLeftFirst, boolean allLeftNull) // skipping the rest of the rows in the rhs table of the semijoin done = !needsPostEvaluation; } + } else if (type == JoinDesc.ANTI_JOIN) { +if (innerJoin(skip, left, right)) { + // if anti join found a match then the condition is not matched for anti join, so we can skip rest of the Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460522562 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java ## @@ -0,0 +1,315 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// TODO : Duplicate codes need to merge with semi join. +// Single-Column Long hash table import. +// Single-Column Long specific imports. + +/* + * Specialized class for doing a vectorized map join that is an anti join on a Single-Column Long + * using a hash set. + */ +public class VectorMapJoinAntiJoinLongOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + private static final String CLASS_NAME = VectorMapJoinAntiJoinLongOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // The above members are initialized by the constructor and must not be + // transient. + + // The hash map for this specialized class. + private transient VectorMapJoinLongHashSet hashSet; + + // Single-Column Long specific members. + // For integers, we have optional min/max filtering. + private transient boolean useMinMax; + private transient long min; + private transient long max; + + // The column number for this one column join specialization. + private transient int singleJoinColumn; + + // Pass-thru constructors. + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinLongOperator() { +super(); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + // Process Single-Column Long Anti Join on a vectorized row batch. + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +// Initialize Single-Column Long members for this specialized class. +singleJoinColumn = bigTableKeyColumnMap[0]; + } + + @Override + public void hashTableSetup() throws HiveException { +super.hashTableSetup(); + +// Get our Single-Column Long hash set information for this specialized class. +hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable; +useMinMax = hashSet.useMinMax(); +if (useMinMax) { + min = hashSet.min(); + max = hashSet.max(); +} + } + + @Override + public void processBatch(VectorizedRowBatch batch) throws HiveException { + +try { + // (Currently none) + // antiPerBatchSetup(batch); + + // For anti joins, we may apply the filter(s) now. + for(VectorExpression ve : bigTableFilterExpressions) { +ve.evaluate(batch); + } + + final int inputLogicalSize = batch.size; + if (inputLogicalSize == 0) { +return; + } + + // Perform any key expressions. Results will go into scratch columns. + if (bigTableKeyExpressions != null) { +for (VectorExpression ve : bigTableKeyExpressions) { + ve.evaluate(batch); +} + } + + // The one join column for this specialized class. +
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460522454 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java ## @@ -0,0 +1,315 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// TODO : Duplicate codes need to merge with semi join. +// Single-Column Long hash table import. +// Single-Column Long specific imports. + +/* + * Specialized class for doing a vectorized map join that is an anti join on a Single-Column Long + * using a hash set. + */ +public class VectorMapJoinAntiJoinLongOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + private static final String CLASS_NAME = VectorMapJoinAntiJoinLongOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // The above members are initialized by the constructor and must not be + // transient. + + // The hash map for this specialized class. + private transient VectorMapJoinLongHashSet hashSet; + + // Single-Column Long specific members. + // For integers, we have optional min/max filtering. + private transient boolean useMinMax; + private transient long min; + private transient long max; + + // The column number for this one column join specialization. + private transient int singleJoinColumn; + + // Pass-thru constructors. + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinLongOperator() { +super(); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + // Process Single-Column Long Anti Join on a vectorized row batch. + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +// Initialize Single-Column Long members for this specialized class. +singleJoinColumn = bigTableKeyColumnMap[0]; + } + + @Override + public void hashTableSetup() throws HiveException { +super.hashTableSetup(); + +// Get our Single-Column Long hash set information for this specialized class. +hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable; +useMinMax = hashSet.useMinMax(); +if (useMinMax) { + min = hashSet.min(); + max = hashSet.max(); +} + } + + @Override + public void processBatch(VectorizedRowBatch batch) throws HiveException { + +try { + // (Currently none) + // antiPerBatchSetup(batch); + + // For anti joins, we may apply the filter(s) now. + for(VectorExpression ve : bigTableFilterExpressions) { +ve.evaluate(batch); + } + + final int inputLogicalSize = batch.size; + if (inputLogicalSize == 0) { +return; + } + + // Perform any key expressions. Results will go into scratch columns. + if (bigTableKeyExpressions != null) { +for (VectorExpression ve : bigTableKeyExpressions) { + ve.evaluate(batch); +} + } + + // The one join column for this specialized class. +
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460522312 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java ## @@ -0,0 +1,315 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// TODO : Duplicate codes need to merge with semi join. +// Single-Column Long hash table import. +// Single-Column Long specific imports. + +/* + * Specialized class for doing a vectorized map join that is an anti join on a Single-Column Long + * using a hash set. + */ +public class VectorMapJoinAntiJoinLongOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + private static final String CLASS_NAME = VectorMapJoinAntiJoinLongOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // The above members are initialized by the constructor and must not be + // transient. + + // The hash map for this specialized class. + private transient VectorMapJoinLongHashSet hashSet; + + // Single-Column Long specific members. + // For integers, we have optional min/max filtering. + private transient boolean useMinMax; + private transient long min; + private transient long max; + + // The column number for this one column join specialization. + private transient int singleJoinColumn; + + // Pass-thru constructors. + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinLongOperator() { +super(); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + // Process Single-Column Long Anti Join on a vectorized row batch. + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +// Initialize Single-Column Long members for this specialized class. +singleJoinColumn = bigTableKeyColumnMap[0]; + } + + @Override + public void hashTableSetup() throws HiveException { +super.hashTableSetup(); + +// Get our Single-Column Long hash set information for this specialized class. +hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable; +useMinMax = hashSet.useMinMax(); +if (useMinMax) { + min = hashSet.min(); + max = hashSet.max(); +} + } + + @Override + public void processBatch(VectorizedRowBatch batch) throws HiveException { + +try { + // (Currently none) + // antiPerBatchSetup(batch); + + // For anti joins, we may apply the filter(s) now. + for(VectorExpression ve : bigTableFilterExpressions) { +ve.evaluate(batch); + } + + final int inputLogicalSize = batch.size; + if (inputLogicalSize == 0) { +return; + } + + // Perform any key expressions. Results will go into scratch columns. + if (bigTableKeyExpressions != null) { +for (VectorExpression ve : bigTableKeyExpressions) { + ve.evaluate(batch); +} + } + + // The one join column for this specialized class. +
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460522261 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinStringOperator.java ## @@ -0,0 +1,371 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinBytesHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// Single-Column String hash table import. +// Single-Column String specific imports. + +// TODO : Duplicate codes need to merge with semi join. +/* + * Specialized class for doing a vectorized map join that is an anti join on a Single-Column String + * using a hash set. + */ +public class VectorMapJoinAntiJoinStringOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + + // + + private static final String CLASS_NAME = VectorMapJoinAntiJoinStringOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // + + // (none) + + // The above members are initialized by the constructor and must not be + // transient. + //--- + + // The hash map for this specialized class. + private transient VectorMapJoinBytesHashSet hashSet; + + //--- + // Single-Column String specific members. + // + + // The column number for this one column join specialization. + private transient int singleJoinColumn; + + //--- + // Pass-thru constructors. + // + + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinStringOperator() { +super(); + } + + public VectorMapJoinAntiJoinStringOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinStringOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + //--- + // Process Single-Column String anti Join on a vectorized row batch. + // + + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +/* + * Initialize Single-Column String members for this specialized class. + */ + +singleJoinColumn = bigTableKeyColumnMap[0]; + } + + @Override + public void hashTableSetup() throws HiveException { +super.hashTableSetup(); + +/* + * Get our Single-Column String hash set information for this specialized class. + */ + +hashSet = (VectorMapJoinBytesHashSet) vectorMapJoinHashTable; + } + + @Override + public void processBatch(VectorizedRowBatch batch) throws HiveException { + +try { + + // Do the per-batch setup for an anti join. + + // (Currently none) + // antiPerBatchSetup(batch); + + // For anti joins, we may apply the filter(s) now. + for(VectorExpression ve :
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460521384 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ## @@ -509,11 +513,17 @@ protected void addToAliasFilterTags(byte alias, List object, boolean isN } } + private void createForwardJoinObjectForAntiJoin(boolean[] skip) throws HiveException { +boolean forward = fillFwdCache(skip); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460521246 ## File path: parser/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g ## @@ -145,6 +145,7 @@ joinToken | KW_RIGHT (KW_OUTER)? KW_JOIN -> TOK_RIGHTOUTERJOIN | KW_FULL (KW_OUTER)? KW_JOIN -> TOK_FULLOUTERJOIN | KW_LEFT KW_SEMI KW_JOIN -> TOK_LEFTSEMIJOIN +| KW_ANTI KW_JOIN -> TOK_ANTIJOIN Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460521056 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ## @@ -153,6 +153,8 @@ transient boolean hasLeftSemiJoin = false; + transient boolean hasAntiJoin = false; Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460520903 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptMaterializationValidator.java ## @@ -253,6 +256,14 @@ private RelNode visit(HiveSemiJoin semiJoin) { return visitChildren(semiJoin); } + // Note: Not currently part of the HiveRelNode interface + private RelNode visit(HiveAntiJoin antiJoin) { Review comment: Not sure ..copy pasted from semi join. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460520647 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java ## @@ -1112,7 +1112,7 @@ public RexNode field(RexNode e, String name) { } public HiveSubQRemoveRelBuilder join(JoinRelType joinType, RexNode condition, - Set variablesSet, boolean createSemiJoin) { + Set variablesSet, JoinRelType semiJoinType) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460519523 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java ## @@ -56,6 +57,9 @@ public static final HiveJoinAddNotNullRule INSTANCE_SEMIJOIN = new HiveJoinAddNotNullRule(HiveSemiJoin.class, HiveRelFactories.HIVE_FILTER_FACTORY); + public static final HiveJoinAddNotNullRule INSTANCE_ANTIJOIN = + new HiveJoinAddNotNullRule(HiveAntiJoin.class, HiveRelFactories.HIVE_FILTER_FACTORY); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460519111 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinDesc.java ## @@ -89,7 +89,8 @@ public PrimitiveTypeInfo getPrimitiveTypeInfo() { INNER_BIG_ONLY, LEFT_SEMI, OUTER, -FULL_OUTER +FULL_OUTER, +ANTI Review comment: LEFT_ANTI This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460518974 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.util.ImmutableBitSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460518799 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.util.ImmutableBitSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class); + public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new HiveJoinWithFilterToAntiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveJoinWithFilterToAntiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. + public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) { +LOG.debug("Matched HiveAntiJoinRule"); + +if (join.getCondition().isAlwaysTrue()) { + return; +} + +//We support conversion from left outer join only. +if (join.getJoinType() != JoinRelType.LEFT) { + return; +} + +assert (filter != null); + +List aboveFilters = RelOptUtil.conjunctions(filter.getCondition()); +boolean hasIsNull = false; + +// Get all filter condition and check if any of them is a "is null" kind. +for (RexNode filterNode : aboveFilters) { + if (filterNode.getKind() == SqlKind.IS_NULL && + isFilterFromRightSide(join, filterNode, join.getJoinType())) { +hasIsNull = true; +break; + } +} + +// Is null should be on a key from right side of the join. +if (!hasIsNull) { + return; +} + +// Build anti join with same left, right child and condition as original left outer join. +Join anti = join.copy(join.getTraitSet(), join.getCondition(), +join.getLeft(), join.getRight(), JoinRelType.ANTI, false); + +//TODO : Do we really need it +call.getPlanner().onCopy(join, anti); + +RelNode newProject = getNewProjectNode(project, anti); +if (newProject != null) { + call.getPlanner().onCopy(project, newProject); Review comment: done ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460518454 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java ## @@ -79,6 +80,11 @@ public Double getDistinctRowCount(HiveSemiJoin rel, RelMetadataQuery mq, Immutab return super.getDistinctRowCount(rel, mq, groupKey, predicate); } + public Double getDistinctRowCount(HiveAntiJoin rel, RelMetadataQuery mq, ImmutableBitSet groupKey, +RexNode predicate) { +return super.getDistinctRowCount(rel, mq, groupKey, predicate); Review comment: https://issues.apache.org/jira/browse/HIVE-23933 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460518411 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java ## @@ -118,6 +119,15 @@ public Double getRowCount(HiveJoin join, RelMetadataQuery mq) { } public Double getRowCount(HiveSemiJoin rel, RelMetadataQuery mq) { +return getRowCountInt(rel, mq); + } + + public Double getRowCount(HiveAntiJoin rel, RelMetadataQuery mq) { +return getRowCountInt(rel, mq); + } + + private Double getRowCountInt(Join rel, RelMetadataQuery mq) { Review comment: super.getRowCount(rel, mq) does not support Anti join. I think we need to handle it. https://issues.apache.org/jira/browse/HIVE-23933 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460515695 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java ## @@ -118,6 +119,15 @@ public Double getRowCount(HiveJoin join, RelMetadataQuery mq) { } public Double getRowCount(HiveSemiJoin rel, RelMetadataQuery mq) { +return getRowCountInt(rel, mq); + } + + public Double getRowCount(HiveAntiJoin rel, RelMetadataQuery mq) { +return getRowCountInt(rel, mq); + } + + private Double getRowCountInt(Join rel, RelMetadataQuery mq) { Review comment: Yes done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460515257 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveGBYSemiJoinRule.java ## @@ -41,17 +41,19 @@ public HiveRemoveGBYSemiJoinRule() { super( -operand(HiveSemiJoin.class, +operand(Join.class, some( operand(RelNode.class, any()), operand(Aggregate.class, any(, HiveRelFactories.HIVE_BUILDER, "HiveRemoveGBYSemiJoinRule"); } @Override public void onMatch(RelOptRuleCall call) { -final HiveSemiJoin semijoin= call.rel(0); +final Join join= call.rel(0); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460001925 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ## @@ -2606,6 +2607,17 @@ private long computeFinalRowCount(List rowCountParents, long interimRowCou // max # of rows = rows from left side result = Math.min(rowCountParents.get(joinCond.getLeft()), result); break; +case JoinDesc.ANTI_JOIN: + long leftRowCount = rowCountParents.get(joinCond.getLeft()); + if (leftRowCount < result) { +// Ideally the inner join count should be less than the left row count. but if its not calculated +// properly then we can assume whole of left table will be selected. +result = leftRowCount; Review comment: This case will come if the stats are not proper. So to be on safer side, i assume that all rows from the left side will be projected. That is the max value. If set it to 0, it should not trigger some re-write, assuming the join result is empty. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460002261 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java ## @@ -142,7 +146,7 @@ private Double computeInnerJoinSelectivity(Join j, RelMetadataQuery mq, RexNode ndvEstimate = exponentialBackoff(peLst, colStatMap); } - if (j.isSemiJoin()) { + if (j.isSemiJoin() || (j instanceof HiveJoin && j.getJoinType().equals(JoinRelType.ANTI))) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r460001236 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -1901,6 +1905,11 @@ public RelNode apply(RelOptCluster cluster, RelOptSchema relOptSchema, SchemaPlu calcitePreCboPlan = applyPreJoinOrderingTransforms(calciteGenPlan, mdProvider.getMetadataProvider(), executorProvider); + if (conf.getBoolVar(ConfVars.HIVE_CONVERT_ANTI_JOIN)) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459998845 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.util.ImmutableBitSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class); + public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new HiveJoinWithFilterToAntiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveJoinWithFilterToAntiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. + public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) { +LOG.debug("Matched HiveAntiJoinRule"); + +assert (filter != null); + +//We support conversion from left outer join only. +if (join.getJoinType() != JoinRelType.LEFT) { + return; +} + +List aboveFilters = RelOptUtil.conjunctions(filter.getCondition()); +boolean hasIsNull = false; + +// Get all filter condition and check if any of them is a "is null" kind. +for (RexNode filterNode : aboveFilters) { + if (filterNode.getKind() == SqlKind.IS_NULL && + isFilterFromRightSide(join, filterNode, join.getJoinType())) { +hasIsNull = true; +break; + } +} + +// Is null should be on a key from right side of the join. +if (!hasIsNull) { + return; +} + +// Build anti join with same left, right child and condition as original left outer join. +Join anti = join.copy(join.getTraitSet(), join.getCondition(), Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459998986 ## File path: ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out ## @@ -0,0 +1,99 @@ +PREHOOK: query: explain cbo +select + count(distinct cs_order_number) as `order count` + ,sum(cs_ext_ship_cost) as `total shipping cost` + ,sum(cs_net_profit) as `total net profit` +from + catalog_sales cs1 + ,date_dim + ,customer_address + ,call_center +where +d_date between '2001-4-01' and + (cast('2001-4-01' as date) + 60 days) +and cs1.cs_ship_date_sk = d_date_sk +and cs1.cs_ship_addr_sk = ca_address_sk +and ca_state = 'NY' +and cs1.cs_call_center_sk = cc_call_center_sk +and cc_county in ('Ziebach County','Levy County','Huron County','Franklin Parish', + 'Daviess County' +) +and exists (select * +from catalog_sales cs2 +where cs1.cs_order_number = cs2.cs_order_number + and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) +and not exists(select * + from catalog_returns cr1 + where cs1.cs_order_number = cr1.cr_order_number) +order by count(distinct cs_order_number) +limit 100 +PREHOOK: type: QUERY +PREHOOK: Input: default@call_center +PREHOOK: Input: default@catalog_returns +PREHOOK: Input: default@catalog_sales +PREHOOK: Input: default@customer_address +PREHOOK: Input: default@date_dim +PREHOOK: Output: hdfs://### HDFS PATH ### +POSTHOOK: query: explain cbo +select + count(distinct cs_order_number) as `order count` + ,sum(cs_ext_ship_cost) as `total shipping cost` + ,sum(cs_net_profit) as `total net profit` +from + catalog_sales cs1 + ,date_dim + ,customer_address + ,call_center +where +d_date between '2001-4-01' and + (cast('2001-4-01' as date) + 60 days) +and cs1.cs_ship_date_sk = d_date_sk +and cs1.cs_ship_addr_sk = ca_address_sk +and ca_state = 'NY' +and cs1.cs_call_center_sk = cc_call_center_sk +and cc_county in ('Ziebach County','Levy County','Huron County','Franklin Parish', + 'Daviess County' +) +and exists (select * +from catalog_sales cs2 +where cs1.cs_order_number = cs2.cs_order_number + and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) +and not exists(select * + from catalog_returns cr1 + where cs1.cs_order_number = cr1.cr_order_number) +order by count(distinct cs_order_number) +limit 100 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@call_center +POSTHOOK: Input: default@catalog_returns +POSTHOOK: Input: default@catalog_sales +POSTHOOK: Input: default@customer_address +POSTHOOK: Input: default@date_dim +POSTHOOK: Output: hdfs://### HDFS PATH ### +CBO PLAN: +HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], agg#2=[sum($6)]) + HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], cost=[not available]) +HiveSemiJoin(condition=[AND(<>($3, $13), =($4, $14))], joinType=[semi]) Review comment: done ..creating the HiveAntiJoin operator directly This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459990495 ## File path: ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out ## @@ -0,0 +1,99 @@ +PREHOOK: query: explain cbo +select + count(distinct cs_order_number) as `order count` + ,sum(cs_ext_ship_cost) as `total shipping cost` + ,sum(cs_net_profit) as `total net profit` +from + catalog_sales cs1 + ,date_dim + ,customer_address + ,call_center +where +d_date between '2001-4-01' and + (cast('2001-4-01' as date) + 60 days) +and cs1.cs_ship_date_sk = d_date_sk +and cs1.cs_ship_addr_sk = ca_address_sk +and ca_state = 'NY' +and cs1.cs_call_center_sk = cc_call_center_sk +and cc_county in ('Ziebach County','Levy County','Huron County','Franklin Parish', + 'Daviess County' +) +and exists (select * +from catalog_sales cs2 +where cs1.cs_order_number = cs2.cs_order_number + and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) +and not exists(select * + from catalog_returns cr1 + where cs1.cs_order_number = cr1.cr_order_number) +order by count(distinct cs_order_number) +limit 100 +PREHOOK: type: QUERY +PREHOOK: Input: default@call_center +PREHOOK: Input: default@catalog_returns +PREHOOK: Input: default@catalog_sales +PREHOOK: Input: default@customer_address +PREHOOK: Input: default@date_dim +PREHOOK: Output: hdfs://### HDFS PATH ### +POSTHOOK: query: explain cbo +select + count(distinct cs_order_number) as `order count` + ,sum(cs_ext_ship_cost) as `total shipping cost` + ,sum(cs_net_profit) as `total net profit` +from + catalog_sales cs1 + ,date_dim + ,customer_address + ,call_center +where +d_date between '2001-4-01' and + (cast('2001-4-01' as date) + 60 days) +and cs1.cs_ship_date_sk = d_date_sk +and cs1.cs_ship_addr_sk = ca_address_sk +and ca_state = 'NY' +and cs1.cs_call_center_sk = cc_call_center_sk +and cc_county in ('Ziebach County','Levy County','Huron County','Franklin Parish', + 'Daviess County' +) +and exists (select * +from catalog_sales cs2 +where cs1.cs_order_number = cs2.cs_order_number + and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) +and not exists(select * + from catalog_returns cr1 + where cs1.cs_order_number = cr1.cr_order_number) +order by count(distinct cs_order_number) +limit 100 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@call_center +POSTHOOK: Input: default@catalog_returns +POSTHOOK: Input: default@catalog_sales +POSTHOOK: Input: default@customer_address +POSTHOOK: Input: default@date_dim +POSTHOOK: Output: hdfs://### HDFS PATH ### +CBO PLAN: +HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], agg#2=[sum($6)]) + HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], cost=[not available]) Review comment: I think, it's not a problem. The filed index are for different input. So even though the number is same the condition is different. Even without anti join, the condition is same. HiveFilter(condition=[IS NULL($13)]) HiveJoin(condition=[=($4, $14)], joinType=[left], algorithm=[none], cost=[not available]) HiveSemiJoin(condition=[AND(<>($3, $13), =($4, $14))], joinType=[semi]) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459988214 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java ## @@ -79,6 +80,11 @@ public Double getDistinctRowCount(HiveSemiJoin rel, RelMetadataQuery mq, Immutab return super.getDistinctRowCount(rel, mq, groupKey, predicate); } + public Double getDistinctRowCount(HiveAntiJoin rel, RelMetadataQuery mq, ImmutableBitSet groupKey, +RexNode predicate) { +return super.getDistinctRowCount(rel, mq, groupKey, predicate); Review comment: calcite 21 does not support distinct calculation for Anti join. if (join.isSemiJoin()) { return getSemiJoinDistinctRowCount(join, mq, groupKey, predicate); } else { Builder leftMask = ImmutableBitSet.builder(); I think these rules will not get triggered as of now for Anti join as i am not converting the not-exists to anti join. As of now all these rules will be applied on left outer and then we convert the left outer to anti join. I This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459986329 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java ## @@ -414,6 +416,13 @@ private RexNode rewriteInExists(RexSubQuery e, Set variablesSet, // null keys we do not need to generate count(*), count(c) if (e.getKind() == SqlKind.EXISTS) { logic = RelOptUtil.Logic.TRUE_FALSE; +if (conf.getBoolVar(HiveConf.ConfVars.HIVE_CONVERT_ANTI_JOIN)) { + //TODO : As of now anti join is first converted to left outer join Review comment: Now also the conversion is not done. The code is present but actual conversion is not done and logic is still TRUE_FALSE. For the code to be effective , the logic should be changed to FALSE. I have not done it yet, as it was causing some change in plan which i could not judge to be expected or not. Anyways i have created a JIRA to track this. https://issues.apache.org/jira/browse/HIVE-23928 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459984171 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.util.ImmutableBitSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class); + public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new HiveJoinWithFilterToAntiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveJoinWithFilterToAntiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. + public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) { +LOG.debug("Matched HiveAntiJoinRule"); + +if (join.getCondition().isAlwaysTrue()) { + return; +} + +//We support conversion from left outer join only. +if (join.getJoinType() != JoinRelType.LEFT) { + return; +} + +assert (filter != null); + +List aboveFilters = RelOptUtil.conjunctions(filter.getCondition()); +boolean hasIsNull = false; + +// Get all filter condition and check if any of them is a "is null" kind. +for (RexNode filterNode : aboveFilters) { + if (filterNode.getKind() == SqlKind.IS_NULL && + isFilterFromRightSide(join, filterNode, join.getJoinType())) { +hasIsNull = true; +break; + } +} + +// Is null should be on a key from right side of the join. +if (!hasIsNull) { + return; +} + +// Build anti join with same left, right child and condition as original left outer join. +Join anti = join.copy(join.getTraitSet(), join.getCondition(), +join.getLeft(), join.getRight(), JoinRelType.ANTI, false); + +//TODO : Do we really need it +call.getPlanner().onCopy(join, anti); + +RelNode newProject = getNewProjectNode(project, anti); +if (newProject != null) { + call.getPlanner().onCopy(project, newProject); + call.transformTo(newProject); +} + } + + protected RelNode getNewProjectNode(Project oldProject, Join newJoin) { Review comment: I didn't find any such utility method, so added this into HiveCalciteUtil and used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459841087 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinProjectTransposeRule.java ## @@ -133,6 +135,10 @@ private HiveJoinProjectTransposeRuleBase( public void onMatch(RelOptRuleCall call) { //TODO: this can be removed once CALCITE-3824 is released + Join joinRel = call.rel(0); + if (joinRel.getJoinType() == JoinRelType.ANTI) { Review comment: This was causing some issue with having clause. https://issues.apache.org/jira/browse/HIVE-23921 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459840139 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java ## @@ -100,7 +100,8 @@ public void onMatch(RelOptRuleCall call) { // These boolean values represent corresponding left, right input which is potential FK boolean leftInputPotentialFK = topRefs.intersects(leftBits); boolean rightInputPotentialFK = topRefs.intersects(rightBits); -if (leftInputPotentialFK && rightInputPotentialFK && (joinType == JoinRelType.INNER || joinType == JoinRelType.SEMI)) { +if (leftInputPotentialFK && rightInputPotentialFK && +(joinType == JoinRelType.INNER || joinType == JoinRelType.SEMI || joinType == JoinRelType.ANTI)) { Review comment: https://issues.apache.org/jira/browse/HIVE-23920 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459827002 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java ## @@ -74,7 +78,14 @@ public HiveJoinAddNotNullRule(Class clazz, @Override public void onMatch(RelOptRuleCall call) { Join join = call.rel(0); -if (join.getJoinType() == JoinRelType.FULL || join.getCondition().isAlwaysTrue()) { + +// For anti join case add the not null on right side if the condition is +// always true. This is done because during execution, anti join expect the right side to +// be empty and if we dont put null check on right, for null only right side table and condition +// always true, execution will produce 0 records. +// eg select * from left_tbl where (select 1 from all_null_right limit 1) is null +if (join.getJoinType() == JoinRelType.FULL || +(join.getJoinType() != JoinRelType.ANTI && join.getCondition().isAlwaysTrue())) { Review comment: Yes, the comment is not proper. It's like we will add a not null condition for anti join even if the condition is always true. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459826636 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAntiJoin.java ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators; + +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Sets; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexNode; +import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil; +import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRulesRegistry; + +import java.util.ArrayList; +import java.util.List; + +public class HiveAntiJoin extends Join implements HiveRelNode { Review comment: https://issues.apache.org/jira/browse/HIVE-23919 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459825977 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAntiJoin.java ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators; + +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Sets; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexNode; +import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil; +import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRulesRegistry; + +import java.util.ArrayList; +import java.util.List; + +public class HiveAntiJoin extends Join implements HiveRelNode { + + private final RexNode joinFilter; Review comment: The joinjoinFilter holds the residual filter which is used during post processing. These are the join conditions that are not part of the join key. I think condition in Join hold the full condition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459253392 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java ## @@ -747,7 +747,7 @@ public static RewritablePKFKJoinInfo isRewritablePKFKJoin(Join join, final RelNode nonFkInput = leftInputPotentialFK ? join.getRight() : join.getLeft(); final RewritablePKFKJoinInfo nonRewritable = RewritablePKFKJoinInfo.of(false, null); -if (joinType != JoinRelType.INNER && !join.isSemiJoin()) { +if (joinType != JoinRelType.INNER && !join.isSemiJoin() && joinType != JoinRelType.ANTI) { Review comment: https://issues.apache.org/jira/browse/HIVE-23906 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459220734 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinGenerateResultOperator.java ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSet; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSetResult; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashTableResult; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; + +// TODO : This class is duplicate of semi join. Need to do a refactoring to merge it with semi join. Review comment: https://issues.apache.org/jira/browse/HIVE-23905 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459219966 ## File path: ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/TestMapJoinOperator.java ## @@ -1792,6 +1794,8 @@ private void executeTest(MapJoinTestDescription testDesc, MapJoinTestData testDa case FULL_OUTER: executeTestFullOuter(testDesc, testData, title); break; +case ANTI: //TODO Review comment: https://issues.apache.org/jira/browse/HIVE-23904 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459216878 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java ## @@ -339,6 +339,12 @@ String getFuncText(String funcText, final int srcPos) { vector.add(right, left); break; case JoinDesc.LEFT_OUTER_JOIN: +case JoinDesc.ANTI_JOIN: +//TODO : In case of anti join, bloom filter can be created on left side also ("IN (keylist right table)"). +// But the filter should be "not-in" ("NOT IN (keylist right table)") as we want to select the records from +// left side which are not present in the right side. But it may cause wrong result as +// bloom filter may have false positive and thus simply adding not is not correct, +// special handling is required for "NOT IN". Review comment: created a Jira ..https://issues.apache.org/jira/browse/HIVE-23903 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459215352 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java ## @@ -74,7 +78,14 @@ public HiveJoinAddNotNullRule(Class clazz, @Override public void onMatch(RelOptRuleCall call) { Join join = call.rel(0); -if (join.getJoinType() == JoinRelType.FULL || join.getCondition().isAlwaysTrue()) { + +// For anti join case add the not null on right side if the condition is Review comment: For the case when we have. join condition which gets evaluated, it will return false while comparing with a null on the right side. But for always true join condition, we will not do a match for right side assuming it's always true. So for anti join, the left side records will not be emitted. To avoid this we put a null check on right side table and for all null entry, no records will be projected from right side and thus all records from left side will be emitted. So the comment is not very accurate. It's like even if the condition is always true, we add a null check on right side for anti join. I will update it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459208067 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java ## @@ -0,0 +1,315 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.mapjoin; + +import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.JoinUtil; +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; +import org.apache.hadoop.hive.ql.plan.VectorDesc; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; + +// TODO : Duplicate codes need to merge with semi join. +// Single-Column Long hash table import. +// Single-Column Long specific imports. + +/* + * Specialized class for doing a vectorized map join that is an anti join on a Single-Column Long + * using a hash set. + */ +public class VectorMapJoinAntiJoinLongOperator extends VectorMapJoinAntiJoinGenerateResultOperator { + + private static final long serialVersionUID = 1L; + private static final String CLASS_NAME = VectorMapJoinAntiJoinLongOperator.class.getName(); + private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + protected String getLoggingPrefix() { +return super.getLoggingPrefix(CLASS_NAME); + } + + // The above members are initialized by the constructor and must not be + // transient. + + // The hash map for this specialized class. + private transient VectorMapJoinLongHashSet hashSet; + + // Single-Column Long specific members. + // For integers, we have optional min/max filtering. + private transient boolean useMinMax; + private transient long min; + private transient long max; + + // The column number for this one column join specialization. + private transient int singleJoinColumn; + + // Pass-thru constructors. + /** Kryo ctor. */ + protected VectorMapJoinAntiJoinLongOperator() { +super(); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) { +super(ctx); + } + + public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, OperatorDesc conf, + VectorizationContext vContext, VectorDesc vectorDesc) throws HiveException { +super(ctx, conf, vContext, vectorDesc); + } + + // Process Single-Column Long Anti Join on a vectorized row batch. + @Override + protected void commonSetup() throws HiveException { +super.commonSetup(); + +// Initialize Single-Column Long members for this specialized class. +singleJoinColumn = bigTableKeyColumnMap[0]; + } + + @Override + public void hashTableSetup() throws HiveException { +super.hashTableSetup(); + +// Get our Single-Column Long hash set information for this specialized class. +hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable; +useMinMax = hashSet.useMinMax(); +if (useMinMax) { + min = hashSet.min(); + max = hashSet.max(); +} + } + + @Override + public void processBatch(VectorizedRowBatch batch) throws HiveException { + +try { + // (Currently none) Review comment: The pre batch processing done only for joins which emits the right table records. For semi join and anti join, it's not required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r459198718 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ## @@ -523,11 +533,19 @@ private boolean createForwardJoinObject(boolean[] skip) throws HiveException { forward = true; } } +return forward; + } + + // returns whether a record was forwarded + private boolean createForwardJoinObject(boolean[] skip, boolean antiJoin) throws HiveException { +boolean forward = fillFwdCache(skip); if (forward) { if (needsPostEvaluation) { forward = !JoinUtil.isFiltered(forwardCache, residualJoinFilters, residualJoinFiltersOIs); } - if (forward) { + + // For anti join, check all right side and if nothing is matched then only forward. Review comment: For anti join we don't emit the record here. It's done after all the records are checked and none of the record matches the condition. Here if forward is false we don't forward and as its a "&" we don't forward for anti join == true even if forward is true. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r457545081 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.util.ImmutableBitSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class); + public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new HiveJoinWithFilterToAntiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveJoinWithFilterToAntiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. + public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) { +LOG.debug("Matched HiveAntiJoinRule"); Review comment: sure ..will do that This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r456594608 ## File path: ql/src/test/results/clientpositive/llap/antijoin.q.out ## @@ -0,0 +1,1007 @@ +PREHOOK: query: create table t1_n55 as select cast(key as int) key, value from src where key <= 10 +PREHOOK: type: CREATETABLE_AS_SELECT +PREHOOK: Input: default@src +PREHOOK: Output: database:default +PREHOOK: Output: default@t1_n55 +POSTHOOK: query: create table t1_n55 as select cast(key as int) key, value from src where key <= 10 +POSTHOOK: type: CREATETABLE_AS_SELECT +POSTHOOK: Input: default@src +POSTHOOK: Output: database:default +POSTHOOK: Output: default@t1_n55 +POSTHOOK: Lineage: t1_n55.key EXPRESSION [(src)src.FieldSchema(name:key, type:string, comment:default), ] +POSTHOOK: Lineage: t1_n55.value SIMPLE [(src)src.FieldSchema(name:value, type:string, comment:default), ] +PREHOOK: query: select * from t1_n55 sort by key +PREHOOK: type: QUERY +PREHOOK: Input: default@t1_n55 + A masked pattern was here +POSTHOOK: query: select * from t1_n55 sort by key +POSTHOOK: type: QUERY +POSTHOOK: Input: default@t1_n55 + A masked pattern was here +0 val_0 Review comment: These all new test cases are added from the failure test cases of a dry run with anti join enabled true. Manually i have verified that the resultant records are same and plan difference is as per expected behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive
maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r456593908 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. \n" + "If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the\n" + "specified size, the join is directly converted to a mapjoin (there is no conditional task)."), - +HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false, Review comment: Yes, i had triggered a ptest run with this config enabled to true by default. There were some 26 failures. I had analyzed those and some fixes were done to make sure that the result is same for both and difference in plan is as expected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org