[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826953



##
File path: 
ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query94_anti_join.q.out
##
@@ -0,0 +1,94 @@
+PREHOOK: query: explain cbo
+select  
+   count(distinct ws_order_number) as `order count`
+  ,sum(ws_ext_ship_cost) as `total shipping cost`
+  ,sum(ws_net_profit) as `total net profit`
+from
+   web_sales ws1
+  ,date_dim
+  ,customer_address
+  ,web_site
+where
+d_date between '1999-5-01' and 
+   (cast('1999-5-01' as date) + 60 days)
+and ws1.ws_ship_date_sk = d_date_sk
+and ws1.ws_ship_addr_sk = ca_address_sk
+and ca_state = 'TX'
+and ws1.ws_web_site_sk = web_site_sk
+and web_company_name = 'pri'
+and exists (select *
+from web_sales ws2
+where ws1.ws_order_number = ws2.ws_order_number
+  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
+and not exists(select *
+   from web_returns wr1
+   where ws1.ws_order_number = wr1.wr_order_number)
+order by count(distinct ws_order_number)
+limit 100
+PREHOOK: type: QUERY
+PREHOOK: Input: default@customer_address
+PREHOOK: Input: default@date_dim
+PREHOOK: Input: default@web_returns
+PREHOOK: Input: default@web_sales
+PREHOOK: Input: default@web_site
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: explain cbo
+select  
+   count(distinct ws_order_number) as `order count`
+  ,sum(ws_ext_ship_cost) as `total shipping cost`
+  ,sum(ws_net_profit) as `total net profit`
+from
+   web_sales ws1
+  ,date_dim
+  ,customer_address
+  ,web_site
+where
+d_date between '1999-5-01' and 
+   (cast('1999-5-01' as date) + 60 days)
+and ws1.ws_ship_date_sk = d_date_sk
+and ws1.ws_ship_addr_sk = ca_address_sk
+and ca_state = 'TX'
+and ws1.ws_web_site_sk = web_site_sk
+and web_company_name = 'pri'
+and exists (select *
+from web_sales ws2
+where ws1.ws_order_number = ws2.ws_order_number
+  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
+and not exists(select *
+   from web_returns wr1
+   where ws1.ws_order_number = wr1.wr_order_number)
+order by count(distinct ws_order_number)
+limit 100
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Input: default@date_dim
+POSTHOOK: Input: default@web_returns
+POSTHOOK: Input: default@web_sales
+POSTHOOK: Input: default@web_site
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+CBO PLAN:
+HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], 
agg#2=[sum($6)])
+  HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], 
cost=[not available])

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826675



##
File path: 
ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out
##
@@ -0,0 +1,99 @@
+PREHOOK: query: explain cbo

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826103



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Whether Hive enables the optimization about converting common join 
into mapjoin based on the input file size. \n" +
 "If this parameter is on, and the sum of size for n-1 of the 
tables/partitions for a n-way join is smaller than the\n" +
 "specified size, the join is directly converted to a mapjoin (there is 
no conditional task)."),
-
+HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false,

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466825617



##
File path: 
ql/src/test/results/clientpositive/llap/subquery_notexists_having.q.out
##
@@ -31,7 +31,8 @@ STAGE PLANS:
 Tez
  A masked pattern was here 
   Edges:
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
+Reducer 2 <- Map 1 (SIMPLE_EDGE)

Review comment:
   yes ..the join is getting converted to SMB join ..and so no reducer is 
required. In case of anti join its not getting converted. That is because left 
outer is adding an extra group by which is making the RS node on left and right 
side  equal, the pre-condition for converting to SMB join. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466822516



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -2129,6 +2133,16 @@ private RelNode applyPreJoinOrderingTransforms(RelNode 
basePlan, RelMetadataProv
 HiveRemoveSqCountCheck.INSTANCE);
   }
 
+  // 10. Convert left outer join + null filter on right side table column 
to anti join. Add this
+  // rule after all the optimization for which calcite support for anti 
join is missing.
+  // Needs to be done before ProjectRemoveRule as it expect a project over 
filter.
+  // This is done before join re-ordering as join re-ordering is 
converting the left outer

Review comment:
   As discussed, i have created a Jira 
https://issues.apache.org/jira/browse/HIVE-24013





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819572



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -92,7 +104,7 @@ public void onMatch(RelOptRuleCall call) {
 Set rightPushedPredicates = 
Sets.newHashSet(registry.getPushedPredicates(join, 1));
 
 boolean genPredOnLeft = join.getJoinType() == JoinRelType.RIGHT || 
join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
-boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || 
join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
+boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || 
join.getJoinType() == JoinRelType.INNER || join.isSemiJoin()|| 
join.getJoinType() == JoinRelType.ANTI;

Review comment:
   yes ..that is taken care of.
   // For anti join, we should proceed to emit records if the right side is 
empty or not matching.
   if (type == JoinDesc.ANTI_JOIN && !producedRow) {





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819149



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveAntiSemiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveAntiSemiJoinRule.class);
+  public static final HiveAntiSemiJoinRule INSTANCE = new 
HiveAntiSemiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveAntiSemiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Start Matching HiveAntiJoinRule");
+
+//TODO : Need to support this scenario.
+if (join.getCondition().isAlwaysTrue()) {
+  return;
+}
+
+//We support conversion from left outer join only.
+if (join.getJoinType() != JoinRelType.LEFT) {
+  return;
+}
+
+assert (filter != null);
+
+// If null filter is not present from right side then we can not convert 
to anti join.
+List aboveFilters = 
RelOptUtil.conjunctions(filter.getCondition());
+Stream nullFilters = aboveFilters.stream().filter(filterNode -> 
filterNode.getKind() == SqlKind.IS_NULL);
+boolean hasNullFilter = 
HiveCalciteUtil.hasAnyExpressionFromRightSide(join, 
nullFilters.collect(Collectors.toList()));
+if (!hasNullFilter) {
+  return;
+}
+
+// If any projection is there from right side, then we can not convert to 
anti join.
+boolean hasProjection = 
HiveCalciteUtil.hasAnyExpressionFromRightSide(join, project.getProjects());
+if (hasProjection) {
+  return;
+}
+
+LOG.debug("Matched HiveAntiJoinRule");
+
+// Build anti join with same left, right child and condition as original 
left outer join.
+Join anti = HiveAntiJoin.getAntiJoin(join.getLeft().getCluster(), 
join.getLeft().getTraitSet(),
+join.getLeft(), join.getRight(), join.getCondition());
+RelNode newProject = project.copy(project.getTraitSet(), anti, 
project.getProjects(), project.getRowType());
+call.transformTo(newProject);

Review comment:
   for normal filter, its being pushed down. here we get filters which can 
not be pushed down. I have modified the code to handle those filters. And added 
these extra tests to verify.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466818492



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinMultiKeyOperator.java
##
@@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinBytesHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.apache.hadoop.hive.serde2.ByteStream.Output;
+import 
org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// Multi-Key hash table import.
+// Multi-Key specific imports.
+
+// TODO : Duplicate codes need to merge with semi join.
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on 
Multi-Key
+ * using hash set.
+ */
+public class VectorMapJoinAntiJoinMultiKeyOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+
+  
//
+
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinMultiKeyOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  
//
+
+  // (none)
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+  //---
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinBytesHashSet hashSet;
+
+  //---
+  // Multi-Key specific members.
+  //
+
+  // Object that can take a set of columns in row in a vectorized row batch 
and serialized it.
+  // Known to not have any nulls.
+  private transient VectorSerializeRow keyVectorSerializeWrite;
+
+  // The BinarySortable serialization of the current key.
+  private transient Output currentKeyOutput;
+
+  // The BinarySortable serialization of the saved key for a possible series 
of equal keys.
+  private transient Output saveKeyOutput;
+
+  //---
+  // Pass-thru constructors.
+  //
+
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinMultiKeyOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  //---
+  // Process Multi-Key Anti Join on a vectorized row batch.
+  //
+
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+/*
+ * Initialize Multi-Key members for this specialized class.
+ */
+
+keyVectorSerializeWrite = new 
VectorSerializeRow(BinarySortableSerializeWrite.with(
+this.getConf().getKeyTblDesc().getProperties(), 
bigTableKeyColumnMap.length));
+keyVectorSerializeWrite.init(bigTableKeyTypeInfos, bigTableKeyColumnMap);
+
+currentKeyOutput = new 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466818194



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : bigTableFilterExpressions) {
+ve.evaluate(batch);
+  }
+
+  final int inputLogicalSize = batch.size;
+  if (inputLogicalSize == 0) {
+return;
+  }
+
+  // Perform any key expressions.  Results will go into scratch columns.
+  if (bigTableKeyExpressions != null) {
+for (VectorExpression ve : bigTableKeyExpressions) {
+  ve.evaluate(batch);
+}
+  }
+
+  // The one join column for this specialized class.
+  

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-04 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r465297563



##
File path: ql/src/test/queries/clientpositive/subquery_in_having.q
##
@@ -140,6 +140,22 @@ CREATE TABLE src_null_n4 (key STRING COMMENT 'default', 
value STRING COMMENT 'de
 LOAD DATA LOCAL INPATH "../../data/files/kv1.txt" INTO TABLE src_null_n4;
 INSERT INTO src_null_n4 values('5444', null);
 
+explain
+select key, value, count(*)

Review comment:
   By default anti join conversion is set to true. I have added few test 
cases with anti join set to false.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java
##
@@ -203,6 +203,7 @@ private boolean filterExists(ReduceSinkOperator target, 
ExprNodeDesc replaced) {
   vector.add(right, left);
   break;
 case JoinDesc.LEFT_OUTER_JOIN:
+case JoinDesc.ANTI_JOIN: //TODO : need to test

Review comment:
   removed the comment.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
##
@@ -183,6 +189,7 @@ public void onMatch(RelOptRuleCall call) {
 switch (joinType) {
 case SEMI:
 case INNER:
+case ANTI:

Review comment:
   done

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -92,7 +104,7 @@ public void onMatch(RelOptRuleCall call) {
 Set rightPushedPredicates = 
Sets.newHashSet(registry.getPushedPredicates(join, 1));
 
 boolean genPredOnLeft = join.getJoinType() == JoinRelType.RIGHT || 
join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
-boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || 
join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
+boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || 
join.getJoinType() == JoinRelType.INNER || join.isSemiJoin()|| 
join.getJoinType() == JoinRelType.ANTI;

Review comment:
   Yes ..if right side is null then it emits all the right side records

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveAntiSemiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveAntiSemiJoinRule.class);
+  public static final HiveAntiSemiJoinRule INSTANCE = new 
HiveAntiSemiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveAntiSemiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460606016



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinGenerateResultOperator.java
##
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSet;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSetResult;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashTableResult;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+
+// TODO : This class is duplicate of semi join. Need to do a refactoring to 
merge it with semi join.
+/**
+ * This class has methods for generating vectorized join results for Anti 
joins.
+ * The big difference between inner joins and anti joins is existence testing.
+ * Inner joins use a hash map to lookup the 1 or more small table values.
+ * Anti joins are a specialized join for outputting big table rows whose key 
exists

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460605730



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
##
@@ -523,11 +533,19 @@ private boolean createForwardJoinObject(boolean[] skip) 
throws HiveException {
 forward = true;
   }
 }
+return forward;
+  }
+
+  // returns whether a record was forwarded
+  private boolean createForwardJoinObject(boolean[] skip, boolean antiJoin) 
throws HiveException {
+boolean forward = fillFwdCache(skip);
 if (forward) {
   if (needsPostEvaluation) {
 forward = !JoinUtil.isFiltered(forwardCache, residualJoinFilters, 
residualJoinFiltersOIs);
   }
-  if (forward) {
+
+  // For anti join, check all right side and if nothing is matched then 
only forward.

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460605836



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinGenerateResultOperator.java
##
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSet;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSetResult;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashTableResult;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+
+// TODO : This class is duplicate of semi join. Need to do a refactoring to 
merge it with semi join.
+/**
+ * This class has methods for generating vectorized join results for Anti 
joins.
+ * The big difference between inner joins and anti joins is existence testing.
+ * Inner joins use a hash map to lookup the 1 or more small table values.
+ * Anti joins are a specialized join for outputting big table rows whose key 
exists
+ * in the small table.
+ *
+ * No small table values are needed for anti since they would be empty.  So,
+ * we use a hash set as the hash table.  Hash sets just report whether a key 
exists.  This
+ * is a big performance optimization.
+ */
+public abstract class VectorMapJoinAntiJoinGenerateResultOperator
+extends VectorMapJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final Logger LOG = 
LoggerFactory.getLogger(VectorMapJoinAntiJoinGenerateResultOperator.class.getName());
+
+  // Anti join specific members.
+
+  // An array of hash set results so we can do lookups on the whole batch 
before output result
+  // generation.
+  protected transient VectorMapJoinHashSetResult hashSetResults[];
+
+  // Pre-allocated member for storing the (physical) batch index of matching 
row (single- or
+  // multi-small-table-valued) indexes during a process call.
+  protected transient int[] allMatchs;
+
+  // Pre-allocated member for storing the (physical) batch index of rows that 
need to be spilled.
+  protected transient int[] spills;
+
+  // Pre-allocated member for storing index into the hashSetResults for each 
spilled row.
+  protected transient int[] spillHashMapResultIndices;
+
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinGenerateResultOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinGenerateResultOperator(CompilationOpContext ctx) 
{
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinGenerateResultOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+ VectorizationContext 
vContext, VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  /*
+   * Setup our anti join specific members.
+   */
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Anti join specific.
+VectorMapJoinHashSet baseHashSet = (VectorMapJoinHashSet) 
vectorMapJoinHashTable;
+
+hashSetResults = new 
VectorMapJoinHashSetResult[VectorizedRowBatch.DEFAULT_SIZE];
+for (int i = 0; i < hashSetResults.length; i++) {
+  hashSetResults[i] = baseHashSet.createHashSetResult();
+}
+
+allMatchs = new int[VectorizedRowBatch.DEFAULT_SIZE];
+
+spills = new int[VectorizedRowBatch.DEFAULT_SIZE];
+spillHashMapResultIndices = new int[VectorizedRowBatch.DEFAULT_SIZE];
+  }
+
+  
//---
+
+  /*
+   * Anti join (hash 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460605498



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
##
@@ -638,6 +657,12 @@ private void genObject(int aliasNum, boolean allLeftFirst, 
boolean allLeftNull)
   // skipping the rest of the rows in the rhs table of the semijoin
   done = !needsPostEvaluation;
 }
+  } else if (type == JoinDesc.ANTI_JOIN) {
+if (innerJoin(skip, left, right)) {
+  // if anti join found a match then the condition is not matched for 
anti join, so we can skip rest of the

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460522562



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : bigTableFilterExpressions) {
+ve.evaluate(batch);
+  }
+
+  final int inputLogicalSize = batch.size;
+  if (inputLogicalSize == 0) {
+return;
+  }
+
+  // Perform any key expressions.  Results will go into scratch columns.
+  if (bigTableKeyExpressions != null) {
+for (VectorExpression ve : bigTableKeyExpressions) {
+  ve.evaluate(batch);
+}
+  }
+
+  // The one join column for this specialized class.
+  

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460522454



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : bigTableFilterExpressions) {
+ve.evaluate(batch);
+  }
+
+  final int inputLogicalSize = batch.size;
+  if (inputLogicalSize == 0) {
+return;
+  }
+
+  // Perform any key expressions.  Results will go into scratch columns.
+  if (bigTableKeyExpressions != null) {
+for (VectorExpression ve : bigTableKeyExpressions) {
+  ve.evaluate(batch);
+}
+  }
+
+  // The one join column for this specialized class.
+  

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460522312



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : bigTableFilterExpressions) {
+ve.evaluate(batch);
+  }
+
+  final int inputLogicalSize = batch.size;
+  if (inputLogicalSize == 0) {
+return;
+  }
+
+  // Perform any key expressions.  Results will go into scratch columns.
+  if (bigTableKeyExpressions != null) {
+for (VectorExpression ve : bigTableKeyExpressions) {
+  ve.evaluate(batch);
+}
+  }
+
+  // The one join column for this specialized class.
+  

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460522261



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinStringOperator.java
##
@@ -0,0 +1,371 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinBytesHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// Single-Column String hash table import.
+// Single-Column String specific imports.
+
+// TODO : Duplicate codes need to merge with semi join.
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column String
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinStringOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+
+  
//
+
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinStringOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  
//
+
+  // (none)
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+  //---
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinBytesHashSet hashSet;
+
+  //---
+  // Single-Column String specific members.
+  //
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  //---
+  // Pass-thru constructors.
+  //
+
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinStringOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinStringOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinStringOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+ VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  //---
+  // Process Single-Column String anti Join on a vectorized row batch.
+  //
+
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+/*
+ * Initialize Single-Column String members for this specialized class.
+ */
+
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+/*
+ * Get our Single-Column String hash set information for this specialized 
class.
+ */
+
+hashSet = (VectorMapJoinBytesHashSet) vectorMapJoinHashTable;
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+
+  // Do the per-batch setup for an anti join.
+
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460521384



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
##
@@ -509,11 +513,17 @@ protected void addToAliasFilterTags(byte alias, 
List object, boolean isN
 }
   }
 
+  private void createForwardJoinObjectForAntiJoin(boolean[] skip) throws 
HiveException {
+boolean forward = fillFwdCache(skip);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460521246



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
##
@@ -145,6 +145,7 @@ joinToken
 | KW_RIGHT (KW_OUTER)? KW_JOIN -> TOK_RIGHTOUTERJOIN
 | KW_FULL  (KW_OUTER)? KW_JOIN -> TOK_FULLOUTERJOIN
 | KW_LEFT KW_SEMI KW_JOIN  -> TOK_LEFTSEMIJOIN
+| KW_ANTI KW_JOIN  -> TOK_ANTIJOIN

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460521056



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
##
@@ -153,6 +153,8 @@
 
   transient boolean hasLeftSemiJoin = false;
 
+  transient boolean hasAntiJoin = false;

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460520903



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptMaterializationValidator.java
##
@@ -253,6 +256,14 @@ private RelNode visit(HiveSemiJoin semiJoin) {
 return visitChildren(semiJoin);
   }
 
+  // Note: Not currently part of the HiveRelNode interface
+  private RelNode visit(HiveAntiJoin antiJoin) {

Review comment:
   Not sure ..copy pasted from semi join.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460520647



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java
##
@@ -1112,7 +1112,7 @@ public RexNode field(RexNode e, String name) {
   }
 
   public HiveSubQRemoveRelBuilder join(JoinRelType joinType, RexNode condition,
-   Set variablesSet, 
boolean createSemiJoin) {
+   Set variablesSet, 
JoinRelType semiJoinType) {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460519523



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -56,6 +57,9 @@
   public static final HiveJoinAddNotNullRule INSTANCE_SEMIJOIN =
   new HiveJoinAddNotNullRule(HiveSemiJoin.class, 
HiveRelFactories.HIVE_FILTER_FACTORY);
 
+  public static final HiveJoinAddNotNullRule INSTANCE_ANTIJOIN =
+  new HiveJoinAddNotNullRule(HiveAntiJoin.class, 
HiveRelFactories.HIVE_FILTER_FACTORY);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460519111



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinDesc.java
##
@@ -89,7 +89,8 @@ public PrimitiveTypeInfo getPrimitiveTypeInfo() {
 INNER_BIG_ONLY,
 LEFT_SEMI,
 OUTER,
-FULL_OUTER
+FULL_OUTER,
+ANTI

Review comment:
   LEFT_ANTI





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460518974



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460518799



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class);
+  public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new 
HiveJoinWithFilterToAntiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveJoinWithFilterToAntiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Matched HiveAntiJoinRule");
+
+if (join.getCondition().isAlwaysTrue()) {
+  return;
+}
+
+//We support conversion from left outer join only.
+if (join.getJoinType() != JoinRelType.LEFT) {
+  return;
+}
+
+assert (filter != null);
+
+List aboveFilters = 
RelOptUtil.conjunctions(filter.getCondition());
+boolean hasIsNull = false;
+
+// Get all filter condition and check if any of them is a "is null" kind.
+for (RexNode filterNode : aboveFilters) {
+  if (filterNode.getKind() == SqlKind.IS_NULL &&
+  isFilterFromRightSide(join, filterNode, join.getJoinType())) {
+hasIsNull = true;
+break;
+  }
+}
+
+// Is null should be on a key from right side of the join.
+if (!hasIsNull) {
+  return;
+}
+
+// Build anti join with same left, right child and condition as original 
left outer join.
+Join anti = join.copy(join.getTraitSet(), join.getCondition(),
+join.getLeft(), join.getRight(), JoinRelType.ANTI, false);
+
+//TODO : Do we really need it
+call.getPlanner().onCopy(join, anti);
+
+RelNode newProject = getNewProjectNode(project, anti);
+if (newProject != null) {
+  call.getPlanner().onCopy(project, newProject);

Review comment:
   done

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460518454



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java
##
@@ -79,6 +80,11 @@ public Double getDistinctRowCount(HiveSemiJoin rel, 
RelMetadataQuery mq, Immutab
 return super.getDistinctRowCount(rel, mq, groupKey, predicate);
   }
 
+  public Double getDistinctRowCount(HiveAntiJoin rel, RelMetadataQuery mq, 
ImmutableBitSet groupKey,
+RexNode predicate) {
+return super.getDistinctRowCount(rel, mq, groupKey, predicate);

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23933





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460518411



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java
##
@@ -118,6 +119,15 @@ public Double getRowCount(HiveJoin join, RelMetadataQuery 
mq) {
   }
 
   public Double getRowCount(HiveSemiJoin rel, RelMetadataQuery mq) {
+return getRowCountInt(rel, mq);
+  }
+
+  public Double getRowCount(HiveAntiJoin rel, RelMetadataQuery mq) {
+return getRowCountInt(rel, mq);
+  }
+
+  private Double getRowCountInt(Join rel, RelMetadataQuery mq) {

Review comment:
   super.getRowCount(rel, mq) does not support Anti join. I think we need 
to handle it.
   https://issues.apache.org/jira/browse/HIVE-23933





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460515695



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java
##
@@ -118,6 +119,15 @@ public Double getRowCount(HiveJoin join, RelMetadataQuery 
mq) {
   }
 
   public Double getRowCount(HiveSemiJoin rel, RelMetadataQuery mq) {
+return getRowCountInt(rel, mq);
+  }
+
+  public Double getRowCount(HiveAntiJoin rel, RelMetadataQuery mq) {
+return getRowCountInt(rel, mq);
+  }
+
+  private Double getRowCountInt(Join rel, RelMetadataQuery mq) {

Review comment:
   Yes done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-26 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460515257



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveGBYSemiJoinRule.java
##
@@ -41,17 +41,19 @@
 
   public HiveRemoveGBYSemiJoinRule() {
 super(
-operand(HiveSemiJoin.class,
+operand(Join.class,
 some(
 operand(RelNode.class, any()),
 operand(Aggregate.class, any(,
 HiveRelFactories.HIVE_BUILDER, "HiveRemoveGBYSemiJoinRule");
   }
 
   @Override public void onMatch(RelOptRuleCall call) {
-final HiveSemiJoin semijoin= call.rel(0);
+final Join join= call.rel(0);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460001925



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2606,6 +2607,17 @@ private long computeFinalRowCount(List 
rowCountParents, long interimRowCou
   // max # of rows = rows from left side
   result = Math.min(rowCountParents.get(joinCond.getLeft()), result);
   break;
+case JoinDesc.ANTI_JOIN:
+  long leftRowCount = rowCountParents.get(joinCond.getLeft());
+  if (leftRowCount < result) {
+// Ideally the inner join count should be less than the left row 
count. but if its not calculated
+// properly then we can assume whole of left table will be 
selected.
+result = leftRowCount;

Review comment:
   This case will come if the stats are not proper. So to be on safer side, 
i assume that all rows from the left side will be projected. That is the max 
value. If set it to 0, it should not trigger some re-write, assuming the join 
result is empty.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460002261



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java
##
@@ -142,7 +146,7 @@ private Double computeInnerJoinSelectivity(Join j, 
RelMetadataQuery mq, RexNode
 ndvEstimate = exponentialBackoff(peLst, colStatMap);
   }
 
-  if (j.isSemiJoin()) {
+  if (j.isSemiJoin() || (j instanceof HiveJoin && 
j.getJoinType().equals(JoinRelType.ANTI))) {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r460001236



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -1901,6 +1905,11 @@ public RelNode apply(RelOptCluster cluster, RelOptSchema 
relOptSchema, SchemaPlu
   calcitePreCboPlan = applyPreJoinOrderingTransforms(calciteGenPlan,
   mdProvider.getMetadataProvider(), executorProvider);
 
+  if (conf.getBoolVar(ConfVars.HIVE_CONVERT_ANTI_JOIN)) {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459998845



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class);
+  public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new 
HiveJoinWithFilterToAntiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveJoinWithFilterToAntiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Matched HiveAntiJoinRule");
+
+assert (filter != null);
+
+//We support conversion from left outer join only.
+if (join.getJoinType() != JoinRelType.LEFT) {
+  return;
+}
+
+List aboveFilters = 
RelOptUtil.conjunctions(filter.getCondition());
+boolean hasIsNull = false;
+
+// Get all filter condition and check if any of them is a "is null" kind.
+for (RexNode filterNode : aboveFilters) {
+  if (filterNode.getKind() == SqlKind.IS_NULL &&
+  isFilterFromRightSide(join, filterNode, join.getJoinType())) {
+hasIsNull = true;
+break;
+  }
+}
+
+// Is null should be on a key from right side of the join.
+if (!hasIsNull) {
+  return;
+}
+
+// Build anti join with same left, right child and condition as original 
left outer join.
+Join anti = join.copy(join.getTraitSet(), join.getCondition(),

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459998986



##
File path: 
ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out
##
@@ -0,0 +1,99 @@
+PREHOOK: query: explain cbo
+select
+   count(distinct cs_order_number) as `order count`
+  ,sum(cs_ext_ship_cost) as `total shipping cost`
+  ,sum(cs_net_profit) as `total net profit`
+from
+   catalog_sales cs1
+  ,date_dim
+  ,customer_address
+  ,call_center
+where
+d_date between '2001-4-01' and
+   (cast('2001-4-01' as date) + 60 days)
+and cs1.cs_ship_date_sk = d_date_sk
+and cs1.cs_ship_addr_sk = ca_address_sk
+and ca_state = 'NY'
+and cs1.cs_call_center_sk = cc_call_center_sk
+and cc_county in ('Ziebach County','Levy County','Huron County','Franklin 
Parish',
+  'Daviess County'
+)
+and exists (select *
+from catalog_sales cs2
+where cs1.cs_order_number = cs2.cs_order_number
+  and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
+and not exists(select *
+   from catalog_returns cr1
+   where cs1.cs_order_number = cr1.cr_order_number)
+order by count(distinct cs_order_number)
+limit 100
+PREHOOK: type: QUERY
+PREHOOK: Input: default@call_center
+PREHOOK: Input: default@catalog_returns
+PREHOOK: Input: default@catalog_sales
+PREHOOK: Input: default@customer_address
+PREHOOK: Input: default@date_dim
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: explain cbo
+select
+   count(distinct cs_order_number) as `order count`
+  ,sum(cs_ext_ship_cost) as `total shipping cost`
+  ,sum(cs_net_profit) as `total net profit`
+from
+   catalog_sales cs1
+  ,date_dim
+  ,customer_address
+  ,call_center
+where
+d_date between '2001-4-01' and
+   (cast('2001-4-01' as date) + 60 days)
+and cs1.cs_ship_date_sk = d_date_sk
+and cs1.cs_ship_addr_sk = ca_address_sk
+and ca_state = 'NY'
+and cs1.cs_call_center_sk = cc_call_center_sk
+and cc_county in ('Ziebach County','Levy County','Huron County','Franklin 
Parish',
+  'Daviess County'
+)
+and exists (select *
+from catalog_sales cs2
+where cs1.cs_order_number = cs2.cs_order_number
+  and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
+and not exists(select *
+   from catalog_returns cr1
+   where cs1.cs_order_number = cr1.cr_order_number)
+order by count(distinct cs_order_number)
+limit 100
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@call_center
+POSTHOOK: Input: default@catalog_returns
+POSTHOOK: Input: default@catalog_sales
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Input: default@date_dim
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+CBO PLAN:
+HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], 
agg#2=[sum($6)])
+  HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], 
cost=[not available])
+HiveSemiJoin(condition=[AND(<>($3, $13), =($4, $14))], joinType=[semi])

Review comment:
   done ..creating the HiveAntiJoin operator directly





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459990495



##
File path: 
ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out
##
@@ -0,0 +1,99 @@
+PREHOOK: query: explain cbo
+select
+   count(distinct cs_order_number) as `order count`
+  ,sum(cs_ext_ship_cost) as `total shipping cost`
+  ,sum(cs_net_profit) as `total net profit`
+from
+   catalog_sales cs1
+  ,date_dim
+  ,customer_address
+  ,call_center
+where
+d_date between '2001-4-01' and
+   (cast('2001-4-01' as date) + 60 days)
+and cs1.cs_ship_date_sk = d_date_sk
+and cs1.cs_ship_addr_sk = ca_address_sk
+and ca_state = 'NY'
+and cs1.cs_call_center_sk = cc_call_center_sk
+and cc_county in ('Ziebach County','Levy County','Huron County','Franklin 
Parish',
+  'Daviess County'
+)
+and exists (select *
+from catalog_sales cs2
+where cs1.cs_order_number = cs2.cs_order_number
+  and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
+and not exists(select *
+   from catalog_returns cr1
+   where cs1.cs_order_number = cr1.cr_order_number)
+order by count(distinct cs_order_number)
+limit 100
+PREHOOK: type: QUERY
+PREHOOK: Input: default@call_center
+PREHOOK: Input: default@catalog_returns
+PREHOOK: Input: default@catalog_sales
+PREHOOK: Input: default@customer_address
+PREHOOK: Input: default@date_dim
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: explain cbo
+select
+   count(distinct cs_order_number) as `order count`
+  ,sum(cs_ext_ship_cost) as `total shipping cost`
+  ,sum(cs_net_profit) as `total net profit`
+from
+   catalog_sales cs1
+  ,date_dim
+  ,customer_address
+  ,call_center
+where
+d_date between '2001-4-01' and
+   (cast('2001-4-01' as date) + 60 days)
+and cs1.cs_ship_date_sk = d_date_sk
+and cs1.cs_ship_addr_sk = ca_address_sk
+and ca_state = 'NY'
+and cs1.cs_call_center_sk = cc_call_center_sk
+and cc_county in ('Ziebach County','Levy County','Huron County','Franklin 
Parish',
+  'Daviess County'
+)
+and exists (select *
+from catalog_sales cs2
+where cs1.cs_order_number = cs2.cs_order_number
+  and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
+and not exists(select *
+   from catalog_returns cr1
+   where cs1.cs_order_number = cr1.cr_order_number)
+order by count(distinct cs_order_number)
+limit 100
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@call_center
+POSTHOOK: Input: default@catalog_returns
+POSTHOOK: Input: default@catalog_sales
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Input: default@date_dim
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+CBO PLAN:
+HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], 
agg#2=[sum($6)])
+  HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], 
cost=[not available])

Review comment:
   I think, it's not a problem. The filed index are for different input. So 
even though the number is same the condition is different. Even without anti 
join, the condition is same.
   
 HiveFilter(condition=[IS NULL($13)])
   HiveJoin(condition=[=($4, $14)], joinType=[left], algorithm=[none], 
cost=[not available])
 HiveSemiJoin(condition=[AND(<>($3, $13), =($4, $14))], joinType=[semi])





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459988214



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java
##
@@ -79,6 +80,11 @@ public Double getDistinctRowCount(HiveSemiJoin rel, 
RelMetadataQuery mq, Immutab
 return super.getDistinctRowCount(rel, mq, groupKey, predicate);
   }
 
+  public Double getDistinctRowCount(HiveAntiJoin rel, RelMetadataQuery mq, 
ImmutableBitSet groupKey,
+RexNode predicate) {
+return super.getDistinctRowCount(rel, mq, groupKey, predicate);

Review comment:
   calcite 21 does not support distinct calculation for Anti join. 
 if (join.isSemiJoin()) {
   return getSemiJoinDistinctRowCount(join, mq, groupKey, predicate);
 } else {
   Builder leftMask = ImmutableBitSet.builder();
   I think these rules will not get triggered as of now for Anti join as i am 
not converting the not-exists to anti join. As of now all these rules will be 
applied on left outer and then we convert the left outer to anti join.
   I 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459986329



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java
##
@@ -414,6 +416,13 @@ private RexNode rewriteInExists(RexSubQuery e, 
Set variablesSet,
   // null keys we do not need to generate count(*), count(c)
   if (e.getKind() == SqlKind.EXISTS) {
 logic = RelOptUtil.Logic.TRUE_FALSE;
+if (conf.getBoolVar(HiveConf.ConfVars.HIVE_CONVERT_ANTI_JOIN)) {
+  //TODO : As of now anti join is first converted to left outer join

Review comment:
   Now also the conversion is not done. The code is present but actual 
conversion is not done and logic is still TRUE_FALSE. For the code to be 
effective , the logic should be changed to FALSE. I have not done it yet, as it 
was causing some change in plan which i could not judge to be expected or not. 
Anyways i have created a JIRA to track this.
   https://issues.apache.org/jira/browse/HIVE-23928





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-24 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459984171



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class);
+  public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new 
HiveJoinWithFilterToAntiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveJoinWithFilterToAntiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Matched HiveAntiJoinRule");
+
+if (join.getCondition().isAlwaysTrue()) {
+  return;
+}
+
+//We support conversion from left outer join only.
+if (join.getJoinType() != JoinRelType.LEFT) {
+  return;
+}
+
+assert (filter != null);
+
+List aboveFilters = 
RelOptUtil.conjunctions(filter.getCondition());
+boolean hasIsNull = false;
+
+// Get all filter condition and check if any of them is a "is null" kind.
+for (RexNode filterNode : aboveFilters) {
+  if (filterNode.getKind() == SqlKind.IS_NULL &&
+  isFilterFromRightSide(join, filterNode, join.getJoinType())) {
+hasIsNull = true;
+break;
+  }
+}
+
+// Is null should be on a key from right side of the join.
+if (!hasIsNull) {
+  return;
+}
+
+// Build anti join with same left, right child and condition as original 
left outer join.
+Join anti = join.copy(join.getTraitSet(), join.getCondition(),
+join.getLeft(), join.getRight(), JoinRelType.ANTI, false);
+
+//TODO : Do we really need it
+call.getPlanner().onCopy(join, anti);
+
+RelNode newProject = getNewProjectNode(project, anti);
+if (newProject != null) {
+  call.getPlanner().onCopy(project, newProject);
+  call.transformTo(newProject);
+}
+  }
+
+  protected RelNode getNewProjectNode(Project oldProject, Join newJoin) {

Review comment:
   I didn't find any such utility method, so added this into 
HiveCalciteUtil and used.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-23 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459841087



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinProjectTransposeRule.java
##
@@ -133,6 +135,10 @@ private HiveJoinProjectTransposeRuleBase(
 
 public void onMatch(RelOptRuleCall call) {
   //TODO: this can be removed once CALCITE-3824 is released
+  Join joinRel = call.rel(0);
+  if (joinRel.getJoinType() == JoinRelType.ANTI) {

Review comment:
   This was causing some issue with having clause. 
https://issues.apache.org/jira/browse/HIVE-23921





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-23 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459840139



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
##
@@ -100,7 +100,8 @@ public void onMatch(RelOptRuleCall call) {
 // These boolean values represent corresponding left, right input which is 
potential FK
 boolean leftInputPotentialFK = topRefs.intersects(leftBits);
 boolean rightInputPotentialFK = topRefs.intersects(rightBits);
-if (leftInputPotentialFK && rightInputPotentialFK && (joinType == 
JoinRelType.INNER || joinType == JoinRelType.SEMI)) {
+if (leftInputPotentialFK && rightInputPotentialFK &&
+(joinType == JoinRelType.INNER || joinType == JoinRelType.SEMI || 
joinType == JoinRelType.ANTI)) {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23920





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-23 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459827002



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -74,7 +78,14 @@ public HiveJoinAddNotNullRule(Class clazz,
   @Override
   public void onMatch(RelOptRuleCall call) {
 Join join = call.rel(0);
-if (join.getJoinType() == JoinRelType.FULL || 
join.getCondition().isAlwaysTrue()) {
+
+// For anti join case add the not null on right side if the condition is
+// always true. This is done because during execution, anti join expect 
the right side to
+// be empty and if we dont put null check on right, for null only right 
side table and condition
+// always true, execution will produce 0 records.
+// eg  select * from left_tbl where (select 1 from all_null_right limit 1) 
is null
+if (join.getJoinType() == JoinRelType.FULL ||
+(join.getJoinType() != JoinRelType.ANTI && 
join.getCondition().isAlwaysTrue())) {

Review comment:
   Yes, the comment is not proper. It's like we will add a not null 
condition for anti join even if the condition is always true.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-23 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459826636



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAntiJoin.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRulesRegistry;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveAntiJoin extends Join implements HiveRelNode {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23919





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-23 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459825977



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAntiJoin.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRulesRegistry;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveAntiJoin extends Join implements HiveRelNode {
+
+  private final RexNode joinFilter;

Review comment:
   The joinjoinFilter holds the residual filter which is used during post 
processing. These are the join conditions that are not part of the join key. I 
think condition in Join hold the full condition. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-23 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459253392



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
##
@@ -747,7 +747,7 @@ public static RewritablePKFKJoinInfo 
isRewritablePKFKJoin(Join join,
 final RelNode nonFkInput = leftInputPotentialFK ? join.getRight() : 
join.getLeft();
 final RewritablePKFKJoinInfo nonRewritable = 
RewritablePKFKJoinInfo.of(false, null);
 
-if (joinType != JoinRelType.INNER && !join.isSemiJoin()) {
+if (joinType != JoinRelType.INNER && !join.isSemiJoin() && joinType != 
JoinRelType.ANTI) {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23906





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-22 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459220734



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinGenerateResultOperator.java
##
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSet;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashSetResult;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashTableResult;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+
+// TODO : This class is duplicate of semi join. Need to do a refactoring to 
merge it with semi join.

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23905





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-22 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459219966



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/TestMapJoinOperator.java
##
@@ -1792,6 +1794,8 @@ private void executeTest(MapJoinTestDescription testDesc, 
MapJoinTestData testDa
 case FULL_OUTER:
   executeTestFullOuter(testDesc, testData, title);
   break;
+case ANTI: //TODO

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23904





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-22 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459216878



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
##
@@ -339,6 +339,12 @@ String getFuncText(String funcText, final int srcPos) {
   vector.add(right, left);
   break;
 case JoinDesc.LEFT_OUTER_JOIN:
+case JoinDesc.ANTI_JOIN:
+//TODO : In case of anti join, bloom filter can be created on left 
side also ("IN (keylist right table)").
+// But the filter should be "not-in" ("NOT IN (keylist right table)") 
as we want to select the records from
+// left side which are not present in the right side. But it may cause 
wrong result as
+// bloom filter may have false positive and thus simply adding not is 
not correct,
+// special handling is required for "NOT IN".

Review comment:
   created a Jira ..https://issues.apache.org/jira/browse/HIVE-23903





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-22 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459215352



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -74,7 +78,14 @@ public HiveJoinAddNotNullRule(Class clazz,
   @Override
   public void onMatch(RelOptRuleCall call) {
 Join join = call.rel(0);
-if (join.getJoinType() == JoinRelType.FULL || 
join.getCondition().isAlwaysTrue()) {
+
+// For anti join case add the not null on right side if the condition is

Review comment:
   For the case when we have. join condition which gets evaluated, it will 
return false while comparing with a null on the right side. But for always true 
join condition, we will not do a match for right side assuming it's always 
true.  So for anti join, the left side records will not be emitted. To avoid 
this we put a null check on right side table and for all null entry, no records 
will be projected from right side and thus all records from left side will be 
emitted. So the comment is not very accurate. It's like even if the condition 
is always true, we add a null check on right side for anti join. I will update 
it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-22 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459208067



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)

Review comment:
   The pre batch processing done only for joins which emits the right table 
records. For semi join and anti join, it's not required.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-22 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459198718



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
##
@@ -523,11 +533,19 @@ private boolean createForwardJoinObject(boolean[] skip) 
throws HiveException {
 forward = true;
   }
 }
+return forward;
+  }
+
+  // returns whether a record was forwarded
+  private boolean createForwardJoinObject(boolean[] skip, boolean antiJoin) 
throws HiveException {
+boolean forward = fillFwdCache(skip);
 if (forward) {
   if (needsPostEvaluation) {
 forward = !JoinUtil.isFiltered(forwardCache, residualJoinFilters, 
residualJoinFiltersOIs);
   }
-  if (forward) {
+
+  // For anti join, check all right side and if nothing is matched then 
only forward.

Review comment:
   For anti join we don't emit the record here. It's done after all the 
records are checked and none of the record matches the condition. Here if 
forward is false we don't forward and as its a "&" we don't forward for anti 
join == true even if forward is true. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-20 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r457545081



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class);
+  public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new 
HiveJoinWithFilterToAntiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveJoinWithFilterToAntiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Matched HiveAntiJoinRule");

Review comment:
   sure ..will do that 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-17 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r456594608



##
File path: ql/src/test/results/clientpositive/llap/antijoin.q.out
##
@@ -0,0 +1,1007 @@
+PREHOOK: query: create table t1_n55 as select cast(key as int) key, value from 
src where key <= 10
+PREHOOK: type: CREATETABLE_AS_SELECT
+PREHOOK: Input: default@src
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1_n55
+POSTHOOK: query: create table t1_n55 as select cast(key as int) key, value 
from src where key <= 10
+POSTHOOK: type: CREATETABLE_AS_SELECT
+POSTHOOK: Input: default@src
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1_n55
+POSTHOOK: Lineage: t1_n55.key EXPRESSION [(src)src.FieldSchema(name:key, 
type:string, comment:default), ]
+POSTHOOK: Lineage: t1_n55.value SIMPLE [(src)src.FieldSchema(name:value, 
type:string, comment:default), ]
+PREHOOK: query: select * from t1_n55 sort by key
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1_n55
+ A masked pattern was here 
+POSTHOOK: query: select * from t1_n55 sort by key
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1_n55
+ A masked pattern was here 
+0  val_0

Review comment:
   These all new test cases are added from the failure test cases of  a dry 
run with anti join enabled  true. Manually i have verified that the resultant 
records are same and plan difference is as per expected behavior. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-07-17 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r456593908



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Whether Hive enables the optimization about converting common join 
into mapjoin based on the input file size. \n" +
 "If this parameter is on, and the sum of size for n-1 of the 
tables/partitions for a n-way join is smaller than the\n" +
 "specified size, the join is directly converted to a mapjoin (there is 
no conditional task)."),
-
+HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false,

Review comment:
   Yes, i had triggered a ptest run with this config enabled to true by 
default. There were some 26 failures. I had analyzed those and some fixes were 
done to make sure that the result is same for both and difference in plan is as 
expected. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org