[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826953



##
File path: 
ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query94_anti_join.q.out
##
@@ -0,0 +1,94 @@
+PREHOOK: query: explain cbo
+select  
+   count(distinct ws_order_number) as `order count`
+  ,sum(ws_ext_ship_cost) as `total shipping cost`
+  ,sum(ws_net_profit) as `total net profit`
+from
+   web_sales ws1
+  ,date_dim
+  ,customer_address
+  ,web_site
+where
+d_date between '1999-5-01' and 
+   (cast('1999-5-01' as date) + 60 days)
+and ws1.ws_ship_date_sk = d_date_sk
+and ws1.ws_ship_addr_sk = ca_address_sk
+and ca_state = 'TX'
+and ws1.ws_web_site_sk = web_site_sk
+and web_company_name = 'pri'
+and exists (select *
+from web_sales ws2
+where ws1.ws_order_number = ws2.ws_order_number
+  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
+and not exists(select *
+   from web_returns wr1
+   where ws1.ws_order_number = wr1.wr_order_number)
+order by count(distinct ws_order_number)
+limit 100
+PREHOOK: type: QUERY
+PREHOOK: Input: default@customer_address
+PREHOOK: Input: default@date_dim
+PREHOOK: Input: default@web_returns
+PREHOOK: Input: default@web_sales
+PREHOOK: Input: default@web_site
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: explain cbo
+select  
+   count(distinct ws_order_number) as `order count`
+  ,sum(ws_ext_ship_cost) as `total shipping cost`
+  ,sum(ws_net_profit) as `total net profit`
+from
+   web_sales ws1
+  ,date_dim
+  ,customer_address
+  ,web_site
+where
+d_date between '1999-5-01' and 
+   (cast('1999-5-01' as date) + 60 days)
+and ws1.ws_ship_date_sk = d_date_sk
+and ws1.ws_ship_addr_sk = ca_address_sk
+and ca_state = 'TX'
+and ws1.ws_web_site_sk = web_site_sk
+and web_company_name = 'pri'
+and exists (select *
+from web_sales ws2
+where ws1.ws_order_number = ws2.ws_order_number
+  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
+and not exists(select *
+   from web_returns wr1
+   where ws1.ws_order_number = wr1.wr_order_number)
+order by count(distinct ws_order_number)
+limit 100
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Input: default@date_dim
+POSTHOOK: Input: default@web_returns
+POSTHOOK: Input: default@web_sales
+POSTHOOK: Input: default@web_site
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+CBO PLAN:
+HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], agg#2=[sum($6)])
+  HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], cost=[not available])

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826675



##
File path: 
ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out
##
@@ -0,0 +1,99 @@
+PREHOOK: query: explain cbo

Review comment:
   done








[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466826103



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Whether Hive enables the optimization about converting common join 
into mapjoin based on the input file size. \n" +
 "If this parameter is on, and the sum of size for n-1 of the 
tables/partitions for a n-way join is smaller than the\n" +
 "specified size, the join is directly converted to a mapjoin (there is 
no conditional task)."),
-
+HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false,

Review comment:
   done
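   For reference, a minimal way to exercise the new flag from SQL might look like the following (illustrative only; the table names are hypothetical and the flag defaults to false):

       -- enable the left outer join + IS NULL -> anti join conversion
       SET hive.auto.convert.anti.join=true;
       -- a NOT EXISTS query of this shape is a candidate for the anti join rewrite
       SELECT o.id
       FROM t_orders o
       WHERE NOT EXISTS (SELECT 1 FROM t_returns r WHERE r.order_id = o.id);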








[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466825617



##
File path: 
ql/src/test/results/clientpositive/llap/subquery_notexists_having.q.out
##
@@ -31,7 +31,8 @@ STAGE PLANS:
 Tez
  A masked pattern was here 
   Edges:
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
+Reducer 2 <- Map 1 (SIMPLE_EDGE)

Review comment:
   Yes, the join is getting converted to an SMB join, so no reducer is required. In the anti join case it is not getting converted. That is because the left outer join plan adds an extra group by, which makes the RS nodes on the left and right sides equal, the precondition for converting to an SMB join.
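   A sketch of the query shape under discussion, assuming hypothetical tables t1 and t2:

       -- NOT EXISTS query that can be planned either way
       SELECT t1.id
       FROM t1
       WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id);
       -- left outer join rewrite: an extra GROUP BY is planned on t2, so the reduce
       -- sinks on both sides line up and the join can be converted to an SMB join;
       -- anti join rewrite: that GROUP BY is absent, so the SMB precondition fails.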








[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466822516



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -2129,6 +2133,16 @@ private RelNode applyPreJoinOrderingTransforms(RelNode 
basePlan, RelMetadataProv
 HiveRemoveSqCountCheck.INSTANCE);
   }
 
+  // 10. Convert left outer join + null filter on right side table column 
to anti join. Add this
+  // rule after all the optimization for which calcite support for anti 
join is missing.
+  // Needs to be done before ProjectRemoveRule as it expect a project over 
filter.
+  // This is done before join re-ordering as join re-ordering is 
converting the left outer

Review comment:
   As discussed, I have created a Jira: https://issues.apache.org/jira/browse/HIVE-24013








[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819572



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -92,7 +104,7 @@ public void onMatch(RelOptRuleCall call) {
 Set rightPushedPredicates = 
Sets.newHashSet(registry.getPushedPredicates(join, 1));
 
 boolean genPredOnLeft = join.getJoinType() == JoinRelType.RIGHT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
-boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin();
+boolean genPredOnRight = join.getJoinType() == JoinRelType.LEFT || join.getJoinType() == JoinRelType.INNER || join.isSemiJoin() || join.getJoinType() == JoinRelType.ANTI;

Review comment:
   Yes, that is taken care of:
   // For anti join, we should proceed to emit records if the right side is empty or not matching.
   if (type == JoinDesc.ANTI_JOIN && !producedRow) {
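   A self-contained sketch of that emit rule (purely illustrative; the class and constant below are stand-ins, not Hive's actual operator code):

       public class AntiJoinEmitSketch {
         static final int ANTI_JOIN = 1; // stand-in for JoinDesc.ANTI_JOIN

         // Emit the left-side row only when the right side produced no matching row,
         // which also covers the case where the right side is empty.
         static boolean shouldEmitLeftRow(int joinType, boolean producedRow) {
           return joinType == ANTI_JOIN && !producedRow;
         }

         public static void main(String[] args) {
           System.out.println(shouldEmitLeftRow(ANTI_JOIN, false)); // true  -> emit
           System.out.println(shouldEmitLeftRow(ANTI_JOIN, true));  // false -> suppress
         }
       }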








[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819149



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveAntiSemiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveAntiSemiJoinRule.class);
+  public static final HiveAntiSemiJoinRule INSTANCE = new 
HiveAntiSemiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveAntiSemiJoinRule() {
+    super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any()))),
+        "HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Start Matching HiveAntiJoinRule");
+
+//TODO : Need to support this scenario.
+if (join.getCondition().isAlwaysTrue()) {
+  return;
+}
+
+//We support conversion from left outer join only.
+if (join.getJoinType() != JoinRelType.LEFT) {
+  return;
+}
+
+assert (filter != null);
+
+    // If null filter is not present from right side then we can not convert to anti join.
+    List<RexNode> aboveFilters = RelOptUtil.conjunctions(filter.getCondition());
+    Stream<RexNode> nullFilters = aboveFilters.stream().filter(filterNode -> filterNode.getKind() == SqlKind.IS_NULL);
+    boolean hasNullFilter = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, nullFilters.collect(Collectors.toList()));
+if (!hasNullFilter) {
+  return;
+}
+
+// If any projection is there from right side, then we can not convert to 
anti join.
+boolean hasProjection = 
HiveCalciteUtil.hasAnyExpressionFromRightSide(join, project.getProjects());
+if (hasProjection) {
+  return;
+}
+
+LOG.debug("Matched HiveAntiJoinRule");
+
+// Build anti join with same left, right child and condition as original 
left outer join.
+Join anti = HiveAntiJoin.getAntiJoin(join.getLeft().getCluster(), 
join.getLeft().getTraitSet(),
+join.getLeft(), join.getRight(), join.getCondition());
+RelNode newProject = project.copy(project.getTraitSet(), anti, 
project.getProjects(), project.getRowType());
+call.transformTo(newProject);

Review comment:
   For a normal filter, it is being pushed down. Here we get filters which cannot be pushed down. I have modified the code to handle those filters and added these extra tests to verify.
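   As an illustration of a filter that cannot simply be pushed below the join (hypothetical tables t1/t2; the disjunction references both join inputs, so it must stay above the join as a residual filter):

       SELECT t1.id
       FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
       WHERE t2.id IS NULL                      -- drives the anti join conversion
         AND (t2.val IS NULL OR t1.val > 10);   -- cannot be pushed to either input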







[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466818492



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinMultiKeyOperator.java
##
@@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinBytesHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.apache.hadoop.hive.serde2.ByteStream.Output;
+import 
org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// Multi-Key hash table import.
+// Multi-Key specific imports.
+
+// TODO : Duplicate codes need to merge with semi join.
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on 
Multi-Key
+ * using hash set.
+ */
+public class VectorMapJoinAntiJoinMultiKeyOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+
+  
//
+
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinMultiKeyOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  
//
+
+  // (none)
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+  //---
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinBytesHashSet hashSet;
+
+  //---
+  // Multi-Key specific members.
+  //
+
+  // Object that can take a set of columns in row in a vectorized row batch 
and serialized it.
+  // Known to not have any nulls.
+  private transient VectorSerializeRow keyVectorSerializeWrite;
+
+  // The BinarySortable serialization of the current key.
+  private transient Output currentKeyOutput;
+
+  // The BinarySortable serialization of the saved key for a possible series 
of equal keys.
+  private transient Output saveKeyOutput;
+
+  //---
+  // Pass-thru constructors.
+  //
+
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinMultiKeyOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinMultiKeyOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  //---
+  // Process Multi-Key Anti Join on a vectorized row batch.
+  //
+
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+/*
+ * Initialize Multi-Key members for this specialized class.
+ */
+
+keyVectorSerializeWrite = new 
VectorSerializeRow(BinarySortableSerializeWrite.with(
+this.getConf().getKeyTblDesc().getProperties(), 
bigTableKeyColumnMap.length));
+keyVectorSerializeWrite.init(bigTableKeyTypeInfos, bigTableKeyColumnMap);
+
+currentKeyOutput = new 

[GitHub] [hive] maheshk114 commented on a change in pull request #1147: HIVE-23716: Support Anti Join in Hive

2020-08-06 Thread GitBox


maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466818194



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : bigTableFilterExpressions) {
+ve.evaluate(batch);
+  }
+
+  final int inputLogicalSize = batch.size;
+  if (inputLogicalSize == 0) {
+return;
+  }
+
+  // Perform any key expressions.  Results will go into scratch columns.
+  if (bigTableKeyExpressions != null) {
+for (VectorExpression ve : bigTableKeyExpressions) {
+  ve.evaluate(batch);
+}
+  }
+
+  // The one join column for this specialized class.
+  

[GitHub] [hive] GuoPhilipse removed a comment on pull request #1363: HIVE-23996: Remove unused line to keep code clean

2020-08-06 Thread GitBox


GuoPhilipse removed a comment on pull request #1363:
URL: https://github.com/apache/hive/pull/1363#issuecomment-669876714


   cc @pvary






[GitHub] [hive] jcamachor opened a new pull request #1374: HIVE-24012: Support for rewriting with materialized views containing …

2020-08-06 Thread GitBox


jcamachor opened a new pull request #1374:
URL: https://github.com/apache/hive/pull/1374


   …grouping sets
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [hive] sam-an-cloudera opened a new pull request #1373: HIVE-24004: Improve performance for filter hook for superuser path

2020-08-06 Thread GitBox


sam-an-cloudera opened a new pull request #1373:
URL: https://github.com/apache/hive/pull/1373


   This is an improvement on the filter hook. For a superuser, if we can skip authorization, we don't need to create an authorizer, which can save some CPU cycles.






[GitHub] [hive] sam-an-cloudera commented on pull request #1372: HIVE-24004: Improve performance for filter hook for superuser path

2020-08-06 Thread GitBox


sam-an-cloudera commented on pull request #1372:
URL: https://github.com/apache/hive/pull/1372#issuecomment-670265392


   problem with jenkins. Do another one. 






[GitHub] [hive] sam-an-cloudera closed pull request #1372: HIVE-24004: Improve performance for filter hook for superuser path

2020-08-06 Thread GitBox


sam-an-cloudera closed pull request #1372:
URL: https://github.com/apache/hive/pull/1372


   






[GitHub] [hive] sam-an-cloudera opened a new pull request #1372: HIVE-24002: Improve performance for filter hook for superuser path

2020-08-06 Thread GitBox


sam-an-cloudera opened a new pull request #1372:
URL: https://github.com/apache/hive/pull/1372


   This is an improvement on the filter hook. For a superuser, if we can skip authorization, we don't need to create an authorizer, which can save some CPU cycles.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ closed pull request #1369: HIVE-24000: Put exclusive MERGE INSERT under the feature flag

2020-08-06 Thread GitBox


deniskuzZ closed pull request #1369:
URL: https://github.com/apache/hive/pull/1369


   






[GitHub] [hive] deniskuzZ opened a new pull request #1371: HIVE-24000: Put exclusive MERGE INSERT under the feature flag

2020-08-06 Thread GitBox


deniskuzZ opened a new pull request #1371:
URL: https://github.com/apache/hive/pull/1371


   (cherry picked from commit 0e4b02af485cb1972ebc4f251d853c710e70164f)
   
   test out fix
   
   
   
   ### What changes were proposed in this pull request?
   
   Pushed exclusive MERGE INSERT under the feature flag
   
   ### Why are the changes needed?
   
   Backward compatibility
   
   ### Does this PR introduce _any_ user-facing change?
   
   A new feature flag property was introduced: 'hive.txn.xlock.mergeinsert'
   
   ### How was this patch tested?
   
   TestDbTxnManager2






[GitHub] [hive] vineetgarg02 merged pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 merged pull request #1315:
URL: https://github.com/apache/hive/pull/1315


   






[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs

2020-08-06 Thread GitBox


kishendas commented on a change in pull request #1355:
URL: https://github.com/apache/hive/pull/1355#discussion_r466627274



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestTxnManager.java
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.lockmgr;
+
+import org.apache.hadoop.hive.common.FileUtils;
+import org.apache.hadoop.hive.common.ValidTxnWriteIdList;
+import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
+import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse;
+import org.apache.hadoop.hive.metastore.api.TxnToWriteId;
+import org.apache.hadoop.hive.metastore.api.TxnType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.DriverState;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.hooks.WriteEntity;
+import org.apache.hadoop.hive.ql.metadata.DummyPartition;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Partition;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.util.ReflectionUtils;
+
+import java.util.*;
+
+/**
+ * An implementation of {@link HiveTxnManager} that does not support
+ * transactions.
+ * This class is only used in test.
+ */
+class TestTxnManager extends DummyTxnManager {
+  final static Character SEMICOLON = ':';

Review comment:
   :-) It's a COLON. 

##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -5292,6 +5303,16 @@ public synchronized SynchronizedMetaStoreClient 
getSynchronizedMSC() throws Meta
 return syncMetaStoreClient;
   }
 
+/**
+   * @return the metastore client for the current thread
+   * @throws MetaException
+   */
+  @LimitedPrivate(value = {"Hive"})

Review comment:
   Sure








[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs

2020-08-06 Thread GitBox


kishendas commented on a change in pull request #1355:
URL: https://github.com/apache/hive/pull/1355#discussion_r466626987



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClient.java
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.metadata;

Review comment:
   Makes sense. 








[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs

2020-08-06 Thread GitBox


kishendas commented on a change in pull request #1355:
URL: https://github.com/apache/hive/pull/1355#discussion_r466626887



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -474,6 +476,12 @@ public static Hive get() throws HiveException {
 return get(true);
   }
 
+  public static Hive get(IMetaStoreClient msc) throws HiveException, 
MetaException {

Review comment:
   Done.








[GitHub] [hive] kishendas commented on a change in pull request #1355: send tableId to get_partition APIs

2020-08-06 Thread GitBox


kishendas commented on a change in pull request #1355:
URL: https://github.com/apache/hive/pull/1355#discussion_r466625978



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClientApiArgumentsChecker.java
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.metadata;
+
+import com.google.common.collect.Lists;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.TableType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrGreaterThan;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.thrift.TException;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * TestHiveMetaStoreClientApiArgumentsChecker
+ *
+ * This class works with {@link TestHiveMetaStoreClient} in order to verify 
the arguments that
+ * are sent from HS2 to HMS APIs.
+ *
+ */
+public class TestHiveMetaStoreClientApiArgumentsChecker {
+
+  private Hive hive;
+  private IMetaStoreClient msc;
+  private FileSystem fs;
+  final static String DB_NAME = "db";
+  final static String TABLE_NAME = "table";
+  private IMetaStoreClient client;
+  private Table t;
+
+  protected static final String USER_NAME = "user0";
+
+  @Before
+  public void setUp() throws Exception {
+
+client = new TestHiveMetaStoreClient(new HiveConf(Hive.class));
+hive = Hive.get(client);
+
hive.getConf().set(MetastoreConf.ConfVars.FS_HANDLER_THREADS_COUNT.getVarname(),
 "15");
+
hive.getConf().set(MetastoreConf.ConfVars.MSCK_PATH_VALIDATION.getVarname(), 
"throw");
+msc = new HiveMetaStoreClient(hive.getConf());
+
+hive.getConf().setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
+
"org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
+HiveConf.setBoolVar(hive.getConf(), 
HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+hive.getConf().set(ValidTxnList.VALID_TXNS_KEY, "1:");
+hive.getConf().set(ValidWriteIdList.VALID_WRITEIDS_KEY, TABLE_NAME + 
":1:");
+hive.getConf().setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER, 
"org.apache.hadoop.hive.ql.lockmgr.TestTxnManager");
+SessionState.start(hive.getConf());
+SessionState.get().initTxnMgr(hive.getConf());
+Context ctx = new Context(hive.getConf());
+SessionState.get().getTxnMgr().openTxn(ctx, USER_NAME);
+
+t = new Table();
+org.apache.hadoop.hive.metastore.api.Table tTable = new 
org.apache.hadoop.hive.metastore.api.Table();
+tTable.setId(Long.MAX_VALUE);
+t.setTTable(tTable);
+Map parameters = new HashMap<>();
+parameters.put(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true");

Review comment:
   HS2 methods have explicit checks to know whether a given table is 
transactional or not. So, validWriteIdList and tableId are only set for 
transactional tables. 
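   A minimal sketch of the kind of check being described, keyed off the "transactional" table property used in this test's setup (illustrative only, not the actual HS2 code path):

       import java.util.Map;

       public class TransactionalCheckSketch {
         // Only for transactional tables would validWriteIdList/tableId be attached to the request.
         static boolean isTransactional(Map<String, String> tableParameters) {
           return "true".equalsIgnoreCase(tableParameters.get("transactional"));
         }
       }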





[GitHub] [hive] vihangk1 commented on a change in pull request #1355: send tableId to get_partition APIs

2020-08-06 Thread GitBox


vihangk1 commented on a change in pull request #1355:
URL: https://github.com/apache/hive/pull/1355#discussion_r466598803



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -474,6 +476,12 @@ public static Hive get() throws HiveException {
 return get(true);
   }
 
+  public static Hive get(IMetaStoreClient msc) throws HiveException, 
MetaException {

Review comment:
   If this is only used for testing purposes would be good to annotate this 
method with @VisibleForTesting

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClient.java
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.metadata;

Review comment:
   A more natural place for this class would be in standalone-metastore 
module. I don't see anything in this class it to be dependent on hive-exec.

##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestTxnManager.java
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.lockmgr;
+
+import org.apache.hadoop.hive.common.FileUtils;
+import org.apache.hadoop.hive.common.ValidTxnWriteIdList;
+import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
+import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse;
+import org.apache.hadoop.hive.metastore.api.TxnToWriteId;
+import org.apache.hadoop.hive.metastore.api.TxnType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.DriverState;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.hooks.WriteEntity;
+import org.apache.hadoop.hive.ql.metadata.DummyPartition;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Partition;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.util.ReflectionUtils;
+
+import java.util.*;
+
+/**
+ * An implementation of {@link HiveTxnManager} that does not support
+ * transactions.
+ * This class is only used in test.
+ */
+class TestTxnManager extends DummyTxnManager {
+  final static Character SEMICOLON = ':';

Review comment:
   nit, the name of the variable and its value seems off. Either change the 
value to ';' or rename the variable to colon.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -5292,6 +5303,16 @@ public synchronized SynchronizedMetaStoreClient 
getSynchronizedMSC() throws Meta
 return syncMetaStoreClient;
   }
 
+/**
+   * @return the metastore client for the current thread
+   * @throws MetaException
+   */
+  @LimitedPrivate(value = {"Hive"})

Review comment:
   I think these annotations are more useful for public APIs. Since 
Hive.java is not a public API you can just use @VisibleForTesting annotation.

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreClientApiArgumentsChecker.java
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation 

[GitHub] [hive] vihangk1 commented on a change in pull request #1330: HIVE-23890: Create HMS endpoint for querying file lists using FlatBuf…

2020-08-06 Thread GitBox


vihangk1 commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r466575514



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {

Review comment:
   The naming of this struct could be more generic (maybe a better name would be GetFileMetadataRequest). Also, did you consider reusing or extending the existing getFileMetadata HMS API?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
   }
 }
 
+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+  + ", partitions: " + partitions.toString());
+
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {
+Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+Path path = new Path(p.getSd().getLocation());
+
+FileSystem fs = path.getFileSystem(conf);
+RemoteIterator itr = fs.listFiles(path, true);
+while (itr.hasNext()) {
+  FileStatus fStatus = itr.next();
+  Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));

Review comment:
   Does this assume that the request is always for a ORC table?

##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {

Review comment:
   I think it will be useful to have a separate struct defined for the FileMetadata which also includes a type field. I can see this being useful for various engines, since the FileMetadata format could differ per engine: engine1 may require FileStatus and BlockInformation while engine2 is only interested in FileStatus and FileModificationTime, and there could be another file-metadata type for the ACID state which includes ACID-specific information such as the fileformat you used below.
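   One possible shape for such a struct (purely illustrative; the field names and enum are hypothetical and not part of the PR):

       enum FileMetadataType {
         FILE_STATUS = 1,
         BLOCK_INFO = 2,
         ACID_STATE = 3
       }

       struct FileMetadata {
         1: required FileMetadataType type,
         2: optional binary payload   // engine-specific serialized metadata
       }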

##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2454,10 +2467,14 @@ PartitionsResponse 
get_partitions_req(1:PartitionsRequest req)
   // partition keys in new_part should be the same as those in old partition.
   void rename_partition(1:string db_name, 2:string tbl_name, 3:list 
part_vals, 4:Partition new_part)
throws (1:InvalidOperationException o1, 2:MetaException 
o2)
-  
+
   RenamePartitionResponse rename_partition_req(1:RenamePartitionRequest req)
throws (1:InvalidOperationException o1, 2:MetaException 
o2)
 
+  // Returns a file list using FlatBuffers as serialization
+  GetFileListResponse get_file_list(1:GetFileListRequest req)
+throws(1:MetaException o1)

Review comment:
   Seems like if the partition doesn't exist you might need to throw a 
NoSuchObjectFoundException as well.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
   }
 }
 
+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+  + ", partitions: " + partitions.toString());
+
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {

[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


aasha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466579718



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -196,12 +203,14 @@ private String checkHiveEntityGuid(AtlasRequestBuilder 
atlasRequestBuilder, Stri
 AtlasObjectId objectId = atlasRequestBuilder.getItemToExport(clusterName, 
srcDb);
 Set> entries = 
objectId.getUniqueAttributes().entrySet();
 if (entries == null || entries.isEmpty()) {
-  throw new SemanticException("Could find entries in objectId for:" + 
clusterName);
+  throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " 
+
+"entries in objectId for:" + clusterName, "atlas"));

Review comment:
   ok ok. yes will do that








[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


pkumarsinha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466578320



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -86,107 +87,59 @@ private boolean checkIfPathExist(Path sourcePath, 
UserGroupInformation proxyUser
 return proxyUser.doAs((PrivilegedExceptionAction) () -> 
sourcePath.getFileSystem(conf).exists(sourcePath));
   }
 
-  private int handleException(Exception e, Path sourcePath, Path targetPath,
-  int currentRetry, UserGroupInformation 
proxyUser) {
-try {
-  LOG.info("Checking if source path " + sourcePath + " is missing for 
exception ", e);
-  if (!checkIfPathExist(sourcePath, proxyUser)) {
-LOG.info("Source path is missing. Ignoring exception.");
-return 0;
-  }
-} catch (Exception ex) {
-  LOG.warn("Source path missing check failed. ", ex);
-}
-// retry logic only for i/o exception
-if (!(e instanceof IOException)) {
-  LOG.error("Unable to copy {} to {}", sourcePath, targetPath, e);
-  setException(e);
-  return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
-}
-
-if (currentRetry <= MAX_COPY_RETRY) {
-  LOG.warn("Unable to copy {} to {}", sourcePath, targetPath, e);
-} else {
-  LOG.error("Unable to copy {} to {} even after retrying for {} time", 
sourcePath, targetPath, currentRetry, e);
-  setException(e);
-  return ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getErrorCode();
-}
-int sleepTime = FileUtils.getSleepTime(currentRetry);
-LOG.info("Sleep for " + sleepTime + " milliseconds before retry no " + 
(currentRetry));
-try {
-  Thread.sleep(sleepTime);
-} catch (InterruptedException timerEx) {
-  LOG.info("Sleep interrupted", timerEx.getMessage());
-}
-try {
-  if (proxyUser == null) {
-proxyUser = Utils.getUGI();
-  }
-  FileSystem.closeAllForUGI(proxyUser);
-} catch (Exception ex) {
-  LOG.warn("Unable to closeAllForUGI for user " + proxyUser, ex);
-}
-return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
-  }
-
   @Override
   public int execute() {
 String distCpDoAsUser = 
conf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER);
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(IOException.class).build();
+try {
+  return retryable.executeCallable(() -> {
+UserGroupInformation proxyUser = null;
+Path sourcePath = work.getFullyQualifiedSourcePath();
+Path targetPath = work.getFullyQualifiedTargetPath();
+try {
+  if 
(conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) {
+sourcePath = 
reservedRawPath(work.getFullyQualifiedSourcePath().toUri());
+targetPath = 
reservedRawPath(work.getFullyQualifiedTargetPath().toUri());
+  }
+  UserGroupInformation ugi = Utils.getUGI();
+  String currentUser = ugi.getShortUserName();
+  if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) {
+proxyUser = UserGroupInformation.createProxyUser(
+  distCpDoAsUser, UserGroupInformation.getLoginUser());
+  }
 
-Path sourcePath = work.getFullyQualifiedSourcePath();
-Path targetPath = work.getFullyQualifiedTargetPath();
-if (conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) {
-  sourcePath = reservedRawPath(work.getFullyQualifiedSourcePath().toUri());
-  targetPath = reservedRawPath(work.getFullyQualifiedTargetPath().toUri());
-}
-int currentRetry = 0;
-int error = 0;
-UserGroupInformation proxyUser = null;
-while (currentRetry <= MAX_COPY_RETRY) {
-  try {
-UserGroupInformation ugi = Utils.getUGI();
-String currentUser = ugi.getShortUserName();
-if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) {
-  proxyUser = UserGroupInformation.createProxyUser(
-  distCpDoAsUser, UserGroupInformation.getLoginUser());
-}
-
-setTargetPathOwner(targetPath, sourcePath, proxyUser);
-
-// do we create a new conf and only here provide this additional 
option so that we get away from
-// differences of data in two location for the same directories ?
-// basically add distcp.options.delete to hiveconf new object ?
-FileUtils.distCp(
-sourcePath.getFileSystem(conf), // source file system
-Collections.singletonList(sourcePath),  // list of source paths
-targetPath,
-false,
-proxyUser,
-conf,
-ShimLoader.getHadoopShims());
-return 0;
-  } catch (Exception e) {
-currentRetry++;
-error = handleException(e, sourcePath, targetPath, currentRetry, 
proxyUser);
-if (error == 0) {
-  

[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


pkumarsinha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466576634



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -196,12 +203,14 @@ private String checkHiveEntityGuid(AtlasRequestBuilder 
atlasRequestBuilder, Stri
 AtlasObjectId objectId = atlasRequestBuilder.getItemToExport(clusterName, 
srcDb);
 Set<Map.Entry<String, Object>> entries = objectId.getUniqueAttributes().entrySet();
 if (entries == null || entries.isEmpty()) {
-  throw new SemanticException("Could find entries in objectId for:" + 
clusterName);
+  throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " 
+
+"entries in objectId for:" + clusterName, "atlas"));

Review comment:
   I was referring to the "atlas" part here and in other places like "ranger" and "hive". Should we have one constant defined per service and use that instead? Something like:
   final String ReplUtils.ATLAS_SVC = "atlas";
   ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " +
   "entries in objectId for:" + clusterName, ReplUtils.ATLAS_SVC);





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] jcamachor commented on a change in pull request #1357: HIVE-23963: UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule

2020-08-06 Thread GitBox


jcamachor commented on a change in pull request #1357:
URL: https://github.com/apache/hive/pull/1357#discussion_r466575229



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java
##
@@ -81,8 +84,16 @@ public RelDistribution apply(TargetMapping mapping) {
   return this;
 }
 List newKeys = new ArrayList<>(keys.size());
+
+// Instead of using a HashMap for lookup 
newKeys.add(mapping.getTargetOpt(key)); should be called but not all the
+// mapping supports that. See HIVE-23963. Replace this when this is fixed 
in calcite.

Review comment:
   @kasakrisz , please create the Calcite JIRA so it is easier to track. 
Thanks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


aasha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466574105



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws 
SemanticException, MalformedU
 
   private long lastStoredTimeStamp() throws SemanticException {
 Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), 
EximUtil.METADATA_NAME);
-BufferedReader br = null;
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(IOException.class)
+  .withFailOnException(FileNotFoundException.class).build();
 try {
-  FileSystem fs = prevMetadataPath.getFileSystem(conf);
-  br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), 
Charset.defaultCharset()));
-  String line = br.readLine();
-  if (line == null) {
-throw new SemanticException("Could not read lastStoredTimeStamp from 
atlas metadata file");
-  }
-  String[] lineContents = line.split("\t", 5);
-  return Long.parseLong(lineContents[1]);
-} catch (Exception ex) {
-  throw new SemanticException(ex);
-} finally {
-  if (br != null) {
+  return retryable.executeCallable(() -> {
+BufferedReader br = null;
 try {
-  br.close();
-} catch (IOException e) {
-  throw new SemanticException(e);
+  FileSystem fs = prevMetadataPath.getFileSystem(conf);
+  br = new BufferedReader(new 
InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset()));
+  String line = br.readLine();
+  if (line == null) {
+throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE

Review comment:
   Named it as REPL_INVALID_INTERNAL_CONFIG_FOR_SERVICE





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


pkumarsinha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466574279



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws 
SemanticException, MalformedU
 
   private long lastStoredTimeStamp() throws SemanticException {
 Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), 
EximUtil.METADATA_NAME);
-BufferedReader br = null;
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(IOException.class)
+  .withFailOnException(FileNotFoundException.class).build();
 try {
-  FileSystem fs = prevMetadataPath.getFileSystem(conf);
-  br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), 
Charset.defaultCharset()));
-  String line = br.readLine();
-  if (line == null) {
-throw new SemanticException("Could not read lastStoredTimeStamp from 
atlas metadata file");
-  }
-  String[] lineContents = line.split("\t", 5);
-  return Long.parseLong(lineContents[1]);
-} catch (Exception ex) {
-  throw new SemanticException(ex);
-} finally {
-  if (br != null) {
+  return retryable.executeCallable(() -> {
+BufferedReader br = null;
 try {
-  br.close();
-} catch (IOException e) {
-  throw new SemanticException(e);
+  FileSystem fs = prevMetadataPath.getFileSystem(conf);
+  br = new BufferedReader(new 
InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset()));
+  String line = br.readLine();
+  if (line == null) {
+throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE
+  .format("Could not read lastStoredTimeStamp from atlas metadata 
file", "atlas"));
+  }
+  String[] lineContents = line.split("\t", 5);
+  return Long.parseLong(lineContents[1]);
+} finally {
+  if (br != null) {
+try {
+  br.close();
+} catch (IOException e) {
+  //Do nothing
+}
+  }
 }
-  }
+  });
+} catch (Exception e) {
+  throw new 
SemanticException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);

Review comment:
   Yes, here we are catching it as that. I was referring to line 144:
   if (line == null) {
     throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE
       .format("Could not read lastStoredTimeStamp from atlas metadata file", "atlas"));
   }
   where we already throw a SemanticException, which we then catch here. Can't we just rethrow the same exception in that case?
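
   For what it's worth, a hedged sketch of that alternative (it assumes, based on the builder calls used elsewhere in this PR, that withFailOnException makes Retryable rethrow that exception type immediately instead of retrying and wrapping it; readTimestamp() is a hypothetical helper for the file read above):

   Retryable retryable = Retryable.builder()
       .withHiveConf(conf)
       .withRetryOnException(IOException.class)
       .withFailOnException(SemanticException.class).build();
   try {
     return retryable.executeCallable(() -> readTimestamp());
   } catch (SemanticException e) {
     throw e;  // propagate the original, already-classified error as-is
   } catch (Exception e) {
     throw new SemanticException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   }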





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466570882



##
File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out
##
@@ -0,0 +1,1575 @@
+PREHOOK: query: explain extended prepare pcount from select count(*) from src 
where key > ?
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src
+ A masked pattern was here 
+POSTHOOK: query: explain extended prepare pcount from select count(*) from src 
where key > ?
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src
+ A masked pattern was here 
+OPTIMIZED SQL: SELECT COUNT(*) AS `$f0`
+FROM `default`.`src`
+WHERE `key` > CAST(? AS STRING)
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
+ A masked pattern was here 
+  Edges:
+Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+ A masked pattern was here 
+  Vertices:
+Map 1 
+Map Operator Tree:
+TableScan
+  alias: src
+  filterExpr: (key > CAST( Dynamic Parameter  index: 1 AS 
STRING)) (type: boolean)
+  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
+  GatherStats: false
+  Filter Operator
+isSamplingPred: false
+predicate: (key > CAST( Dynamic Parameter  index: 1 AS 
STRING)) (type: boolean)
+Statistics: Num rows: 166 Data size: 14442 Basic stats: 
COMPLETE Column stats: COMPLETE
+Select Operator
+  Statistics: Num rows: 166 Data size: 14442 Basic stats: 
COMPLETE Column stats: COMPLETE
+  Group By Operator
+aggregations: count()
+minReductionHashAggr: 0.99
+mode: hash
+outputColumnNames: _col0
+Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
+Reduce Output Operator
+  bucketingVersion: 2
+  null sort order: 
+  numBuckets: -1
+  sort order: 
+  Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
+  tag: -1
+  value expressions: _col0 (type: bigint)
+  auto parallelism: false
+Execution mode: llap
+LLAP IO: no inputs
+Path -> Alias:
+ A masked pattern was here 
+Path -> Partition:
+ A masked pattern was here 
+Partition
+  base file name: src
+  input format: org.apache.hadoop.mapred.TextInputFormat
+  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+  properties:
+bucket_count -1
+bucketing_version 2
+column.name.delimiter ,
+columns key,value
+columns.types string:string
+ A masked pattern was here 
+name default.src
+serialization.format 1
+serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+input format: org.apache.hadoop.mapred.TextInputFormat
+output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+properties:
+  bucketing_version 2
+  column.name.delimiter ,
+  columns key,value
+  columns.comments 'default','default'
+  columns.types string:string
+ A masked pattern was here 
+  name default.src
+  serialization.format 1
+  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+name: default.src
+  name: default.src
+Truncated Path -> Alias:
+  /src [src]
+Reducer 2 
+Execution mode: llap
+Needs Tagging: false
+Reduce Operator Tree:
+  Group By Operator
+aggregations: count(VALUE._col0)
+mode: mergepartial
+outputColumnNames: _col0
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
+File Output Operator
+  bucketingVersion: 2
+  compressed: false
+  GlobalTableId: 0

[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466568964



##
File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out
##
@@ -0,0 +1,1575 @@
+PREHOOK: query: explain extended prepare pcount from select count(*) from src 
where key > ?
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src
+ A masked pattern was here 
+POSTHOOK: query: explain extended prepare pcount from select count(*) from src 
where key > ?
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src
+ A masked pattern was here 
+OPTIMIZED SQL: SELECT COUNT(*) AS `$f0`
+FROM `default`.`src`
+WHERE `key` > CAST(? AS STRING)
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
+ A masked pattern was here 
+  Edges:
+Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+ A masked pattern was here 
+  Vertices:
+Map 1 
+Map Operator Tree:
+TableScan
+  alias: src
+  filterExpr: (key > CAST( Dynamic Parameter  index: 1 AS 
STRING)) (type: boolean)
+  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
+  GatherStats: false
+  Filter Operator
+isSamplingPred: false
+predicate: (key > CAST( Dynamic Parameter  index: 1 AS 
STRING)) (type: boolean)
+Statistics: Num rows: 166 Data size: 14442 Basic stats: 
COMPLETE Column stats: COMPLETE
+Select Operator
+  Statistics: Num rows: 166 Data size: 14442 Basic stats: 
COMPLETE Column stats: COMPLETE
+  Group By Operator
+aggregations: count()
+minReductionHashAggr: 0.99
+mode: hash
+outputColumnNames: _col0
+Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
+Reduce Output Operator
+  bucketingVersion: 2
+  null sort order: 
+  numBuckets: -1
+  sort order: 
+  Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
+  tag: -1
+  value expressions: _col0 (type: bigint)
+  auto parallelism: false
+Execution mode: llap
+LLAP IO: no inputs
+Path -> Alias:
+ A masked pattern was here 
+Path -> Partition:
+ A masked pattern was here 
+Partition
+  base file name: src
+  input format: org.apache.hadoop.mapred.TextInputFormat
+  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+  properties:
+bucket_count -1
+bucketing_version 2
+column.name.delimiter ,
+columns key,value
+columns.types string:string
+ A masked pattern was here 
+name default.src
+serialization.format 1
+serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+input format: org.apache.hadoop.mapred.TextInputFormat
+output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+properties:
+  bucketing_version 2
+  column.name.delimiter ,
+  columns key,value
+  columns.comments 'default','default'
+  columns.types string:string
+ A masked pattern was here 
+  name default.src
+  serialization.format 1
+  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+name: default.src
+  name: default.src
+Truncated Path -> Alias:
+  /src [src]
+Reducer 2 
+Execution mode: llap
+Needs Tagging: false
+Reduce Operator Tree:
+  Group By Operator
+aggregations: count(VALUE._col0)
+mode: mergepartial
+outputColumnNames: _col0
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
+File Output Operator
+  bucketingVersion: 2
+  compressed: false
+  GlobalTableId: 0

[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


aasha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466563797



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -196,12 +203,14 @@ private String checkHiveEntityGuid(AtlasRequestBuilder 
atlasRequestBuilder, Stri
 AtlasObjectId objectId = atlasRequestBuilder.getItemToExport(clusterName, 
srcDb);
 Set<Map.Entry<String, Object>> entries = objectId.getUniqueAttributes().entrySet();
 if (entries == null || entries.isEmpty()) {
-  throw new SemanticException("Could find entries in objectId for:" + 
clusterName);
+  throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE.format("Could find " 
+
+"entries in objectId for:" + clusterName, "atlas"));

Review comment:
   Format will substitute the config name and the service name, which lets us reuse the same error code across call sites.
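
   A minimal, self-contained sketch of that mechanism (this mirrors, but is not, Hive's ErrorMsg enum; the error code and template text below are assumptions made for illustration):

   import java.text.MessageFormat;

   enum ReplErrorSketch {
     INVALID_CONFIG_FOR_SERVICE(20018, "Invalid configuration error : {0} for {1} service.");

     private final int code;
     private final String template;

     ReplErrorSketch(int code, String template) {
       this.code = code;
       this.template = template;
     }

     String format(Object... args) {
       // {0} takes the reason, {1} takes the service name.
       return "Error " + code + ": " + MessageFormat.format(template, args);
     }
   }

   class ReplErrorSketchDemo {
     public static void main(String[] args) {
       // Two different call sites reuse the single error code with different arguments.
       System.out.println(ReplErrorSketch.INVALID_CONFIG_FOR_SERVICE.format(
           "Could not read lastStoredTimeStamp from atlas metadata file", "atlas"));
       System.out.println(ReplErrorSketch.INVALID_CONFIG_FOR_SERVICE.format(
           "Ranger endpoint is not configured", "ranger"));
     }
   }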





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466563430



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/drop/ExecuteStatementAnalyzer.java
##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.drop;
+
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.DDLType;
+import org.apache.hadoop.hive.ql.exec.ExplainTask;
+import org.apache.hadoop.hive.ql.exec.FetchTask;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.SerializationUtilities;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.tez.TezTask;
+import org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.type.ExprNodeDescExprFactory;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.ExprDynamicParamDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hadoop.hive.serde2.typeinfo.CharTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.VarcharTypeInfo;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Analyzer for Execute statement.
+ * This analyzer
+ *  retreives cached {@link BaseSemanticAnalyzer},
+ *  makes copy of all tasks by serializing/deserializing it,
+ *  bind dynamic parameters inside cached {@link BaseSemanticAnalyzer} using 
values provided
+ */
+@DDLType(types = HiveParser.TOK_EXECUTE)
+public class ExecuteStatementAnalyzer extends BaseSemanticAnalyzer {
+
+  public ExecuteStatementAnalyzer(QueryState queryState) throws 
SemanticException {
+super(queryState);
+  }
+
+  /**
+   * This class encapsulate all {@link Task} required to be copied.
+   * This is required because {@link FetchTask} list of {@link Task} may hold 
reference to same
+   * objects (e.g. list of result files) and are required to be 
serialized/de-serialized together.
+   */
+  private class PlanCopy {
+FetchTask fetchTask;
+List<Task<?>> tasks;
+
+PlanCopy(FetchTask fetchTask, List<Task<?>> tasks) {
+  this.fetchTask = fetchTask;
+  this.tasks = tasks;
+}
+
+FetchTask getFetchTask() {
+  return fetchTask;
+}
+
+List<Task<?>> getTasks()  {
+  return tasks;
+}
+  }
+
+  private String getQueryName(ASTNode root) {
+ASTNode queryNameAST = (ASTNode)(root.getChild(1));
+return queryNameAST.getText();
+  }
+
+  /**
+   * Utility method to create copy of provided object using kyro 
serialization/de-serialization.
+   */
+  private <T> T makeCopy(final Object task, Class<T> objClass) {
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+SerializationUtilities.serializePlan(task, baos);
+
+return SerializationUtilities.deserializePlan(
+new ByteArrayInputStream(baos.toByteArray()), objClass);
+  }
+
+  /**
+   * Given a {@link BaseSemanticAnalyzer} (cached) this method make copies of 
all tasks
+   * (including {@link FetchTask}) and update the existing {@link 
ExecuteStatementAnalyzer}
+   */
+  private void createTaskCopy(final BaseSemanticAnalyzer cachedPlan) {

Review comment:
   Follow-up: https://issues.apache.org/jira/browse/HIVE-24005





[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466561873



##
File path: 
parser/src/java/org/apache/hadoop/hive/ql/parse/PrepareStatementParser.g
##
@@ -0,0 +1,66 @@
+/**
+   Licensed to the Apache Software Foundation (ASF) under one or more 
+   contributor license agreements.  See the NOTICE file distributed with 
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with 
+   the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+*/
+parser grammar PrepareStatementParser;
+
+options
+{
+output=AST;
+ASTLabelType=ASTNode;
+backtrack=false;
+k=3;
+}
+
+@members {
+  @Override
+  public Object recoverFromMismatchedSet(IntStream input,
+  RecognitionException re, BitSet follow) throws RecognitionException {
+throw re;
+  }
+  @Override
+  public void displayRecognitionError(String[] tokenNames,
+  RecognitionException e) {
+gParent.errors.add(new ParseError(gParent, e, tokenNames));
+  }
+}
+
+@rulecatch {
+catch (RecognitionException e) {
+  throw e;
+}
+}
+
+//--- Rules for parsing Prepare 
statement-
+prepareStatement
+@init { gParent.pushMsg("prepare statement ", state); }
+@after { gParent.popMsg(state); }
+: KW_PREPARE identifier KW_FROM queryStatementExpression
+-> ^(TOK_PREPARE queryStatementExpression identifier)
+;
+
+executeStatement
+@init { gParent.pushMsg("execute statement ", state); }
+@after { gParent.popMsg(state); }
+: KW_EXECUTE identifier KW_USING executeParamList
+-> ^(TOK_EXECUTE executeParamList identifier)
+;
+
+executeParamList
+@init { gParent.pushMsg("execute param list", state); }
+@after { gParent.popMsg(state); }
+: constant (COMMA constant)*

Review comment:
   https://issues.apache.org/jira/browse/HIVE-24002





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] ashish-kumar-sharma opened a new pull request #1370: HIVE-23887: Reset Columns stats in Export Statement

2020-08-06 Thread GitBox


ashish-kumar-sharma opened a new pull request #1370:
URL: https://github.com/apache/hive/pull/1370


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


aasha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466546396



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -86,107 +87,59 @@ private boolean checkIfPathExist(Path sourcePath, 
UserGroupInformation proxyUser
 return proxyUser.doAs((PrivilegedExceptionAction<Boolean>) () -> sourcePath.getFileSystem(conf).exists(sourcePath));
   }
 
-  private int handleException(Exception e, Path sourcePath, Path targetPath,
-  int currentRetry, UserGroupInformation 
proxyUser) {
-try {
-  LOG.info("Checking if source path " + sourcePath + " is missing for 
exception ", e);
-  if (!checkIfPathExist(sourcePath, proxyUser)) {
-LOG.info("Source path is missing. Ignoring exception.");
-return 0;
-  }
-} catch (Exception ex) {
-  LOG.warn("Source path missing check failed. ", ex);
-}
-// retry logic only for i/o exception
-if (!(e instanceof IOException)) {
-  LOG.error("Unable to copy {} to {}", sourcePath, targetPath, e);
-  setException(e);
-  return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
-}
-
-if (currentRetry <= MAX_COPY_RETRY) {
-  LOG.warn("Unable to copy {} to {}", sourcePath, targetPath, e);
-} else {
-  LOG.error("Unable to copy {} to {} even after retrying for {} time", 
sourcePath, targetPath, currentRetry, e);
-  setException(e);
-  return ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getErrorCode();
-}
-int sleepTime = FileUtils.getSleepTime(currentRetry);
-LOG.info("Sleep for " + sleepTime + " milliseconds before retry no " + 
(currentRetry));
-try {
-  Thread.sleep(sleepTime);
-} catch (InterruptedException timerEx) {
-  LOG.info("Sleep interrupted", timerEx.getMessage());
-}
-try {
-  if (proxyUser == null) {
-proxyUser = Utils.getUGI();
-  }
-  FileSystem.closeAllForUGI(proxyUser);
-} catch (Exception ex) {
-  LOG.warn("Unable to closeAllForUGI for user " + proxyUser, ex);
-}
-return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
-  }
-
   @Override
   public int execute() {
 String distCpDoAsUser = 
conf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER);
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(IOException.class).build();
+try {
+  return retryable.executeCallable(() -> {
+UserGroupInformation proxyUser = null;
+Path sourcePath = work.getFullyQualifiedSourcePath();
+Path targetPath = work.getFullyQualifiedTargetPath();
+try {
+  if 
(conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) {
+sourcePath = 
reservedRawPath(work.getFullyQualifiedSourcePath().toUri());
+targetPath = 
reservedRawPath(work.getFullyQualifiedTargetPath().toUri());
+  }
+  UserGroupInformation ugi = Utils.getUGI();
+  String currentUser = ugi.getShortUserName();
+  if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) {
+proxyUser = UserGroupInformation.createProxyUser(
+  distCpDoAsUser, UserGroupInformation.getLoginUser());
+  }
 
-Path sourcePath = work.getFullyQualifiedSourcePath();
-Path targetPath = work.getFullyQualifiedTargetPath();
-if (conf.getBoolVar(HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE)) {
-  sourcePath = reservedRawPath(work.getFullyQualifiedSourcePath().toUri());
-  targetPath = reservedRawPath(work.getFullyQualifiedTargetPath().toUri());
-}
-int currentRetry = 0;
-int error = 0;
-UserGroupInformation proxyUser = null;
-while (currentRetry <= MAX_COPY_RETRY) {
-  try {
-UserGroupInformation ugi = Utils.getUGI();
-String currentUser = ugi.getShortUserName();
-if (distCpDoAsUser != null && !currentUser.equals(distCpDoAsUser)) {
-  proxyUser = UserGroupInformation.createProxyUser(
-  distCpDoAsUser, UserGroupInformation.getLoginUser());
-}
-
-setTargetPathOwner(targetPath, sourcePath, proxyUser);
-
-// do we create a new conf and only here provide this additional 
option so that we get away from
-// differences of data in two location for the same directories ?
-// basically add distcp.options.delete to hiveconf new object ?
-FileUtils.distCp(
-sourcePath.getFileSystem(conf), // source file system
-Collections.singletonList(sourcePath),  // list of source paths
-targetPath,
-false,
-proxyUser,
-conf,
-ShimLoader.getHadoopShims());
-return 0;
-  } catch (Exception e) {
-currentRetry++;
-error = handleException(e, sourcePath, targetPath, currentRetry, 
proxyUser);
-if (error == 0) {
-  

[GitHub] [hive] aasha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


aasha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466544742



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws 
SemanticException, MalformedU
 
   private long lastStoredTimeStamp() throws SemanticException {
 Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), 
EximUtil.METADATA_NAME);
-BufferedReader br = null;
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(IOException.class)
+  .withFailOnException(FileNotFoundException.class).build();
 try {
-  FileSystem fs = prevMetadataPath.getFileSystem(conf);
-  br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), 
Charset.defaultCharset()));
-  String line = br.readLine();
-  if (line == null) {
-throw new SemanticException("Could not read lastStoredTimeStamp from 
atlas metadata file");
-  }
-  String[] lineContents = line.split("\t", 5);
-  return Long.parseLong(lineContents[1]);
-} catch (Exception ex) {
-  throw new SemanticException(ex);
-} finally {
-  if (br != null) {
+  return retryable.executeCallable(() -> {
+BufferedReader br = null;
 try {
-  br.close();
-} catch (IOException e) {
-  throw new SemanticException(e);
+  FileSystem fs = prevMetadataPath.getFileSystem(conf);
+  br = new BufferedReader(new 
InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset()));
+  String line = br.readLine();
+  if (line == null) {
+throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE
+  .format("Could not read lastStoredTimeStamp from atlas metadata 
file", "atlas"));
+  }
+  String[] lineContents = line.split("\t", 5);
+  return Long.parseLong(lineContents[1]);
+} finally {
+  if (br != null) {
+try {
+  br.close();
+} catch (IOException e) {
+  //Do nothing
+}
+  }
 }
-  }
+  });
+} catch (Exception e) {
+  throw new 
SemanticException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);

Review comment:
   Yes, but this one is of type Exception.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466534477



##
File path: ql/src/test/queries/clientpositive/prepare_plan.q
##
@@ -0,0 +1,113 @@
+--! qt:dataset:src
+--! qt:dataset:alltypesorc
+
+set hive.explain.user=false;
+set hive.vectorized.execution.enabled=false;
+
+explain extended prepare pcount from select count(*) from src where key > ?;
+prepare pcount from select count(*) from src where key > ?;
+execute pcount using 200;
+
+-- single param
+explain extended prepare p1 from select * from src where key > ? order by key 
limit 10;
+prepare p1 from select * from src where key > ? order by key limit 10;
+
+execute p1 using 200;
+
+-- same query, different param
+execute p1 using 0;
+
+-- same query, negative param
+--TODO: fails (constant in grammar do not support negatives)
+-- execute p1 using -1;

Review comment:
   https://issues.apache.org/jira/browse/HIVE-24002





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466533583



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##
@@ -1619,6 +1620,12 @@ public static ColStatistics 
getColStatisticsFromExpression(HiveConf conf, Statis
   colName = enfd.getFieldName();
   colType = enfd.getTypeString();
   countDistincts = numRows;
+} else if (end instanceof ExprDynamicParamDesc) {
+  //skip collecting stats for parameters

Review comment:
   https://issues.apache.org/jira/browse/HIVE-24003





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466533366



##
File path: ql/src/test/results/clientpositive/llap/udf_greatest.q.out
##
@@ -63,7 +63,7 @@ STAGE PLANS:
   alias: src
   Row Limit Per Split: 1
   Select Operator
-expressions: 'c' (type: string), 'a' (type: string), 'AaA' (type: 
string), 'AAA' (type: string), '13' (type: string), '2' (type: string), '03' 
(type: string), '1' (type: string), null (type: double), null (type: double), 
null (type: double), null (type: double), null (type: double), null (type: 
double)

Review comment:
   There is a small change in the patch which updates the type inference rule for void/null. Prior to the change, the expressions were being inferred as `Double` in this case. With the change they are appropriately inferred as `String`, since the rest of the expressions within this UDF (`GREATEST('a', 'b', null)`) are interpreted as string.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] vineetgarg02 commented on a change in pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-08-06 Thread GitBox


vineetgarg02 commented on a change in pull request #1315:
URL: https://github.com/apache/hive/pull/1315#discussion_r466531974



##
File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out
##
@@ -0,0 +1,2512 @@
+PREHOOK: query: explain extended prepare pcount from select count(*) from src 
where key > ?
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src
+ A masked pattern was here 
+POSTHOOK: query: explain extended prepare pcount from select count(*) from src 
where key > ?
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src
+ A masked pattern was here 
+OPTIMIZED SQL: SELECT COUNT(*) AS `$f0`
+FROM `default`.`src`
+WHERE `key` > CAST(? AS STRING)
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
+ A masked pattern was here 
+  Edges:
+Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+ A masked pattern was here 
+  Vertices:
+Map 1 
+Map Operator Tree:
+TableScan
+  alias: src
+  filterExpr: (key > CAST( $1 AS STRING)) (type: boolean)
+  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
+  GatherStats: false
+  Filter Operator
+isSamplingPred: false
+predicate: (key > CAST( $1 AS STRING)) (type: boolean)
+Statistics: Num rows: 166 Data size: 14442 Basic stats: 
COMPLETE Column stats: COMPLETE
+Select Operator
+  Statistics: Num rows: 166 Data size: 14442 Basic stats: 
COMPLETE Column stats: COMPLETE
+  Group By Operator
+aggregations: count()
+minReductionHashAggr: 0.99
+mode: hash
+outputColumnNames: _col0
+Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
+Reduce Output Operator
+  bucketingVersion: 2
+  null sort order: 
+  numBuckets: -1
+  sort order: 
+  Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
+  tag: -1
+  value expressions: _col0 (type: bigint)
+  auto parallelism: false
+Execution mode: llap
+LLAP IO: all inputs
+Path -> Alias:
+ A masked pattern was here 
+Path -> Partition:
+ A masked pattern was here 
+Partition
+  base file name: src
+  input format: org.apache.hadoop.mapred.TextInputFormat
+  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+  properties:
+bucket_count -1
+bucketing_version 2
+column.name.delimiter ,
+columns key,value
+columns.types string:string
+ A masked pattern was here 
+name default.src
+serialization.format 1
+serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+input format: org.apache.hadoop.mapred.TextInputFormat
+output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+properties:
+  bucketing_version 2
+  column.name.delimiter ,
+  columns key,value
+  columns.comments 'default','default'
+  columns.types string:string
+ A masked pattern was here 
+  name default.src
+  serialization.format 1
+  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+name: default.src
+  name: default.src
+Truncated Path -> Alias:
+  /src [src]
+Reducer 2 
+Execution mode: llap
+Needs Tagging: false
+Reduce Operator Tree:
+  Group By Operator
+aggregations: count(VALUE._col0)
+mode: mergepartial
+outputColumnNames: _col0
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
+File Output Operator
+  bucketingVersion: 2
+  compressed: false
+  GlobalTableId: 0
+ A masked pattern was here 
+ 

[GitHub] [hive] viirya commented on pull request #1365: HIVE-23998: Upgrade guava to 27 for Hive 2.3 branch

2020-08-06 Thread GitBox


viirya commented on pull request #1365:
URL: https://github.com/apache/hive/pull/1365#issuecomment-670017177


   @sunchao Ok, I see. I will do. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ opened a new pull request #1369: HIVE-24000: Put exclusive MERGE INSERT under the feature flag

2020-08-06 Thread GitBox


deniskuzZ opened a new pull request #1369:
URL: https://github.com/apache/hive/pull/1369


   (cherry picked from commit 0e4b02af485cb1972ebc4f251d853c710e70164f)
   
   
   
   ### What changes were proposed in this pull request?
   
   Pushed exclusive MERGE INSERT under the feature flag
   
   ### Why are the changes needed?
   
   Backward compatibility
   
   ### Does this PR introduce _any_ user-facing change?
   
   new feature flag property was introduced 'hive.txn.xlock.mergeinsert'
   
   ### How was this patch tested?
   
   TestDbTxnManager2



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ closed pull request #1366: HIVE-24000: Put exclusive MERGE INSERT under the feature flag

2020-08-06 Thread GitBox


deniskuzZ closed pull request #1366:
URL: https://github.com/apache/hive/pull/1366


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] klcopp opened a new pull request #1368: HIVE-24001: Don't cache MapWork in tez/ObjectCache during query-based compaction

2020-08-06 Thread GitBox


klcopp opened a new pull request #1368:
URL: https://github.com/apache/hive/pull/1368


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pkumarsinha commented on a change in pull request #1358: HIVE-23955 : Classification of Error Codes in Replication

2020-08-06 Thread GitBox


pkumarsinha commented on a change in pull request #1358:
URL: https://github.com/apache/hive/pull/1358#discussion_r466420110



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -132,31 +130,40 @@ private AtlasReplInfo createAtlasReplInfo() throws 
SemanticException, MalformedU
 
   private long lastStoredTimeStamp() throws SemanticException {
 Path prevMetadataPath = new Path(work.getPrevAtlasDumpDir(), 
EximUtil.METADATA_NAME);
-BufferedReader br = null;
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(IOException.class)
+  .withFailOnException(FileNotFoundException.class).build();
 try {
-  FileSystem fs = prevMetadataPath.getFileSystem(conf);
-  br = new BufferedReader(new InputStreamReader(fs.open(prevMetadataPath), 
Charset.defaultCharset()));
-  String line = br.readLine();
-  if (line == null) {
-throw new SemanticException("Could not read lastStoredTimeStamp from 
atlas metadata file");
-  }
-  String[] lineContents = line.split("\t", 5);
-  return Long.parseLong(lineContents[1]);
-} catch (Exception ex) {
-  throw new SemanticException(ex);
-} finally {
-  if (br != null) {
+  return retryable.executeCallable(() -> {
+BufferedReader br = null;
 try {
-  br.close();
-} catch (IOException e) {
-  throw new SemanticException(e);
+  FileSystem fs = prevMetadataPath.getFileSystem(conf);
+  br = new BufferedReader(new 
InputStreamReader(fs.open(prevMetadataPath), Charset.defaultCharset()));
+  String line = br.readLine();
+  if (line == null) {
+throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE

Review comment:
   lastStoredTimeStamp is maintained by Hive itself. Should we have a better error message category for this?

##
File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
##
@@ -505,18 +505,9 @@
   " queue: {1}. Please fix and try again.", true),
   SPARK_RUNTIME_OOM(20015, "Spark job failed because of out of memory."),
 
-  //if the error message is changed for REPL_EVENTS_MISSING_IN_METASTORE, then 
need modification in getNextNotification
-  //method in HiveMetaStoreClient
-  REPL_EVENTS_MISSING_IN_METASTORE(20016, "Notification events are missing in 
the meta store."),
-  REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID(20017, "Load path {0} not valid as target 
database is bootstrapped " +
-  "from some other path : {1}."),
-  REPL_FILE_MISSING_FROM_SRC_AND_CM_PATH(20018, "File is missing from both 
source and cm path."),
-  REPL_LOAD_PATH_NOT_FOUND(20019, "Load path does not exist."),
-  REPL_DATABASE_IS_NOT_SOURCE_OF_REPLICATION(20020,
-  "Source of replication (repl.source.for) is not set in the database 
properties."),
-  REPL_INVALID_DB_OR_TABLE_PATTERN(20021,
-  "Invalid pattern for the DB or table name in the replication policy. 
"
-  + "It should be a valid regex enclosed within single or 
double quotes."),
+  REPL_FILE_MISSING_FROM_SRC_AND_CM_PATH(20016, "File is missing from both 
source and cm path."),
+  REPL_EXTERNAL_SERVICE_CONNECTION_ERROR(20017, "Failed to connect to {0} 
service. Error code {1}.",
+true),

Review comment:
   nit: This can be accommodated on the same line.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -42,11 +43,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import java.io.BufferedReader;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.InputStreamReader;
-import java.io.Serializable;
+import java.io.*;

Review comment:
   Should we revert this?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/atlas/AtlasRestClientImpl.java
##
@@ -125,17 +127,15 @@ private AtlasImportResult 
getDefaultAtlasImportResult(AtlasImportRequest request
 return new AtlasImportResult(request, "", "", "", 0L);
   }
 
-  public AtlasServer getServer(String endpoint) throws SemanticException {
+  public AtlasServer getServer(String endpoint, HiveConf conf) throws 
SemanticException {
+Retryable retryable = Retryable.builder()
+  .withHiveConf(conf)
+  .withRetryOnException(Exception.class).build();

Review comment:
   Shouldn't we retry only on AtlasServiceException, and in the end catch only that exception, since that's what getServer declares to throw?
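
   A hedged sketch of what that could look like (the builder calls are the ones used elsewhere in this PR; AtlasServiceException comes from the Atlas client, and fetchServer() is a hypothetical stand-in for the actual Atlas call, not code from this patch):

   Retryable retryable = Retryable.builder()
       .withHiveConf(conf)
       .withRetryOnException(AtlasServiceException.class).build();
   try {
     return retryable.executeCallable(() -> fetchServer(endpoint));  // fetchServer(): hypothetical helper
   } catch (Exception e) {
     // Retries exhausted (or a non-retryable failure): surface it as a replication error.
     throw new SemanticException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   }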

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -43,15 +44,15 @@
  */
 public class DirCopyTask extends Task<DirCopyWork> implements Serializable {
   private static final Logger LOG = LoggerFactory.getLogger(DirCopyTask.class);
-  private static final int MAX_COPY_RETRY = 5;
 
   private boolean createAndSetPathOwner(Path destPath, Path sourcePath) throws 
IOException {
 FileSystem targetFs = 

[GitHub] [hive] sunchao commented on pull request #1365: HIVE-23998: Upgrade guava to 27 for Hive 2.3 branch

2020-08-06 Thread GitBox


sunchao commented on pull request #1365:
URL: https://github.com/apache/hive/pull/1365#issuecomment-669959284


   @viirya It may be because the PR action only supports master and branch-2 
right now, not branch-2.3. I suggest you submit a patch to the JIRA ticket 
similar to https://issues.apache.org/jira/browse/HIVE-22249. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] aasha opened a new pull request #1367: HIVE-23993 Handle irrecoverable errors

2020-08-06 Thread GitBox


aasha opened a new pull request #1367:
URL: https://github.com/apache/hive/pull/1367


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466421484



##
File path: 
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out
##
@@ -79,27 +79,25 @@ STAGE PLANS:
 Tez
  A masked pattern was here 
   Edges:

Review comment:
   The plan in `dynamic_semijoin_reduction_2.q` has three single-column semijoin reducers that get merged into one multi-column reducer. As a result, three reducers become one, making the plan more compact.
   
   Apart from that, you are right that the multi column transformation can lead 
to further optimization opportunities. An example can be seen in query24.q.out 
(Check commit 
https://github.com/apache/hive/pull/1325/commits/c9f9112d0802906dce7442f3d4c01535a584af11).
 There the `SharedWorkOptimizer` kicks in and merges two semijoin reducer 
branches on the same scan operator. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ opened a new pull request #1366: HIVE-24000: Put exclusive MERGE INSERT under the feature flag

2020-08-06 Thread GitBox


deniskuzZ opened a new pull request #1366:
URL: https://github.com/apache/hive/pull/1366


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466414395



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws 
SemanticException {
+Map<ReduceSinkOperator, SemiJoinBranchInfo> map = parseContext.getRsToSemiJoinBranchInfo();
+if (map.isEmpty()) {
+  return parseContext;
+}
+HiveConf hiveConf = parseContext.getConf();
+
+// Order does not really matter but it is necessary to keep plans stable
+SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+for (Map.Entry smjEntry : 
map.entrySet()) {
+  TableScanOperator ts = smjEntry.getValue().getTsOp();
+  // Semijoin optimization branch should look like 
-SEL-GB1-RS1-GB2-RS2
+  SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), 
SelectOperator.class, 0, 0, 0, 0);
+  assert selOp != null;
+  assert selOp.getParentOperators().size() == 1;
+  Operator source = selOp.getParentOperators().get(0);
+  SJSourceTarget sjKey = new SJSourceTarget(source, ts);
+  List ops = sameTableSJ.computeIfAbsent(sjKey, 
tableScanOperator -> new 

[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466410669



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2
+      SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0);
+      assert selOp != null;
+      assert selOp.getParentOperators().size() == 1;
+      Operator<?> source = selOp.getParentOperators().get(0);
+      SJSourceTarget sjKey = new SJSourceTarget(source, ts);
+      List<ReduceSinkOperator> ops = sameTableSJ.computeIfAbsent(sjKey, tableScanOperator -> new 

[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466404775



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -2054,7 +2067,8 @@ private void markSemiJoinForDPP(OptimizeTezProcContext procCtx)
       // Lookup nDVs on TS side.
       RuntimeValuesInfo rti = procCtx.parseContext
           .getRsToRuntimeValuesInfoMap().get(rs);
-      ExprNodeDesc tsExpr = rti.getTsColExpr();
+      // TODO Adapt for multi column semi-joins.

Review comment:
   I meant to handle this as part of HIVE-23934. I added the reference to 
the comment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466404256



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -1887,13 +1898,14 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
         // Check the ndv/rows from the SEL vs the destination tablescan the semijoin opt is going to.
         TableScanOperator ts = sjInfo.getTsOp();
         RuntimeValuesInfo rti = procCtx.parseContext.getRsToRuntimeValuesInfoMap().get(rs);
-        ExprNodeDesc tsExpr = rti.getTsColExpr();
-        // In the SEL operator of the semijoin branch, there should be only one column in the operator
-        ExprNodeDesc selExpr = sel.getConf().getColList().get(0);
+        List<ExprNodeDesc> targetColumns = rti.getTargetColumns();
+        // In multi column semijoin branches the last column of the SEL operator is hash(c1, c2, ..., cn)
+        // so we shouldn't consider it.
+        List<ExprNodeDesc> sourceColumns = sel.getConf().getColList().subList(0, targetColumns.size());

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org
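
For context on the hunk quoted above: in a multi-column semijoin branch the SEL operator projects the source columns followed by a trailing hash(c1, c2, ..., cn), so only the first targetColumns.size() expressions can be compared with the target columns. A minimal, self-contained sketch of that trimming, with plain strings standing in for ExprNodeDesc and invented column names (not Hive code):

import java.util.Arrays;
import java.util.List;

public class SemiJoinSelColumnsSketch {
  public static void main(String[] args) {
    // Columns projected by the SEL operator of a multi-column semijoin branch:
    // the source columns followed by hash(c1, c2, ..., cn).
    List<String> selColList = Arrays.asList("ws_order_number", "ws_ship_date_sk",
        "hash(ws_order_number, ws_ship_date_sk)");

    // Target columns registered for the destination TableScan (names are illustrative).
    List<String> targetColumns = Arrays.asList("wr_order_number", "d_date_sk");

    // Drop the trailing hash column so source and target columns align one to one.
    List<String> sourceColumns = selColList.subList(0, targetColumns.size());

    for (int i = 0; i < sourceColumns.size(); i++) {
      System.out.println(sourceColumns.get(i) + " -> " + targetColumns.get(i));
    }
  }
}

The subList call mirrors the one in the quoted diff; the trailing hash column presumably only feeds the bloom-filter side of the merged branch and has no one-to-one counterpart on the TableScan side.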



[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466368067



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -1737,35 +1737,46 @@ private static double getBloomFilterBenefit(
   }
 }
 
-    // Selectivity: key cardinality of semijoin / domain cardinality
-    // Benefit (rows filtered from ts): (1 - selectivity) * # ts rows
-    double selectivity = selKeyCardinality / (double) keyDomainCardinality;
-    selectivity = Math.min(selectivity, 1);
-    benefit = tsRows * (1 - selectivity);
-
     if (LOG.isDebugEnabled()) {
-      LOG.debug("BloomFilter benefit for " + selCol + " to " + tsCol
-          + ", selKeyCardinality=" + selKeyCardinality
-          + ", tsKeyCardinality=" + tsKeyCardinality
-          + ", tsRows=" + tsRows
-          + ", keyDomainCardinality=" + keyDomainCardinality);
-      LOG.debug("SemiJoin key selectivity=" + selectivity
-          + ", benefit=" + benefit);
+      LOG.debug("BloomFilter selectivity for " + selCol + " to " + tsCol + ", selKeyCardinality=" + selKeyCardinality
+          + ", tsKeyCardinality=" + tsKeyCardinality + ", keyDomainCardinality=" + keyDomainCardinality);
     }
+    // Selectivity: key cardinality of semijoin / domain cardinality
+    return selKeyCardinality / (double) keyDomainCardinality;
+  }

-    return benefit;
+  private static double getBloomFilterBenefit(
+      SelectOperator sel, List<ExprNodeDesc> selExpr,
+      Statistics filStats, List<ExprNodeDesc> tsExpr) {
+    if (sel.getStatistics() == null || filStats == null) {
+      LOG.debug("No stats available to compute BloomFilter benefit");
+      return -1;
+    }
+    double selectivity = 0.0;
+    for (int i = 0; i < tsExpr.size(); i++) {
+      selectivity = Math.max(selectivity, getBloomFilterSelectivity(sel, selExpr.get(i), filStats, tsExpr.get(i)));

Review comment:
   You are right, I was the one confused. I applied the change along with 
some doc.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org
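
The arithmetic that the refactoring above splits across getBloomFilterSelectivity and getBloomFilterBenefit is small enough to spell out: per-column selectivity is selKeyCardinality / keyDomainCardinality, the benefit is the number of rows filtered from the TableScan, i.e. tsRows * (1 - selectivity), and a multi-column branch keeps the largest (least favourable) per-column selectivity. A standalone sketch of the computation with invented numbers:

public class BloomFilterBenefitSketch {
  // Selectivity: key cardinality of the semijoin branch / key domain cardinality on the TS side.
  static double selectivity(long selKeyCardinality, long keyDomainCardinality) {
    return Math.min(selKeyCardinality / (double) keyDomainCardinality, 1.0);
  }

  public static void main(String[] args) {
    long tsRows = 1_000_000L;
    // Per-column {selKeyCardinality, keyDomainCardinality} pairs of a multi-column branch.
    long[][] columns = {{5_000L, 100_000L}, {80_000L, 100_000L}};

    // Multi-column case: keep the largest (least favourable) per-column selectivity.
    double sel = 0.0;
    for (long[] c : columns) {
      sel = Math.max(sel, selectivity(c[0], c[1]));
    }

    // Benefit (rows filtered from the TableScan): (1 - selectivity) * # ts rows.
    double benefit = tsRows * (1 - sel);
    System.out.println("selectivity=" + sel + ", benefit=" + benefit);
  }
}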



[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466367481



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2
+      SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0);
+      assert selOp != null;
+      assert selOp.getParentOperators().size() == 1;
+      Operator<?> source = selOp.getParentOperators().get(0);
+      SJSourceTarget sjKey = new SJSourceTarget(source, ts);
+      List<ReduceSinkOperator> ops = sameTableSJ.computeIfAbsent(sjKey, tableScanOperator -> new 

[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466366197



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2
+      SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0);
+      assert selOp != null;
+      assert selOp.getParentOperators().size() == 1;
+      Operator<?> source = selOp.getParentOperators().get(0);
+      SJSourceTarget sjKey = new SJSourceTarget(source, ts);
+      List<ReduceSinkOperator> ops = sameTableSJ.computeIfAbsent(sjKey, tableScanOperator -> new 

[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466365583



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2
+      SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0);
+      assert selOp != null;
+      assert selOp.getParentOperators().size() == 1;
+      Operator<?> source = selOp.getParentOperators().get(0);
+      SJSourceTarget sjKey = new SJSourceTarget(source, ts);
+      List<ReduceSinkOperator> ops = sameTableSJ.computeIfAbsent(sjKey, tableScanOperator -> new 

[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466365453



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like -SEL-GB1-RS1-GB2-RS2
+      SelectOperator selOp = OperatorUtils.ancestor(smjEntry.getKey(), SelectOperator.class, 0, 0, 0, 0);
+      assert selOp != null;

Review comment:
   Done. Didn't add `checkNotNull` since an NPE will be thrown anyway and it is rather informative as well.
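
A note for readers following the transform() excerpt quoted above: the pass buckets each semijoin reducer branch by its (source operator, target TableScan) pair in a TreeMap ordered by the key's string form, so the merge produces the same plan on every run; branches that land in the same bucket are candidates for a single multi-column branch. A stripped-down sketch of that grouping idiom, with plain strings standing in for SJSourceTarget and the ReduceSink operators:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class SemiJoinGroupingSketch {
  public static void main(String[] args) {
    // (source, target) -> semijoin branches; ordering the keys keeps merged plans stable.
    SortedMap<String, List<String>> sameTableSJ = new TreeMap<>(Comparator.naturalOrder());

    // Each pair is {source->target key, ReduceSink of one single-column semijoin branch}.
    String[][] branches = {
        {"RS[5]->TS[web_sales]", "RS[12]"},
        {"RS[5]->TS[web_sales]", "RS[14]"},
        {"RS[7]->TS[web_returns]", "RS[20]"}};

    for (String[] b : branches) {
      sameTableSJ.computeIfAbsent(b[0], k -> new ArrayList<>()).add(b[1]);
    }

    // Branches sharing source and target are candidates for merging into one multi-column branch.
    sameTableSJ.forEach((k, v) -> System.out.println(k + " -> " + v));
  }
}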





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

[GitHub] [hive] GuoPhilipse commented on pull request #1363: HIVE-23996: Remove unused line to keep code clean

2020-08-06 Thread GitBox


GuoPhilipse commented on pull request #1363:
URL: https://github.com/apache/hive/pull/1363#issuecomment-669876714


   cc @pvary



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1325: HIVE-21196 HIVE-23940 Multi-column semijoin reducers and TPC-H datasets

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r466347783



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
##
@@ -53,6 +53,34 @@
 
   private static final Logger LOG = LoggerFactory.getLogger(OperatorUtils.class);

+  /**
+   * Return the ancestor of the specified operator at the provided path or null if the path is invalid.
+   *
+   * The method is equivalent to following code:
+   * {@code
+   * op.getParentOperators().get(path[0])
+   * .getParentOperators().get(path[1])
+   * ...
+   * .getParentOperators().get(path[n])
+   * }
+   * with additional checks about the validity of the provided path and the type of the ancestor.
+   *
+   * @param op the operator for which we
+   * @param clazz the class of the ancestor operator
+   * @param path the path leading to the desired ancestor
+   * @param <T> the type of the ancestor
+   * @return the ancestor of the specified operator at the provided path or null if the path is invalid.
+   */
+  public static <T> T ancestor(Operator<?> op, Class<T> clazz, int... path) {
+    Operator<?> target = op;
+    for (int i = 0; i < path.length; i++) {
+      if (target.getParentOperators() == null || path[i] > target.getParentOperators().size())

Review comment:
   Done. I also configured IntelliJ to force their usage in single line 
statements so hopefully they should never appear.
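
To make the contract of the new helper concrete: SemiJoinReductionMerge calls OperatorUtils.ancestor(rs, SelectOperator.class, 0, 0, 0, 0) to hop four parents up from the final ReduceSink of a branch and expects to reach the SEL at its head, with null signalling an unexpected shape. Below is a small self-contained analogue of the same walk over a toy parent-linked node type; Node is illustrative, not Hive's Operator, and the real method additionally checks the class of the ancestor:

import java.util.Arrays;
import java.util.List;

public class AncestorSketch {
  // Minimal stand-in for an operator node with parents (illustrative only).
  static class Node {
    final String name;
    final List<Node> parents;
    Node(String name, Node... parents) {
      this.name = name;
      this.parents = Arrays.asList(parents);
    }
  }

  // Follow parent indexes path[0], path[1], ... and return the node reached, or null if the path is invalid.
  static Node ancestor(Node op, int... path) {
    Node target = op;
    for (int step : path) {
      if (target.parents == null || step >= target.parents.size()) {
        return null;
      }
      target = target.parents.get(step);
    }
    return target;
  }

  public static void main(String[] args) {
    // Shape of a semijoin reducer branch: SEL <- GB1 <- RS1 <- GB2 <- RS2.
    Node sel = new Node("SEL");
    Node gb1 = new Node("GB1", sel);
    Node rs1 = new Node("RS1", gb1);
    Node gb2 = new Node("GB2", rs1);
    Node rs2 = new Node("RS2", gb2);

    Node found = ancestor(rs2, 0, 0, 0, 0);
    System.out.println(found != null ? found.name : "unexpected branch shape"); // prints SEL
  }
}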

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import 

[GitHub] [hive] zabetak commented on a change in pull request #1357: HIVE-23963: UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule

2020-08-06 Thread GitBox


zabetak commented on a change in pull request #1357:
URL: https://github.com/apache/hive/pull/1357#discussion_r466296959



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java
##
@@ -81,8 +84,16 @@ public RelDistribution apply(TargetMapping mapping) {
       return this;
     }
     List<Integer> newKeys = new ArrayList<>(keys.size());
+
+    // Instead of using a HashMap for lookup newKeys.add(mapping.getTargetOpt(key)); should be called but not all the
+    // mapping supports that. See HIVE-23963. Replace this when this is fixed in calcite.

Review comment:
   If it is meant to be fixed in Calcite then we should create a JIRA and 
add an entry in `org.apache.hadoop.hive.ql.optimizer.calcite.Bug`. We could 
even skip the JIRA creation and move this comment in `Bug` as 
`CALCITE-X_fixed` or something similar.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org
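
For readers not fluent in Calcite mappings, the workaround described by the comment amounts to remapping each distribution key through an explicit source-to-target lookup instead of calling mapping.getTargetOpt(key), since not every TargetMapping implementation supports that call. A hedged sketch of the idea with plain Java collections; the pairs are invented and this is not the exact HiveRelDistribution code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DistributionKeyRemapSketch {
  public static void main(String[] args) {
    // Keys of the distribution (column ordinals) before the mapping is applied.
    List<Integer> keys = List.of(2, 5);

    // source -> target pairs of the field mapping (what Calcite's TargetMapping describes).
    Map<Integer, Integer> sourceToTarget = new HashMap<>();
    sourceToTarget.put(2, 0);
    sourceToTarget.put(5, 1);
    sourceToTarget.put(7, 2);

    // Remap every key; getTargetOpt(key) would do this directly when the mapping supports it.
    List<Integer> newKeys = new ArrayList<>(keys.size());
    for (Integer key : keys) {
      Integer target = sourceToTarget.get(key);
      if (target != null) {
        newKeys.add(target);
      }
    }
    System.out.println(newKeys); // [0, 1]
  }
}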



[GitHub] [hive] dengzhhu653 commented on pull request #1205: HIVE-23800: Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-08-06 Thread GitBox


dengzhhu653 commented on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-669832758


   > sorry @dengzhhu653 lately I was a little bit flooded with all kind of 
stuff...and right now I'm on holiday - will get back to your patch next week!
   sorry for disturbing you. Have a nice holiday!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] dengzhhu653 edited a comment on pull request #1205: HIVE-23800: Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-08-06 Thread GitBox


dengzhhu653 edited a comment on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-669832758


   > sorry @dengzhhu653 lately I was a little bit flooded with all kind of 
stuff...and right now I'm on holiday - will get back to your patch next week!
   
   sorry for disturbing you. Have a nice holiday!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kgyrtkirk commented on pull request #1205: HIVE-23800: Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-08-06 Thread GitBox


kgyrtkirk commented on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-669804983


   sorry @dengzhhu653 lately I was a little bit flooded with all kind of 
stuff...and right now I'm on holiday - will get back to your patch next week!
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kishendas commented on pull request #1355: send tableId to get_partition APIs

2020-08-06 Thread GitBox


kishendas commented on pull request #1355:
URL: https://github.com/apache/hive/pull/1355#issuecomment-669758040


   > Can you add (or modify) some tests to make sure these APIs don't regress 
in future.
   
   Sure, I have added tests to make sure both validWriteIdList and tableId are 
sent from HS2 for the newly added HMS get_* APIs that take validWriteIdList and 
tableId in the input. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org