[jira] [Commented] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-23 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164159#comment-17164159
 ] 

Aasha Medhi commented on HIVE-23835:


+1

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch, 
> HIVE-23835.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl 
> dump should copy them to the staging location in order to remove the 
> cross-cluster visibility requirement.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?focusedWorklogId=462831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462831
 ]

ASF GitHub Bot logged work on HIVE-23835:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 05:46
Start Date: 24/Jul/20 05:46
Worklog Time Spent: 10m 
  Work Description: aasha commented on pull request #1249:
URL: https://github.com/apache/hive/pull/1249#issuecomment-663357180


   Please add a log line for the config value at dump and load time. It will 
help in debugging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462831)
Time Spent: 1h  (was: 50m)

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch, 
> HIVE-23835.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {color:#172b4d}When a Hive function's binaries are on the source HDFS, repl 
> dump should copy them to the staging location in order to remove the 
> cross-cluster visibility requirement.{color}





[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Attachment: HIVE-23916.01.patch

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-23916.01.patch
>
>






[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Attachment: (was: HIVE-23916.01.patch)

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-23916.01.patch
>
>






[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462816
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 03:57
Start Date: 24/Jul/20 03:57
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459841087



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinProjectTransposeRule.java
##
@@ -133,6 +135,10 @@ private HiveJoinProjectTransposeRuleBase(
 
 public void onMatch(RelOptRuleCall call) {
   //TODO: this can be removed once CALCITE-3824 is released
+  Join joinRel = call.rel(0);
+  if (joinRel.getJoinType() == JoinRelType.ANTI) {

Review comment:
   This was causing an issue with the HAVING clause. 
https://issues.apache.org/jira/browse/HIVE-23921







Issue Time Tracking
---

Worklog Id: (was: 462816)
Time Spent: 8.5h  (was: 8h 20m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from 
> the right side, and extra filtering is done to remove the redundant rows. 
> Both can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records need not be moved 
> from the child node to the join node. This can reduce data movement 
> significantly when the number of distinct rows (join keys) is much smaller 
> than the total row count.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> since only the key is needed to check whether a record matches the join 
> condition. For a left join, we need the non-key columns as well as the key, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table of a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” are 
> first converted to filter + left-join and then to anti join. Queries with 
> “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.
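The rewrite the description compares can be illustrated with a small, language-neutral sketch (plain Python, not Hive's actual Java operators; the data values are made up): a hash set of right-side keys is enough for the anti join, while the left-join-plus-IS-NULL form materializes a right-side column only to filter it away.

```python
def anti_join(left_keys, right_keys):
    # Anti join: emit left rows whose key has NO match on the right.
    # A hash *set* of right-side keys suffices, since no right-side
    # columns are ever projected.
    right = set(right_keys)
    return [k for k in left_keys if k not in right]

def left_join_is_null(left_keys, right_keys):
    # Today's rewrite: left outer join, then keep rows whose
    # right-side key came out NULL (i.e. unmatched).
    right = set(right_keys)
    joined = [(k, k if k in right else None) for k in left_keys]
    return [l for (l, r) in joined if r is None]

wr = [1, 2, 3, 4]  # hypothetical wr_order_number values
ws = [2, 4]        # hypothetical ws_order_number values
assert anti_join(wr, ws) == left_join_is_null(wr, ws) == [1, 3]
```

Both forms return the same keys; the anti join simply never carries the right-side payload.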





[jira] [Updated] (HIVE-23921) Support HiveJoinProjectTransposeRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23921:
---
Description: 
Support HiveJoinProjectTransposeRule for Anti Join

 

  was:
 If we have a PK-FK join that is only appending columns to the FK side, it 
basically means it is not filtering anything (everything is matching). If that 
is the case, then the ANTIJOIN result would be empty. We could detect this at 
planning time and trigger the rewriting.

 


> Support HiveJoinProjectTransposeRule for Anti Join
> --
>
> Key: HIVE-23921
> URL: https://issues.apache.org/jira/browse/HIVE-23921
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Support HiveJoinProjectTransposeRule for Anti Join
>  





[jira] [Assigned] (HIVE-23921) Support HiveJoinProjectTransposeRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23921:
--


> Support HiveJoinProjectTransposeRule for Anti Join
> --
>
> Key: HIVE-23921
> URL: https://issues.apache.org/jira/browse/HIVE-23921
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
>  If we have a PK-FK join that is only appending columns to the FK side, it 
> basically means it is not filtering anything (everything is matching). If 
> that is the case, then the ANTIJOIN result would be empty. We could detect 
> this at planning time and trigger the rewriting.
>  
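The PK-FK reasoning above can be checked with a toy sketch (Python for illustration only; the key values are hypothetical): if every FK-side key is guaranteed to exist on the PK side, the anti join over the FK rows is necessarily empty, which is exactly what the rule could detect at planning time.

```python
def anti_join(left_keys, right_keys):
    # Emit left keys with no match on the right.
    right = set(right_keys)
    return [k for k in left_keys if k not in right]

pk = [1, 2, 3, 4, 5]   # PK side: all referenced key values exist here
fk = [2, 2, 3, 5]      # FK side: referential integrity => every row matches
assert anti_join(fk, pk) == []          # nothing survives the anti join
assert anti_join([2, 6], pk) == [6]     # only a dangling FK would survive
```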





[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462815
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 03:52
Start Date: 24/Jul/20 03:52
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459840139



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
##
@@ -100,7 +100,8 @@ public void onMatch(RelOptRuleCall call) {
 // These boolean values represent corresponding left, right input which is 
potential FK
 boolean leftInputPotentialFK = topRefs.intersects(leftBits);
 boolean rightInputPotentialFK = topRefs.intersects(rightBits);
-if (leftInputPotentialFK && rightInputPotentialFK && (joinType == 
JoinRelType.INNER || joinType == JoinRelType.SEMI)) {
+if (leftInputPotentialFK && rightInputPotentialFK &&
+(joinType == JoinRelType.INNER || joinType == JoinRelType.SEMI || 
joinType == JoinRelType.ANTI)) {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23920





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462815)
Time Spent: 8h 20m  (was: 8h 10m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from 
> the right side, and extra filtering is done to remove the redundant rows. 
> Both can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records need not be moved 
> from the child node to the join node. This can reduce data movement 
> significantly when the number of distinct rows (join keys) is much smaller 
> than the total row count.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> since only the key is needed to check whether a record matches the join 
> condition. For a left join, we need the non-key columns as well as the key, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table of a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” are 
> first converted to filter + left-join and then to anti join. Queries with 
> “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.





[jira] [Assigned] (HIVE-23920) Need to handle HiveJoinConstraintsRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23920:
--


> Need to handle HiveJoinConstraintsRule for Anti Join
> 
>
> Key: HIVE-23920
> URL: https://issues.apache.org/jira/browse/HIVE-23920
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Currently in Hive we create a different operator for each kind of join. In 
> Calcite, newer releases seem to base all joins on a single Join class, so 
> classes like HiveAntiJoin and HiveSemiJoin can be merged into one.
>  





[jira] [Updated] (HIVE-23920) Need to handle HiveJoinConstraintsRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23920:
---
Description: 
 If we have a PK-FK join that is only appending columns to the FK side, it 
basically means it is not filtering anything (everything is matching). If that 
is the case, then the ANTIJOIN result would be empty. We could detect this at 
planning time and trigger the rewriting.

 

  was:
Currently in Hive we create a different operator for each kind of join. In 
Calcite, newer releases seem to base all joins on a single Join class, so 
classes like HiveAntiJoin and HiveSemiJoin can be merged into one.

 


> Need to handle HiveJoinConstraintsRule for Anti Join
> 
>
> Key: HIVE-23920
> URL: https://issues.apache.org/jira/browse/HIVE-23920
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
>  If we have a PK-FK join that is only appending columns to the FK side, it 
> basically means it is not filtering anything (everything is matching). If 
> that is the case, then the ANTIJOIN result would be empty. We could detect 
> this at planning time and trigger the rewriting.
>  





[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462811
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 02:41
Start Date: 24/Jul/20 02:41
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459827002



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -74,7 +78,14 @@ public HiveJoinAddNotNullRule(Class clazz,
   @Override
   public void onMatch(RelOptRuleCall call) {
 Join join = call.rel(0);
-if (join.getJoinType() == JoinRelType.FULL || 
join.getCondition().isAlwaysTrue()) {
+
+// For anti join case add the not null on right side if the condition is
+// always true. This is done because during execution, anti join expect 
the right side to
+// be empty and if we dont put null check on right, for null only right 
side table and condition
+// always true, execution will produce 0 records.
+// eg  select * from left_tbl where (select 1 from all_null_right limit 1) 
is null
+if (join.getJoinType() == JoinRelType.FULL ||
+(join.getJoinType() != JoinRelType.ANTI && 
join.getCondition().isAlwaysTrue())) {

Review comment:
   Yes, the comment is not accurate. What it should say is that we will add a 
not-null condition for anti join even if the condition is always true.







Issue Time Tracking
---

Worklog Id: (was: 462811)
Time Spent: 8h 10m  (was: 8h)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from 
> the right side, and extra filtering is done to remove the redundant rows. 
> Both can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records need not be moved 
> from the child node to the join node. This can reduce data movement 
> significantly when the number of distinct rows (join keys) is much smaller 
> than the total row count.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> since only the key is needed to check whether a record matches the join 
> condition. For a left join, we need the non-key columns as well as the key, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table of a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” are 
> first converted to filter + left-join and then to anti join. Queries with 
> “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.





[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462810
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 02:39
Start Date: 24/Jul/20 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459826636



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAntiJoin.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRulesRegistry;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveAntiJoin extends Join implements HiveRelNode {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23919







Issue Time Tracking
---

Worklog Id: (was: 462810)
Time Spent: 8h  (was: 7h 50m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from 
> the right side, and extra filtering is done to remove the redundant rows. 
> Both can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records need not be moved 
> from the child node to the join node. This can reduce data movement 
> significantly when the number of distinct rows (join keys) is much smaller 
> than the total row count.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> since only the key is needed to check whether a record matches the join 
> condition. For a left join, we need the non-key columns as well as the key, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table of a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. The queries with “not exists” 
> are converted 

[jira] [Updated] (HIVE-23919) Merge all kind of Join operator variants (Semi, Anti, Normal) into one.

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23919:
---
Description: 
Currently in Hive we create a different operator for each kind of join. In 
Calcite, newer releases seem to base all joins on a single Join class, so 
classes like HiveAntiJoin and HiveSemiJoin can be merged into one.

 

  was:
For an anti join, we emit the records when the join condition is not 
satisfied. In the case of the PK-FK rule, we have to explore whether this can 
be exploited to speed up anti join processing.

 


> Merge all kind of Join operator variants (Semi, Anti, Normal) into one. 
> 
>
> Key: HIVE-23919
> URL: https://issues.apache.org/jira/browse/HIVE-23919
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Currently in Hive we create a different operator for each kind of join. In 
> Calcite, newer releases seem to base all joins on a single Join class, so 
> classes like HiveAntiJoin and HiveSemiJoin can be merged into one.
>  





[jira] [Assigned] (HIVE-23919) Merge all kind of Join operator variants (Semi, Anti, Normal) into one.

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23919:
--


> Merge all kind of Join operator variants (Semi, Anti, Normal) into one. 
> 
>
> Key: HIVE-23919
> URL: https://issues.apache.org/jira/browse/HIVE-23919
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For an anti join, we emit the records when the join condition is not 
> satisfied. In the case of the PK-FK rule, we have to explore whether this 
> can be exploited to speed up anti join processing.
>  





[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462805
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 02:35
Start Date: 24/Jul/20 02:35
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459825977



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAntiJoin.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.reloperators;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRulesRegistry;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveAntiJoin extends Join implements HiveRelNode {
+
+  private final RexNode joinFilter;

Review comment:
   The joinFilter holds the residual filter, which is used during 
post-processing. These are the join conditions that are not part of the join 
key. I think the condition in Join holds the full condition. 







Issue Time Tracking
---

Worklog Id: (was: 462805)
Time Spent: 7h 50m  (was: 7h 40m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from 
> the right side, and extra filtering is done to remove the redundant rows. 
> Both can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records need not be moved 
> from the child node to the join node. This can reduce data movement 
> significantly when the number of distinct rows (join keys) is much smaller 
> than the total row count.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> since only the key is needed to check whether a record matches the join 
> condition. For a left join, we need the non-key columns as well as the key, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table of a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is 

[jira] [Resolved] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-07-23 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23736.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~kkasa]!

> Disable topn in ReduceSinkOp if a TNK is introduced
> ---
>
> Key: HIVE-23736
> URL: https://issues.apache.org/jira/browse/HIVE-23736
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Both the Reduce Sink and TopNKey operators have top-n key filtering 
> functionality. If a TNK is introduced, this filtering is done twice.
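The filtering both operators perform can be sketched as follows. This is a minimal standalone illustration of top-n key filtering, not the Hive operator code; the class name is hypothetical:

```java
import java.util.TreeSet;

// A row is forwarded only if its key is among the smallest n keys seen so
// far; everything else is filtered out before the shuffle.
public class TopNKeyFilterSketch {
    private final int n;
    private final TreeSet<Integer> topKeys = new TreeSet<>();

    TopNKeyFilterSketch(int n) {
        this.n = n;
    }

    boolean forward(int key) {
        if (topKeys.contains(key)) {
            return true;                  // duplicate of a current top key
        }
        if (topKeys.size() < n) {
            topKeys.add(key);             // still filling up the top-n set
            return true;
        }
        if (key < topKeys.last()) {
            topKeys.add(key);             // key beats the current n-th smallest
            topKeys.pollLast();           // drop the displaced key
            return true;
        }
        return false;                     // outside the top n: filtered
    }
}
```

Running this filter twice in a pipeline (once in a TopNKey operator, once in the ReduceSink) produces the same output as running it once, which is why the duplicated work can be disabled.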



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23736?focusedWorklogId=462797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462797
 ]

ASF GitHub Bot logged work on HIVE-23736:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 02:12
Start Date: 24/Jul/20 02:12
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1158:
URL: https://github.com/apache/hive/pull/1158


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462797)
Time Spent: 20m  (was: 10m)

> Disable topn in ReduceSinkOp if a TNK is introduced
> ---
>
> Key: HIVE-23736
> URL: https://issues.apache.org/jira/browse/HIVE-23736
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Both the Reduce Sink and TopNKey operators have top-n key filtering 
> functionality. If a TNK is introduced, this filtering is done twice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-07-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164081#comment-17164081
 ] 

Jesus Camacho Rodriguez commented on HIVE-23736:


+1

> Disable topn in ReduceSinkOp if a TNK is introduced
> ---
>
> Key: HIVE-23736
> URL: https://issues.apache.org/jira/browse/HIVE-23736
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Both the Reduce Sink and TopNKey operators have top-n key filtering 
> functionality. If a TNK is introduced, this filtering is done twice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23750) Rewrite plan to join back tables: support function calls in project

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23750?focusedWorklogId=462792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462792
 ]

ASF GitHub Bot logged work on HIVE-23750:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 02:04
Start Date: 24/Jul/20 02:04
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1184:
URL: https://github.com/apache/hive/pull/1184#discussion_r459819807



##
File path: 
ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query84.q.out
##
@@ -56,24 +56,24 @@ CBO PLAN:
 HiveProject(customer_id=[$0], customername=[$1])
   HiveSortLimit(sort0=[$2], dir0=[ASC], fetch=[100])
 HiveProject(customer_id=[$2], customername=[$6], c_customer_id=[$2])
-  HiveJoin(condition=[=($8, $4)], joinType=[inner], algorithm=[none], 
cost=[not available])
-HiveJoin(condition=[=($1, $3)], joinType=[inner], algorithm=[none], 
cost=[not available])
-  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], 
cost=[not available])
+  HiveJoin(condition=[=($8, $4)], joinType=[inner], 
algorithm=[CommonJoin], cost=[not available])

Review comment:
   This is not nice; it seems to be a side effect of using the new cost 
model. I am wondering why we did not hit it before. Are we changing the 
metadata provider correctly?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
##
@@ -161,127 +174,118 @@ public RelNode trim(RelBuilder relBuilder, RelNode 
root) {
 return root;
   }
 
-  // 4. Collect fields for new Project on the top of Join backs
+  // 3. Join back tables to the top of original plan
   Mapping newInputMapping = trimResult.right;
-  RexNode[] newProjects = new RexNode[rootFieldList.size()];
-  String[] newColumnNames = new String[rootFieldList.size()];
-  projectsFromOriginalPlan(rexBuilder, 
newInput.getRowType().getFieldCount(), newInput, newInputMapping,
-  newProjects, newColumnNames);
+  Map tableInputRefMapping = new HashMap<>();
 
-  // 5. Join back tables to the top of original plan
   for (TableToJoinBack tableToJoinBack : tableToJoinBackList) {
-LOG.debug("Joining back table " + 
tableToJoinBack.projectedFields.relOptHiveTable.getName());
+LOG.debug("Joining back table " + 
tableToJoinBack.joinedBackFields.relOptHiveTable.getName());
 
-// 5.1 Create new TableScan of tables to join back
-RelOptHiveTable relOptTable = 
tableToJoinBack.projectedFields.relOptHiveTable;
+// 3.1. Create new TableScan of tables to join back
+RelOptHiveTable relOptTable = 
tableToJoinBack.joinedBackFields.relOptHiveTable;
 RelOptCluster cluster = relBuilder.getCluster();
 HiveTableScan tableScan = new HiveTableScan(cluster, 
cluster.traitSetOf(HiveRelNode.CONVENTION),
 relOptTable, relOptTable.getHiveTableMD().getTableName(), null, 
false, false);
-// 5.2 Project only required fields from this table
+// 3.2. Create Project with the required fields from this table
 RelNode projectTableAccessRel = tableScan.project(
-tableToJoinBack.projectedFields.fieldsInSourceTable, new 
HashSet<>(0), REL_BUILDER.get());
+tableToJoinBack.joinedBackFields.fieldsInSourceTable, new 
HashSet<>(0), REL_BUILDER.get());
 
-Mapping keyMapping = Mappings.create(MappingType.INVERSE_SURJECTION,
-tableScan.getRowType().getFieldCount(), 
tableToJoinBack.keys.cardinality());
+// 3.3. Create mapping between the Project and TableScan
+Mapping projectMapping = 
Mappings.create(MappingType.INVERSE_SURJECTION,
+tableScan.getRowType().getFieldCount(),
+
tableToJoinBack.joinedBackFields.fieldsInSourceTable.cardinality());
 int projectIndex = 0;
+for (int i : tableToJoinBack.joinedBackFields.fieldsInSourceTable) {
+  projectMapping.set(i, projectIndex);
+  ++projectIndex;
+}
+
 int offset = newInput.getRowType().getFieldCount();
 
-for (int source : tableToJoinBack.projectedFields.fieldsInSourceTable) 
{
-  if (tableToJoinBack.keys.get(source)) {
-// 5.3 Map key field to it's index in the Project on the TableScan
-keyMapping.set(source, projectIndex);
-  } else {
-// 5.4 if this is not a key field then we need it in the new 
Project on the top of Join backs
-ProjectMapping currentProjectMapping =
-tableToJoinBack.projectedFields.mapping.stream()
-.filter(projectMapping -> 
projectMapping.indexInSourceTable == source)
-.findFirst().get();
-addToProject(projectTableAccessRel, projectIndex, rexBuilder,
-offset + 

[jira] [Updated] (HIVE-23863) UGI doAs privilege action to make calls to Ranger Service

2020-07-23 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23863:
---
Status: In Progress  (was: Patch Available)

> UGI doAs privilege action  to make calls to Ranger Service
> --
>
> Key: HIVE-23863
> URL: https://issues.apache.org/jira/browse/HIVE-23863
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23863.01.patch, HIVE-23863.02.patch, 
> HIVE-23863.03.patch, UGI and Replication.pdf
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23863) UGI doAs privilege action to make calls to Ranger Service

2020-07-23 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23863:
---
Attachment: HIVE-23863.03.patch
Status: Patch Available  (was: In Progress)

> UGI doAs privilege action  to make calls to Ranger Service
> --
>
> Key: HIVE-23863
> URL: https://issues.apache.org/jira/browse/HIVE-23863
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23863.01.patch, HIVE-23863.02.patch, 
> HIVE-23863.03.patch, UGI and Replication.pdf
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=462786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462786
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 01:33
Start Date: 24/Jul/20 01:33
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459813384



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
 maxHashTblMemory/1024/1024,
 gcCanary.get() == null ? "dead" : "alive"));
   }
+  int avgAccess = computeAvgAccess();
 
   /* Iterate the global (keywrapper,aggregationbuffers) map and emit
a row for each key */
   Iterator> iter =
   mapKeysAggregationBuffers.entrySet().iterator();
   while(iter.hasNext()) {
 Map.Entry pair = iter.next();
+if (!all && avgAccess >= 1) {
+  // Retain entries when access pattern is > than average access
+  if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23917





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462786)
Time Spent: 2.5h  (was: 2h 20m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> run into GC issues when many keys are involved in group-bys. It would be 
> good to provide an option for LRU-based eviction of 
> mapKeysAggregationBuffers.
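The LRU-based eviction the description asks for can be sketched with a `LinkedHashMap` in access order. This is an illustrative standalone cache, assuming a simple size cap; it is not the change made in `VectorGroupByOperator`, where eviction is driven by memory pressure rather than entry count:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// With accessOrder = true, iteration order follows recency of access, so
// the eldest entry is always the least recently used one.
public class LruBuffersSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruBuffersSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true enables LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used key instead of a random one.
        return size() > maxEntries;
    }
}
```

For example, with capacity 2, inserting a and b, reading a, then inserting c evicts b, because b is the least recently accessed entry.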



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23917) Reset key access count during eviction in VectorGroupByOperator

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23917?focusedWorklogId=462785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462785
 ]

ASF GitHub Bot logged work on HIVE-23917:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 01:32
Start Date: 24/Jul/20 01:32
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #1306:
URL: https://github.com/apache/hive/pull/1306


   https://issues.apache.org/jira/browse/HIVE-23917
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462785)
Remaining Estimate: 0h
Time Spent: 10m

> Reset key access count during eviction in VectorGroupByOperator
> ---
>
> Key: HIVE-23917
> URL: https://issues.apache.org/jira/browse/HIVE-23917
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow up of https://issues.apache.org/jira/browse/HIVE-23843
> There can be a case (depending on data) where a large number of entries in 
> the aggregation map exceed the average access count, which could prevent the 
> 10% flushing limit. Adding a reset on the evicted entries would help prevent 
> this.
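The retain-above-average-and-reset idea can be sketched in isolation. This is an illustrative model only, with hypothetical names; the real logic lives in `VectorGroupByOperator.flush` and keys there are `KeyWrapper` objects, not strings:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class AccessCountFlushSketch {
    // key -> single-element access counter (int[] used as a mutable int)
    private final Map<String, int[]> accessCounts = new HashMap<>();

    void touch(String key) {
        accessCounts.computeIfAbsent(key, k -> new int[1])[0]++;
    }

    // Flush entries at or below the average access count; reset the counters
    // of survivors so historically hot keys cannot occupy the map forever.
    Set<String> flushBelowAverage() {
        long total = accessCounts.values().stream().mapToLong(c -> c[0]).sum();
        int avg = accessCounts.isEmpty() ? 0 : (int) (total / accessCounts.size());
        Set<String> flushed = new HashSet<>();
        Iterator<Map.Entry<String, int[]>> it = accessCounts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, int[]> e = it.next();
            if (e.getValue()[0] > avg) {
                e.getValue()[0] = 0; // reset: survivors must re-earn their place
            } else {
                flushed.add(e.getKey());
                it.remove();
            }
        }
        return flushed;
    }
}
```

Without the reset, a key accessed heavily long ago keeps beating the average on every flush, which is exactly the corner case this ticket addresses.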



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23917) Reset key access count during eviction in VectorGroupByOperator

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23917:
--
Labels: pull-request-available  (was: )

> Reset key access count during eviction in VectorGroupByOperator
> ---
>
> Key: HIVE-23917
> URL: https://issues.apache.org/jira/browse/HIVE-23917
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow up of https://issues.apache.org/jira/browse/HIVE-23843
> There can be a case (depending on data) where a large number of entries in 
> the aggregation map exceed the average access count, which could prevent the 
> 10% flushing limit. Adding a reset on the evicted entries would help prevent 
> this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=462780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462780
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 01:22
Start Date: 24/Jul/20 01:22
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459811176



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
 maxHashTblMemory/1024/1024,
 gcCanary.get() == null ? "dead" : "alive"));
   }
+  int avgAccess = computeAvgAccess();
 
   /* Iterate the global (keywrapper,aggregationbuffers) map and emit
a row for each key */
   Iterator> iter =
   mapKeysAggregationBuffers.entrySet().iterator();
   while(iter.hasNext()) {
 Map.Entry pair = iter.next();
+if (!all && avgAccess >= 1) {
+  // Retain entries when access pattern is > than average access
+  if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
   >> keys could retain their places for a long time because of very old 
cache hits - and they will keep their place in the cache
   
   This depends on incoming data and would be the worst-case scenario, similar 
to the earlier implementation. However, there is a corner case (again depending 
on data) where a large number of entries in the map exceed the average 
threshold, which could prevent the 10% flushing limit. Adding the reset would 
help prevent this. I will create a follow-up ticket for it.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462780)
Time Spent: 2h 20m  (was: 2h 10m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> run into GC issues when many keys are involved in group-bys. It would be 
> good to provide an option for LRU-based eviction of 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23898) Query fails if identifier contains double quotes or semicolon char

2020-07-23 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23898.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~kkasa]!

> Query fails if identifier contains double quotes or semicolon char
> --
>
> Key: HIVE-23898
> URL: https://issues.apache.org/jira/browse/HIVE-23898
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE `t;`(a int);
> {code}
> {code}
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
> with error code = 4 
> running 
> CREATE TABLE `t 
> fname=test.q
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.ParseException: line 2:15 character '' 
> not supported here
> {code}
> {code}
> CREATE TABLE `t"`(a int);
> {code}
> {code}
> [ERROR] Failures: 
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
> with error code = 4 
> running 
> CREATE TABLE `t"`(a int);
>  
> fname=test.q
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.ParseException: line 3:24 extraneous input 
> ';' expecting EOF near ''
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23898) Query fails if identifier contains double quotes or semicolon char

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23898?focusedWorklogId=462775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462775
 ]

ASF GitHub Bot logged work on HIVE-23898:
-

Author: ASF GitHub Bot
Created on: 24/Jul/20 01:13
Start Date: 24/Jul/20 01:13
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1295:
URL: https://github.com/apache/hive/pull/1295


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462775)
Time Spent: 20m  (was: 10m)

> Query fails if identifier contains double quotes or semicolon char
> --
>
> Key: HIVE-23898
> URL: https://issues.apache.org/jira/browse/HIVE-23898
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE `t;`(a int);
> {code}
> {code}
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
> with error code = 4 
> running 
> CREATE TABLE `t 
> fname=test.q
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.ParseException: line 2:15 character '' 
> not supported here
> {code}
> {code}
> CREATE TABLE `t"`(a int);
> {code}
> {code}
> [ERROR] Failures: 
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
> with error code = 4 
> running 
> CREATE TABLE `t"`(a int);
>  
> fname=test.q
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.ParseException: line 3:24 extraneous input 
> ';' expecting EOF near ''
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23898) Query fails if identifier contains double quotes or semicolon char

2020-07-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164068#comment-17164068
 ] 

Jesus Camacho Rodriguez commented on HIVE-23898:


+1

> Query fails if identifier contains double quotes or semicolon char
> --
>
> Key: HIVE-23898
> URL: https://issues.apache.org/jira/browse/HIVE-23898
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE `t;`(a int);
> {code}
> {code}
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
> with error code = 4 
> running 
> CREATE TABLE `t 
> fname=test.q
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.ParseException: line 2:15 character '' 
> not supported here
> {code}
> {code}
> CREATE TABLE `t"`(a int);
> {code}
> {code}
> [ERROR] Failures: 
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
> with error code = 4 
> running 
> CREATE TABLE `t"`(a int);
>  
> fname=test.q
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.ParseException: line 3:24 extraneous input 
> ';' expecting EOF near ''
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23878) Aggregate after join throws off MV rewrite

2020-07-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163955#comment-17163955
 ] 

Vineet Garg commented on HIVE-23878:


+1. Super creative solution (y)

> Aggregate after join throws off MV rewrite 
> ---
>
> Key: HIVE-23878
> URL: https://issues.apache.org/jira/browse/HIVE-23878
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: q81_eg.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> E.g Q81, Q30, Q45, Q68: In all these queries, MV rewrites are disabled for 
> {{customer, customer-address}} MV.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23892) Test Interpretation String fix

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23892?focusedWorklogId=462682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462682
 ]

ASF GitHub Bot logged work on HIVE-23892:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 19:55
Start Date: 23/Jul/20 19:55
Worklog Time Spent: 10m 
  Work Description: jcamachor opened a new pull request #1305:
URL: https://github.com/apache/hive/pull/1305


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462682)
Time Spent: 20m  (was: 10m)

> Test Interpretation String fix
> --
>
> Key: HIVE-23892
> URL: https://issues.apache.org/jira/browse/HIVE-23892
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Testing a fix to interpretation of strings



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Attachment: HIVE-23916.01.patch

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-23916.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Status: Patch Available  (was: In Progress)

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23916) Fix Atlas client dependency version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23916 started by Pravin Sinha.
---
> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23916:

Summary: Fix Atlas client dependency version  (was: Fix Atlas client 
dependencies version)

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23916) Fix Atlas client dependencies version

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-23916:
---


> Fix Atlas client dependencies version
> -
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23915) Improve Github PR Template

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23915?focusedWorklogId=462641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462641
 ]

ASF GitHub Bot logged work on HIVE-23915:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 18:20
Start Date: 23/Jul/20 18:20
Worklog Time Spent: 10m 
  Work Description: HunterL opened a new pull request #1303:
URL: https://github.com/apache/hive/pull/1303


   ### What changes were proposed in this pull request?
   The proposed change is an update to the GitHub pull request template. I've 
copied the template from the Apache Spark project, which seems to work pretty 
well.
   
   ### Why are the changes needed?
   The project is now accepting PRs from GitHub, which is great in that it 
should drastically increase the number of contributors. Consequently, this also 
means an increase in the number of PRs, so we need to make reviewing PRs as 
easy as possible for the people who have those permissions.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, contributors who create a new PR will see the new PR template.
   
   ### How was this patch tested?
   This patch was not tested.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462641)
Remaining Estimate: 0h
Time Spent: 10m

> Improve Github PR Template
> --
>
> Key: HIVE-23915
> URL: https://issues.apache.org/jira/browse/HIVE-23915
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Having this repo on GitHub is great and will probably increase the number of 
> contributors tremendously. With that there will of course be an influx in the 
> number of PRs, so it's going to be critical to make reviewing PRs as easy as 
> possible.
> Using 
> [https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE] 
> as a starting point, improve the template to make reviewing easier for 
> contributors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23915) Improve Github PR Template

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23915:
--
Labels: pull-request-available  (was: )

> Improve Github PR Template
> --
>
> Key: HIVE-23915
> URL: https://issues.apache.org/jira/browse/HIVE-23915
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Having this repo on GitHub is great and will probably increase the number of 
> contributors tremendously. With that, there will of course be an influx in the 
> number of PRs, so it's going to be critical to make reviewing PRs as easy as 
> possible.
> Using 
> [https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE] 
> as a template, improve the template to make reviewing easier for contributors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.9.2

2020-07-23 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163836#comment-17163836
 ] 

Xinli Shang commented on HIVE-21737:


[~iemejia] Is there any progress on this?

> Upgrade Avro to version 1.9.2
> -
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Bump-Apache-Avro-to-1.9.2.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Avro 1.9.2 was released recently. It brings a lot of fixes, including a leaner 
> version of Avro without Jackson in the public API. Worth the update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-15966) Query with column alias fails in order by

2020-07-23 Thread Alex Meadows (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163835#comment-17163835
 ] 

Alex Meadows edited comment on HIVE-15966 at 7/23/20, 6:04 PM:
---

I've run into this issue while trying to work with Hive and SQLAlchemy.  Having 
the alias be usable in the ORDER BY would allow Hive to be leveraged in many 
different project types.  For full details, here's the issue as noted on the 
SQLAlchemy project: [SQLAlchemy GitHub 
Issue|https://github.com/sqlalchemy/sqlalchemy/issues/5472]

 


was (Author: opendataalex):
I've run into this issue while trying to work with Hive and SQLAlchemy.  Having 
the alias be used in the ORDER BY would allow for Hive to be leveraged in many 
various project types.  For full details, here's the issue as noted on the 
SQLAlchemy project:  SQLAlchemy Github 
Issue[https://github.com/sqlalchemy/sqlalchemy/issues/5472]

 

> Query with column alias fails in order by
> -
>
> Key: HIVE-15966
> URL: https://issues.apache.org/jira/browse/HIVE-15966
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Major
>
> Query:  
> {code}
> select mtg.marketing_type_group_desc as marketing_type_group
> from marketing_type_group mtg 
> order by mtg.marketing_type_group_desc;
> {code}
> fails with error:
> {code}
> 2017-02-17T11:22:11,441 ERROR [eb89eafb-e100-42b1-8ff1-b3332b2e715f main]: 
> ql.Driver (SessionState.java:printError(1116)) - FAILED: SemanticException 
> [Error 10004]: Line 7:9 Invalid table alias or column reference 
> 'marketing_type_group_desc': (possible column names are: 
> marketing_type_group, prod_type)
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 7:9 Invalid table 
> alias or column reference 'marketing_type_group_desc': (possible column names 
> are: marketing_type_group, prod_type)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11501)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11449)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11417)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11395)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7761)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9655)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9554)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10328)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11011)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:478)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11022)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:285)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:514)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1319)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1459)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1239)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
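The failure above can be modeled outside Hive: once the projection is applied, ORDER BY is resolved against the output aliases, so the underlying column name is no longer in scope. A minimal sketch of that resolution rule (in Python for brevity; the names and logic are illustrative, not Hive's actual SemanticAnalyzer):

```python
# Toy model of ORDER BY name resolution after a projection. The
# projection's output schema exposes only the aliases, so a reference
# to the underlying source column fails -- mirroring the
# "Invalid table alias or column reference" error above.

def resolve_order_by(ref, projection):
    """projection maps output alias -> source column expression."""
    if ref in projection:
        return ref  # the alias resolves fine
    raise ValueError(
        f"Invalid table alias or column reference '{ref}': "
        f"(possible column names are: {', '.join(projection)})")

projection = {"marketing_type_group": "mtg.marketing_type_group_desc"}

print(resolve_order_by("marketing_type_group", projection))  # alias resolves
try:
    resolve_order_by("marketing_type_group_desc", projection)
except ValueError as e:
    print(e)  # same shape of error as the Hive report
```

Ordering by the alias (`order by marketing_type_group`) is the usual workaround until the analyzer also accepts the source column name.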



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-15966) Query with column alias fails in order by

2020-07-23 Thread Alex Meadows (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163835#comment-17163835
 ] 

Alex Meadows commented on HIVE-15966:
-

I've run into this issue while trying to work with Hive and SQLAlchemy.  Having 
the alias be usable in the ORDER BY would allow Hive to be leveraged in many 
different project types.  For full details, here's the issue as noted on the 
SQLAlchemy project: [SQLAlchemy GitHub 
Issue|https://github.com/sqlalchemy/sqlalchemy/issues/5472]

 

> Query with column alias fails in order by
> -
>
> Key: HIVE-15966
> URL: https://issues.apache.org/jira/browse/HIVE-15966
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Major
>
> Query:  
> {code}
> select mtg.marketing_type_group_desc as marketing_type_group
> from marketing_type_group mtg 
> order by mtg.marketing_type_group_desc;
> {code}
> fails with error:
> {code}
> 2017-02-17T11:22:11,441 ERROR [eb89eafb-e100-42b1-8ff1-b3332b2e715f main]: 
> ql.Driver (SessionState.java:printError(1116)) - FAILED: SemanticException 
> [Error 10004]: Line 7:9 Invalid table alias or column reference 
> 'marketing_type_group_desc': (possible column names are: 
> marketing_type_group, prod_type)
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 7:9 Invalid table 
> alias or column reference 'marketing_type_group_desc': (possible column names 
> are: marketing_type_group, prod_type)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11501)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11449)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11417)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11395)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7761)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9655)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9554)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10328)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11011)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:478)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11022)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:285)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:514)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1319)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1459)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1239)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23915) Improve Github PR Template

2020-07-23 Thread Hunter Logan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hunter Logan reassigned HIVE-23915:
---


> Improve Github PR Template
> --
>
> Key: HIVE-23915
> URL: https://issues.apache.org/jira/browse/HIVE-23915
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>
> Having this repo on github is great and will probably increase the number of 
> contributors tremendously. With that there will of course be an influx in the 
> number of PRs so it's going to be critical to make reviewing PRs as easy as 
> possible.
> Using 
> [https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE] 
> as a template improve the template to make reviewing easier for contributors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23874) Add Debug Logging to HiveQueryResultSet

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23874?focusedWorklogId=462634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462634
 ]

ASF GitHub Bot logged work on HIVE-23874:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 18:01
Start Date: 23/Jul/20 18:01
Worklog Time Spent: 10m 
  Work Description: HunterL opened a new pull request #1290:
URL: https://github.com/apache/hive/pull/1290


   Debug logging of fetch requests will help in troubleshooting client-side 
requests to HS2.
   
   hive-jdbc probably needs better logging overall; I will likely take a look 
at that in the future.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462634)
Time Spent: 40m  (was: 0.5h)

> Add Debug Logging to HiveQueryResultSet
> ---
>
> Key: HIVE-23874
> URL: https://issues.apache.org/jira/browse/HIVE-23874
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding a debug message on this topic with handle, orientation, and fetch size 
> would be useful.
> [https://github.com/apache/hive/blob/bc00454c194413753ac1d7067044ca78c77e1a34/jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java#L342]
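A sketch of what such a debug message might look like (Python for brevity, and the parameter names are illustrative assumptions; the actual patch would use slf4j in the Java `HiveQueryResultSet`):

```python
# Sketch of the kind of debug line proposed: include the operation
# handle, fetch orientation, and fetch size so client-side fetches
# can be traced against HS2.
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("HiveQueryResultSet")

def fetch_debug_message(handle, orientation, fetch_size):
    # Build the message separately so it can be unit-tested.
    return (f"Fetching results: handle={handle}, "
            f"orientation={orientation}, fetchSize={fetch_size}")

log.debug(fetch_debug_message("op-123", "FETCH_NEXT", 1000))
```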



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23874) Add Debug Logging to HiveQueryResultSet

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23874?focusedWorklogId=462635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462635
 ]

ASF GitHub Bot logged work on HIVE-23874:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 18:01
Start Date: 23/Jul/20 18:01
Worklog Time Spent: 10m 
  Work Description: HunterL commented on pull request #1290:
URL: https://github.com/apache/hive/pull/1290#issuecomment-663150210


   Closed and reopened to trigger CI builds again (a flaky test failed)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462635)
Time Spent: 50m  (was: 40m)

> Add Debug Logging to HiveQueryResultSet
> ---
>
> Key: HIVE-23874
> URL: https://issues.apache.org/jira/browse/HIVE-23874
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Adding a debug message on this topic with handle, orientation, and fetch size 
> would be useful.
> [https://github.com/apache/hive/blob/bc00454c194413753ac1d7067044ca78c77e1a34/jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java#L342]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23874) Add Debug Logging to HiveQueryResultSet

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23874?focusedWorklogId=462633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462633
 ]

ASF GitHub Bot logged work on HIVE-23874:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 17:59
Start Date: 23/Jul/20 17:59
Worklog Time Spent: 10m 
  Work Description: HunterL closed pull request #1290:
URL: https://github.com/apache/hive/pull/1290


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462633)
Time Spent: 0.5h  (was: 20m)

> Add Debug Logging to HiveQueryResultSet
> ---
>
> Key: HIVE-23874
> URL: https://issues.apache.org/jira/browse/HIVE-23874
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Adding a debug message on this topic with handle, orientation, and fetch size 
> would be useful.
> [https://github.com/apache/hive/blob/bc00454c194413753ac1d7067044ca78c77e1a34/jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java#L342]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23914) Support Char, VarChar, Small/tiny Int for Struct IN clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23914:
-


> Support Char, VarChar, Small/tiny Int for Struct IN clause
> --
>
> Key: HIVE-23914
> URL: https://issues.apache.org/jira/browse/HIVE-23914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23913) Support Date, Decimal and Timestamp for Struct IN clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23913:
-


> Support Date, Decimal and Timestamp for Struct IN clause
> 
>
> Key: HIVE-23913
> URL: https://issues.apache.org/jira/browse/HIVE-23913
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23878) Aggregate after join throws off MV rewrite

2020-07-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163750#comment-17163750
 ] 

Jesus Camacho Rodriguez commented on HIVE-23878:


[~vgarg], could you review? Thanks

> Aggregate after join throws off MV rewrite 
> ---
>
> Key: HIVE-23878
> URL: https://issues.apache.org/jira/browse/HIVE-23878
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: q81_eg.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> E.g. Q81, Q30, Q45, Q68: in all these queries, MV rewrites are disabled for 
> the {{customer, customer-address}} MV.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23912) Extend vectorization support for Struct IN() clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-23912:
--
Description: 
Currently Struct IN() vectorization does not support all writable types.
As a result, operators using such conditions fail to vectorize: for example, we 
support the String type but not Char or Varchar.

Issue sample:

{code:java}
notVectorizedReason: FILTER operator: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected constant String 
type HiveCharWritable
{code}


  was:
Currently Struct IN() vectorization does not support all writable types.
As a result, operators using such conditions fail to vectorize: for example, we 
support the String type but not Char or Varchar.




> Extend vectorization support for Struct IN() clause
> ---
>
> Key: HIVE-23912
> URL: https://issues.apache.org/jira/browse/HIVE-23912
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Critical
>
> Currently Struct IN() vectorization does not support all writable types.
> As a result, operators using such conditions fail to vectorize: for example, we 
> support the String type but not Char or Varchar.
> Issue sample:
> {code:java}
> notVectorizedReason: FILTER operator: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected constant String 
> type HiveCharWritable
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23912) Extend vectorization support for Struct IN() clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-23912:
--
Priority: Critical  (was: Minor)

> Extend vectorization support for Struct IN() clause
> ---
>
> Key: HIVE-23912
> URL: https://issues.apache.org/jira/browse/HIVE-23912
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Critical
>
> Currently Struct IN() vectorization does not support all writable types.
> As a result, operators using such conditions fail to vectorize: for example, we 
> support the String type but not Char or Varchar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23912) Extend vectorization support for Struct IN() clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23912:
-


> Extend vectorization support for Struct IN() clause
> ---
>
> Key: HIVE-23912
> URL: https://issues.apache.org/jira/browse/HIVE-23912
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>
> Currently Struct IN() vectorization does not support all writable types.
> As a result, operators using such conditions fail to vectorize: for example, we 
> support the String type but not Char or Varchar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=462592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462592
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 15:34
Start Date: 23/Jul/20 15:34
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459540489



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
 maxHashTblMemory/1024/1024,
 gcCanary.get() == null ? "dead" : "alive"));
   }
+  int avgAccess = computeAvgAccess();
 
   /* Iterate the global (keywrapper,aggregationbuffers) map and emit
a row for each key */
   Iterator> iter =
   mapKeysAggregationBuffers.entrySet().iterator();
   while(iter.hasNext()) {
 Map.Entry pair = iter.next();
+if (!all && avgAccess >= 1) {
+  // Retain entries when access pattern is > than average access
+  if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
   I don't think we should get into L1 cache stuff here; I'm ok with not 
having an LRU, we might not need that - let's stay on the logical level.
   
   I'm saying that if, in a round, the hit count of an element is not reduced or 
reset to zero when it's being kept for the next round, those keys could retain 
their places for a long time because of very old cache hits. This could lead to 
a case in which all the new elements (n/2) added to the cache are evicted, 
while very old entries that are no longer getting any hits keep half of the 
cache filled.
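The retention behavior being questioned can be seen in a small model: entries kept across flush rounds carry their full access count forward, so formerly-hot keys out-rank every newcomer indefinitely. A Python sketch of the policy under discussion (a simplified model, not the actual `VectorGroupByOperator` code):

```python
# Model of count-based retention across flush rounds. Entries kept in a
# round keep their accumulated access count; new entries start at ~1 hit,
# so stale-but-formerly-hot keys are never displaced unless counts are
# decayed or reset on retention.

def flush_retain(access_counts):
    """Retain entries whose access count exceeds the average
    (the policy under review), without resetting counts."""
    avg = sum(access_counts.values()) / len(access_counts)
    return {k: c for k, c in access_counts.items() if c > avg}

cache = {"old_hot_1": 100, "old_hot_2": 90}
for round_no in range(3):
    # Each round, new keys arrive with a single access while the old
    # entries receive no further hits.
    cache.update({f"new_{round_no}_{i}": 1 for i in range(2)})
    cache = flush_retain(cache)

# Only the stale-but-formerly-hot keys survive every round.
print(sorted(cache))  # ['old_hot_1', 'old_hot_2']
```

Resetting (or halving) the retained counts after each flush would let recently added keys compete, which is the decay the review comment is asking about.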





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462592)
Time Spent: 2h 10m  (was: 2h)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in group-bys. It would be 
> good to provide an option for LRU-based eviction of 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=462585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462585
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 15:11
Start Date: 23/Jul/20 15:11
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #1271:
URL: https://github.com/apache/hive/pull/1271#issuecomment-663063475


   @kgyrtkirk FYI. The test failure **schema_evol_text_vec_part_all_primitive.q** 
seems unrelated and passed on a local run



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462585)
Time Spent: 1h 20m  (was: 1h 10m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of MSCK repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping a partition, we serialize the drop-partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible during deserialization happening in 
> PartitionExpressionForMetastore ( 
> 

[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462566
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:55
Start Date: 23/Jul/20 13:55
Worklog Time Spent: 10m 
  Work Description: dlavati closed pull request #1207:
URL: https://github.com/apache/hive/pull/1207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462566)
Time Spent: 3h  (was: 2h 50m)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> It is used to read Thrift data files. AFAIK no one uses Thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462567
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:55
Start Date: 23/Jul/20 13:55
Worklog Time Spent: 10m 
  Work Description: dlavati opened a new pull request #1207:
URL: https://github.com/apache/hive/pull/1207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462567)
Time Spent: 3h 10m  (was: 3h)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> It is used to read Thrift data files. AFAIK no one uses Thrift for data 
> serialization.





[jira] [Updated] (HIVE-23911) CBO fails when query has distinct in function and having clause

2020-07-23 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-23911:
--
Description: 
{code}
create table t (col0 int, col1 int);

select col0, count(distinct col1) from t
group by col0
having count(distinct col1) > 1;
{code}
{code}
2020-07-23T06:12:22,403 ERROR [6990c3d4-71c0-4acf-91ab-393d642ed5e9 main] 
parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Distinct 
without an aggregation.
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.internalGenSelectLogicalPlan(CalcitePlanner.java:4855)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4628)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5234)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1855)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1801)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1562)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:541)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12467)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:435)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:297)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) 
[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) 
[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) 
[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) 
[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) 
[hive-cli-4.0.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:730) 
[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:700) 
[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
 [hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) 
[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
 [test-classes/:?]
at 

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=462557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462557
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:12
Start Date: 23/Jul/20 13:12
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r459437391



##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -45,6 +48,8 @@
  * This implementation has much lesser L1 data cache misses than {@link 
BloomFilter}.
  */
 public class BloomKFilter {
+  private static final Logger LOG = 
LoggerFactory.getLogger(BloomKFilter.class.getName());

Review comment:
   Nit:: Does not require `.getName()`







Issue Time Tracking
---

Worklog Id: (was: 462557)
Time Spent: 3h  (was: 2h 50m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of large number of source mapper tasks (~1000, Map 1 in below example) 
> and a large amount of expected entries (50M) in bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent with merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --------------------------------------------------------------------------
> VERTICES      MODE     STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------
> Map 3 ......  llap  SUCCEEDED         1          1        0        0       0       0
> Map 1 ......  llap  SUCCEEDED      1263       1263        0        0       0       0
> Reducer 2     llap    RUNNING         1          0        1        0       0       0
> Map 4         llap    RUNNING      6154          0      207     5947       0       0
> Reducer 5     llap     INITED        43          0        0       43       0       0
> Reducer 6     llap     INITED         1          0        0        1       0       0
> --------------------------------------------------------------------------
> VERTICES: 02/06  [>>--------------------]  16%  ELAPSED TIME: 149.98 s
> --------------------------------------------------------------------------
> {code}
> For example, 70M entries in a bloom filter leads to 436 465 696 bits, so 
> merging 1263 bloom filters means running ~1263 * 436 465 696 bitwise OR 
> operations, which is a very hot codepath but can be parallelized.
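The description above boils down to OR-merging many equal-length long[] bitsets, and since bitwise OR is associative and commutative, the work can be split by word range so each thread owns a disjoint slice and needs no synchronization. A minimal, self-contained sketch of that idea (hypothetical class and method names, not Hive's actual BloomKFilter code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: OR-merge many equal-length bitsets, parallelised by splitting the
// long[] into disjoint index ranges, one range per worker thread.
public class ParallelBitsetMerge {
    public static long[] merge(List<long[]> bitsets, int threads) throws Exception {
        int len = bitsets.get(0).length;
        long[] result = new long[len];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (len + threads - 1) / threads;   // ceil(len / threads)
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                final int to = Math.min(len, from + chunk);
                futures.add(pool.submit(() -> {
                    // Each thread writes only its own [from, to) slice,
                    // so no locking is needed; OR order does not matter.
                    for (long[] bs : bitsets) {
                        for (int i = from; i < to; i++) {
                            result[i] |= bs[i];
                        }
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();   // propagate any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        long[] a = {0b0011L, 0L};
        long[] b = {0b0101L, 1L};
        long[] merged = merge(List.of(a, b), 2);
        System.out.println(merged[0] == 0b0111L && merged[1] == 1L); // true
    }
}
```

The same slicing works whether the parallelism is across input filters or, as here, across bit ranges; the latter avoids a final combine step.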





[jira] [Assigned] (HIVE-23911) CBO fails when query has distinct in function and having clause

2020-07-23 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-23911:
-


> CBO fails when query has distinct in function and having clause
> ---
>
> Key: HIVE-23911
> URL: https://issues.apache.org/jira/browse/HIVE-23911
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> create table t (col0 int, col1 int);
> select col0, count(distinct col1) from t
> group by col0
> having count(distinct col1) > 1;
> {code}





[jira] [Resolved] (HIVE-23865) Use More Java Collections Class

2020-07-23 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23865.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~mgergely] for the review!

> Use More Java Collections Class
> ---
>
> Key: HIVE-23865
> URL: https://issues.apache.org/jira/browse/HIVE-23865
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=462555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462555
 ]

ASF GitHub Bot logged work on HIVE-23324:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:08
Start Date: 23/Jul/20 13:08
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1275:
URL: https://github.com/apache/hive/pull/1275#discussion_r459435104



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -89,23 +93,28 @@ public void run() {
 handle = 
txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
 startedAt = System.currentTimeMillis();
 long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
+List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
 for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
-  clean(compactionInfo, minOpenTxnId);
+  
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
 }
+CompletableFuture.allOf(cleanerList.toArray(new 
CompletableFuture[0])).join();
   } catch (Throwable t) {
 LOG.error("Caught an exception in the main loop of compactor cleaner, 
" +
-StringUtils.stringifyException(t));
-  }
-  finally {
+StringUtils.stringifyException(t));
+if (cleanerExecutor != null) {

Review comment:
   Which `InterruptedException` are you pointing at?  Also, I have moved 
the shutdown at the end of run method now.







Issue Time Tracking
---

Worklog Id: (was: 462555)
Time Spent: 9h 50m  (was: 9h 40m)

> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Initiator processes the various compaction candidates in parallel, so we 
> could follow a similar approach in Cleaner where we currently clean the 
> directories sequentially.
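The fan-out/join pattern discussed in this thread (submit one `CompletableFuture.runAsync` task per compaction candidate to a bounded pool, then block on `allOf(...).join()` before releasing the mutex) can be sketched as follows; the task type, counter, and pool size are placeholders, not Hive's actual Cleaner/CompactionInfo plumbing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the parallel-cleaner pattern: fan out one task per candidate,
// then a single allOf(...).join() barrier before moving on.
public class ParallelCleanSketch {
    public static final AtomicInteger CLEANED = new AtomicInteger();

    static void clean(int candidateId) {
        // stand-in for Cleaner.clean(compactionInfo, minOpenTxnId)
        CLEANED.incrementAndGet();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int id = 0; id < 8; id++) {
                final int i = id;
                futures.add(CompletableFuture.runAsync(() -> clean(i), pool));
            }
            // Blocks until every task completes; a failure in any task
            // surfaces here as a CompletionException.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        System.out.println("cleaned " + CLEANED.get() + " candidates");
    }
}
```

Wrapping each task in an unchecked-exception adapter (as the patch does with `CompactorUtil.ThrowingRunnable.unchecked`) is needed because `Runnable` cannot throw checked exceptions.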





[jira] [Work logged] (HIVE-23865) Use More Java Collections Class

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23865?focusedWorklogId=462556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462556
 ]

ASF GitHub Bot logged work on HIVE-23865:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:08
Start Date: 23/Jul/20 13:08
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1267:
URL: https://github.com/apache/hive/pull/1267


   





Issue Time Tracking
---

Worklog Id: (was: 462556)
Time Spent: 1h 10m  (was: 1h)

> Use More Java Collections Class
> ---
>
> Key: HIVE-23865
> URL: https://issues.apache.org/jira/browse/HIVE-23865
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=462554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462554
 ]

ASF GitHub Bot logged work on HIVE-23324:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:07
Start Date: 23/Jul/20 13:07
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1275:
URL: https://github.com/apache/hive/pull/1275#discussion_r459434134



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -89,23 +93,28 @@ public void run() {
 handle = 
txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
 startedAt = System.currentTimeMillis();
 long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
+List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
 for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
-  clean(compactionInfo, minOpenTxnId);
+  
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
 }
+CompletableFuture.allOf(cleanerList.toArray(new 
CompletableFuture[0])).join();
   } catch (Throwable t) {
 LOG.error("Caught an exception in the main loop of compactor cleaner, 
" +
-StringUtils.stringifyException(t));
-  }
-  finally {
+StringUtils.stringifyException(t));
+if (cleanerExecutor != null) {
+  cleanerExecutor.shutdownNow();
+  cleanerExecutor = CompactorUtil.createExecutorWithThreadFactory(
+  
conf.getIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_REQUEST_QUEUE), 
threadNameFormat);
+}
+  } finally {
 if (handle != null) {
   handle.releaseLocks();
 }
   }
   // Now, go back to bed until it's time to do this again
   long elapsedTime = System.currentTimeMillis() - startedAt;
-  if (elapsedTime >= cleanerCheckInterval || stop.get())  {
-continue;
-  } else {
+  if (!(elapsedTime >= cleanerCheckInterval || stop.get())) {

Review comment:
   This is outdated. I have added a comment at the right place.







Issue Time Tracking
---

Worklog Id: (was: 462554)
Time Spent: 9h 40m  (was: 9.5h)

> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Initiator processes the various compaction candidates in parallel, so we 
> could follow a similar approach in Cleaner where we currently clean the 
> directories sequentially.





[jira] [Commented] (HIVE-23910) Some jdbc tests are unstable

2020-07-23 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163550#comment-17163550
 ] 

Zoltan Haindrich commented on HIVE-23910:
-

http://ci.hive.apache.org/job/hive-precommit/job/master/111/

> Some jdbc tests are unstable
> 
>
> Key: HIVE-23910
> URL: https://issues.apache.org/jira/browse/HIVE-23910
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Priority: Major
>
> they fail from time-to-time because of "no such scheme" or "no such table" or 
> similar...
> http://ci.hive.apache.org/job/hive-precommit/job/master/116/





[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=462553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462553
 ]

ASF GitHub Bot logged work on HIVE-23324:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:06
Start Date: 23/Jul/20 13:06
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1275:
URL: https://github.com/apache/hive/pull/1275#discussion_r459433801



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -87,6 +87,9 @@
   public void run() {
 // Make sure nothing escapes this run method and kills the metastore at 
large,
 // so wrap it in a big catch Throwable statement.
+ExecutorService compactionExecutor = 
CompactorUtil.createExecutorWithThreadFactory(

Review comment:
   Done. 

##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -79,7 +82,9 @@ public void run() {
   cleanerCheckInterval = conf.getTimeVar(
   HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_RUN_INTERVAL, 
TimeUnit.MILLISECONDS);
 }
-
+ExecutorService cleanerExecutor = 
CompactorUtil.createExecutorWithThreadFactory(

Review comment:
   Done.







Issue Time Tracking
---

Worklog Id: (was: 462553)
Time Spent: 9.5h  (was: 9h 20m)

> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Initiator processes the various compaction candidates in parallel, so we 
> could follow a similar approach in Cleaner where we currently clean the 
> directories sequentially.





[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=462552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462552
 ]

ASF GitHub Bot logged work on HIVE-23324:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:05
Start Date: 23/Jul/20 13:05
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1275:
URL: https://github.com/apache/hive/pull/1275#discussion_r459433077



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -66,53 +69,62 @@
   private long cleanerCheckInterval = 0;
 
   private ReplChangeManager replChangeManager;
+  private ExecutorService cleanerExecutor;
 
   @Override
   public void init(AtomicBoolean stop) throws Exception {
 super.init(stop);
 replChangeManager = ReplChangeManager.getInstance(conf);
-  }
-
-  @Override
-  public void run() {
 if (cleanerCheckInterval == 0) {
   cleanerCheckInterval = conf.getTimeVar(
-  HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_RUN_INTERVAL, 
TimeUnit.MILLISECONDS);
+  HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_RUN_INTERVAL, 
TimeUnit.MILLISECONDS);
 }
+cleanerExecutor = CompactorUtil.createExecutorWithThreadFactory(
+
conf.getIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_REQUEST_QUEUE),
+COMPACTOR_CLEANER_THREAD_NAME_FORMAT);
+  }
 
-do {
-  TxnStore.MutexAPI.LockHandle handle = null;
-  long startedAt = -1;
-  // Make sure nothing escapes this run method and kills the metastore at 
large,
-  // so wrap it in a big catch Throwable statement.
-  try {
-handle = 
txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
-startedAt = System.currentTimeMillis();
-long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
-  clean(compactionInfo, minOpenTxnId);
-}
-  } catch (Throwable t) {
-LOG.error("Caught an exception in the main loop of compactor cleaner, 
" +
-StringUtils.stringifyException(t));
-  }
-  finally {
-if (handle != null) {
-  handle.releaseLocks();
-}
-  }
-  // Now, go back to bed until it's time to do this again
-  long elapsedTime = System.currentTimeMillis() - startedAt;
-  if (elapsedTime >= cleanerCheckInterval || stop.get())  {
-continue;
-  } else {
+  @Override
+  public void run() {
+try {
+  do {
+TxnStore.MutexAPI.LockHandle handle = null;
+long startedAt = -1;
+// Make sure nothing escapes this run method and kills the metastore 
at large,
+// so wrap it in a big catch Throwable statement.
 try {
-  Thread.sleep(cleanerCheckInterval - elapsedTime);
-} catch (InterruptedException ie) {
-  // What can I do about it?
+  handle = 
txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
+  startedAt = System.currentTimeMillis();
+  long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
+  List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
+  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
+clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  }
+  CompletableFuture.allOf(cleanerList.toArray(new 
CompletableFuture[0])).join();
+} catch (Throwable t) {
+  LOG.error("Caught an exception in the main loop of compactor 
cleaner, " +
+  StringUtils.stringifyException(t));
+} finally {
+  if (handle != null) {
+handle.releaseLocks();
+  }
+}
+// Now, go back to bed until it's time to do this again
+long elapsedTime = System.currentTimeMillis() - startedAt;
+if (elapsedTime < cleanerCheckInterval && !stop.get()) {

Review comment:
   @deniskuzZ This condition is modified. The previous comment is there on 
the outdated code.







Issue Time Tracking
---

Worklog Id: (was: 462552)
Time Spent: 9h 20m  (was: 9h 10m)

> Parallelise compaction directory cleaning process
> -
>
> Key: HIVE-23324
> URL: https://issues.apache.org/jira/browse/HIVE-23324
> Project: Hive
>  Issue Type: Improvement
>

[jira] [Commented] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163545#comment-17163545
 ] 

Ádám Szita commented on HIVE-23198:
---

Committed to master. Thanks for the review [~pvary]
 

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.





[jira] [Commented] (HIVE-23690) TestNegativeCliDriver#[external_jdbc_negative] is flaky

2020-07-23 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163543#comment-17163543
 ] 

Zoltan Haindrich commented on HIVE-23690:
-

http://ci.hive.apache.org/job/hive-precommit/job/master/116/testReport/org.apache.hadoop.hive.cli.split8/TestMiniLlapLocalCliDriver/Testing___split_09___Archive___testCliDriver_external_jdbc_table3_/

> TestNegativeCliDriver#[external_jdbc_negative] is flaky
> ---
>
> Key: HIVE-23690
> URL: https://issues.apache.org/jira/browse/HIVE-23690
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Priority: Major
>
> failed after 10 tries:
> http://130.211.9.232/job/hive-flaky-check/34/





[jira] [Updated] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-23198:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.





[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462551
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 13:01
Start Date: 23/Jul/20 13:01
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #1288:
URL: https://github.com/apache/hive/pull/1288


   





Issue Time Tracking
---

Worklog Id: (was: 462551)
Time Spent: 1h 50m  (was: 1h 40m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.





[jira] [Updated] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-07-23 Thread Barnabas Maidics (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23890:

Description: 
New thrift objects would be:


{code:java}
struct GetFileListRequest {
1: optional string catName,
2: required string dbName,
3: required string tableName,
4: required list<string> partVals,
6: optional string validWriteIdList
}

struct GetFileListResponse {
1: required binary fileListData
}
{code}


Where GetFileListResponse contains a binary field, which would be a FlatBuffer 
object

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object





[jira] [Work logged] (HIVE-23863) UGI doAs privilege action to make calls to Ranger Service

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23863?focusedWorklogId=462540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462540
 ]

ASF GitHub Bot logged work on HIVE-23863:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 12:37
Start Date: 23/Jul/20 12:37
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1289:
URL: https://github.com/apache/hive/pull/1289#discussion_r459415724



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -79,6 +80,7 @@ public AtlasDumpTask() {
   @Override
   public int execute() {
 try {
+  SecurityUtils.reloginExpiringKeytabUser();

Review comment:
   Added

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -77,6 +77,9 @@ public void reportStageEnd(String stageName, Status status, 
long lastReplId) thr
   Stage stage = progress.getStageByName(stageName);
   stage.setStatus(status);
   stage.setEndTime(System.currentTimeMillis());
+  if (Status.FAILED == status) {

Review comment:
   Added







Issue Time Tracking
---

Worklog Id: (was: 462540)
Time Spent: 1h  (was: 50m)

> UGI doAs privilege action  to make calls to Ranger Service
> --
>
> Key: HIVE-23863
> URL: https://issues.apache.org/jira/browse/HIVE-23863
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23863.01.patch, HIVE-23863.02.patch, UGI and 
> Replication.pdf
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462538
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 12:35
Start Date: 23/Jul/20 12:35
Worklog Time Spent: 10m 
  Work Description: dlavati opened a new pull request #1207:
URL: https://github.com/apache/hive/pull/1207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462538)
Time Spent: 2h 40m  (was: 2.5h)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> It is used to read thrift data files. AFAIK no one uses thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462537
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 12:35
Start Date: 23/Jul/20 12:35
Worklog Time Spent: 10m 
  Work Description: dlavati closed pull request #1207:
URL: https://github.com/apache/hive/pull/1207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462537)
Time Spent: 2.5h  (was: 2h 20m)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> It is used to read thrift data files. AFAIK no one uses thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462539
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 12:35
Start Date: 23/Jul/20 12:35
Worklog Time Spent: 10m 
  Work Description: dlavati commented on pull request #1207:
URL: https://github.com/apache/hive/pull/1207#issuecomment-662981396


   +1 flaky: `Testing / split-14 / Archive / 
org.apache.hadoop.hive.cli.split20.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive]`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462539)
Time Spent: 2h 50m  (was: 2h 40m)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> It is used to read thrift data files. AFAIK no one uses thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462535
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 12:33
Start Date: 23/Jul/20 12:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459413654



##
File path: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
##
@@ -229,12 +230,59 @@ public String getSingleDbName() {
 }
 
 /**
- * Match a CacheTag to this eviction request.
+ * Match a CacheTag to this eviction request. Must only be used on LLAP 
side only, where the received request may

Review comment:
   Thanks for the clarification!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462535)
Time Spent: 1h 40m  (was: 1.5h)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462530
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 12:30
Start Date: 23/Jul/20 12:30
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459412018



##
File path: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
##
@@ -229,12 +230,59 @@ public String getSingleDbName() {
 }
 
 /**
- * Match a CacheTag to this eviction request.
+ * Match a CacheTag to this eviction request. Must only be used on LLAP 
side only, where the received request may

Review comment:
   This Request class is used on the HS2 side too, and there it may contain 
multiple DBs, hence there is no such 'general' restriction in this class. 
Although we're not doing this right now, it's not impossible to call this 
method while we have multiple DBs; that's why the javadoc details the 
restriction on this method only.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462530)
Time Spent: 1.5h  (was: 1h 20m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=462515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462515
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 11:21
Start Date: 23/Jul/20 11:21
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459378925



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
 maxHashTblMemory/1024/1024,
 gcCanary.get() == null ? "dead" : "alive"));
   }
+  int avgAccess = computeAvgAccess();
 
   /* Iterate the global (keywrapper,aggregationbuffers) map and emit
a row for each key */
   Iterator> iter =
   mapKeysAggregationBuffers.entrySet().iterator();
   while(iter.hasNext()) {
 Map.Entry pair = iter.next();
+if (!all && avgAccess >= 1) {
+  // Retain entries when access pattern is > than average access
+  if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
   For rollup, it should be the other way around. If retained entries are 
reset to zero, they could potentially get evicted in the next iteration, polluting 
the map. I guess the confusion is with a strict LRU impl here. The intent is not to 
implement strict LRU and evict one entry at a time (the earlier impl with LinkedHashMap 
followed that, but it ends up with L1 cache misses and heavy object 
tracking). The current implementation is lightweight, and it does not change the 
pattern of evicting 10% of entries; it just adds logic to retain heavily accessed 
entries. 
   
   A potential further optimization is to reuse the evicted entries for pooling 
(e.g. every keywrapper gets cloned internally via copyKey() 
in the map, which causes high memory pressure on certain queries). This can be 
added as an additional optimization in a follow-up jira.
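The retention scheme discussed above can be sketched as follows. This is a simplified, hypothetical illustration of average-access-based retention during a partial flush, not Hive's actual VectorGroupByOperator code; the class and field names are made up for the example:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch: during a partial flush, evict entries whose access count is at or
// below the average, and retain the heavily accessed ("hot") keys.
public class AvgAccessEviction {
    static class Buffer {
        int accessCount;
        Buffer(int accessCount) { this.accessCount = accessCount; }
    }

    // Returns the number of evicted entries; hot entries stay in the map.
    static int partialFlush(Map<String, Buffer> map) {
        if (map.isEmpty()) {
            return 0;
        }
        long total = 0;
        for (Buffer b : map.values()) {
            total += b.accessCount;
        }
        long avgAccess = total / map.size();
        int evicted = 0;
        Iterator<Map.Entry<String, Buffer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Buffer> e = it.next();
            if (e.getValue().accessCount <= avgAccess) {
                it.remove();   // in the real operator, a row is emitted downstream first
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        Map<String, Buffer> m = new HashMap<>();
        m.put("hot", new Buffer(10));
        m.put("warm", new Buffer(3));
        m.put("cold", new Buffer(1));
        int evicted = partialFlush(m);
        System.out.println(evicted + " evicted, retained=" + m.keySet());
    }
}
```

Unlike a strict LRU, this keeps eviction a single cheap pass over the map, at the cost of evicting in bulk rather than one entry at a time.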





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462515)
Time Spent: 2h  (was: 1h 50m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23863) UGI doAs privilege action to make calls to Ranger Service

2020-07-23 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163439#comment-17163439
 ] 

Pravin Sinha commented on HIVE-23863:
-

+1

> UGI doAs privilege action  to make calls to Ranger Service
> --
>
> Key: HIVE-23863
> URL: https://issues.apache.org/jira/browse/HIVE-23863
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23863.01.patch, HIVE-23863.02.patch, UGI and 
> Replication.pdf
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462499
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 11:04
Start Date: 23/Jul/20 11:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459371170



##
File path: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
##
@@ -229,12 +230,59 @@ public String getSingleDbName() {
 }
 
 /**
- * Match a CacheTag to this eviction request.
+ * Match a CacheTag to this eviction request. Must only be used on LLAP 
side only, where the received request may

Review comment:
   Why is this checked here, and not at construction time, or when we 
receive / parse the request? This seems a little awkward to me.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462499)
Time Spent: 1h 20m  (was: 1h 10m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462495
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 11:01
Start Date: 23/Jul/20 11:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459369843



##
File path: 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestCacheContentsTracker.java
##
@@ -124,7 +125,7 @@ public void testCacheTagComparison() {
 
   @Test
   public void testEncodingDecoding() throws Exception {
-Map partDescs = new HashMap<>();
+LinkedHashMap partDescs = new LinkedHashMap<>();

Review comment:
   Got it! Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462495)
Time Spent: 1h 10m  (was: 1h)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462480
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 10:13
Start Date: 23/Jul/20 10:13
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459347299



##
File path: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
##
@@ -229,12 +230,59 @@ public String getSingleDbName() {
 }
 
 /**
- * Match a CacheTag to this eviction request.
+ * Match a CacheTag to this eviction request. Must only be used on LLAP 
side only, where the received request may

Review comment:
   Proactive eviction requests are on a per DB basis: 
https://github.com/apache/hive/blob/master/llap-common/src/protobuf/LlapDaemonProtocol.proto#L234-L238
   This comment is just a heads-up of that fact.
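A per-DB eviction request being matched against hierarchical cache tags can be sketched roughly as below. This is a hypothetical, simplified illustration (the class name, `db.table` tag encoding, and method signature are assumptions for the example, not Hive's actual ProactiveEviction API):

```java
import java.util.Set;

// Sketch: match a hierarchical cache tag ("db" or "db.table") against an
// eviction request that targets a single DB and an optional set of tables.
public class TagMatch {
    // An empty table set means "evict everything in the DB".
    static boolean isTagMatch(String dbName, Set<String> tables, String cacheTag) {
        int dot = cacheTag.indexOf('.');
        String tagDb = dot < 0 ? cacheTag : cacheTag.substring(0, dot);
        if (!tagDb.equalsIgnoreCase(dbName)) {
            return false;            // tag belongs to a different DB
        }
        if (tables.isEmpty()) {
            return true;             // whole-DB eviction request
        }
        if (dot < 0) {
            return false;            // DB-level tag, but request is table-level
        }
        return tables.contains(cacheTag.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(isTagMatch("sales", Set.of("orders"), "sales.orders"));
        System.out.println(isTagMatch("sales", Set.of(), "sales.customers"));
        System.out.println(isTagMatch("hr", Set.of("orders"), "sales.orders"));
    }
}
```

The single-DB assumption is what lets the DB comparison short-circuit before any table or partition matching is attempted.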





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462480)
Time Spent: 1h  (was: 50m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462479
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 10:10
Start Date: 23/Jul/20 10:10
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459345582



##
File path: 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestCacheContentsTracker.java
##
@@ -124,7 +125,7 @@ public void testCacheTagComparison() {
 
   @Test
   public void testEncodingDecoding() throws Exception {
-Map partDescs = new HashMap<>();
+LinkedHashMap partDescs = new LinkedHashMap<>();

Review comment:
   That's because the (nested) partition specification (e.g. p1=v11,p2=v12) has 
to be sorted.
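The reason a LinkedHashMap matters for encoding can be sketched as follows. This is an illustrative toy, not Hive's CacheTag encoding; the separator and method names are assumptions made for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: encode a nested partition spec into one tag string.
// A LinkedHashMap iterates in insertion order, so the encoded tag is
// deterministic and follows the partition hierarchy (p1 before p2);
// a plain HashMap gives no ordering guarantee at all.
public class PartSpecEncoding {
    static String encode(LinkedHashMap<String, String> partDescs) {
        StringJoiner joiner = new StringJoiner("/");
        for (Map.Entry<String, String> e : partDescs.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("p1", "v11");
        spec.put("p2", "v12");
        System.out.println(encode(spec)); // p1=v11/p2=v12
    }
}
```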





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462479)
Time Spent: 50m  (was: 40m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-23 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23835:

Attachment: HIVE-23835.03.patch

> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch, 
> HIVE-23835.03.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {color:#172b4d}When hive function's binaries are on source HDFS, repl dump 
> should dump it to the staging location in order to break cross clusters 
> visibility requirement.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23908) Rewrite plan to join back tables: handle root input is an Aggregate

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23908:
--
Labels: pull-request-available  (was: )

> Rewrite plan to join back tables: handle root input is an Aggregate
> ---
>
> Key: HIVE-23908
> URL: https://issues.apache.org/jira/browse/HIVE-23908
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> EXPLAIN   CBO
> SELECT
>   C_CUSTOMER_ID
> FROM
>   CUSTOMER
> , STORE_SALES
> WHERE
>   C_CUSTOMER_SK   =   SS_CUSTOMER_SK
> GROUP BY
>   C_CUSTOMER_SK
> , C_CUSTOMER_ID
> , C_FIRST_NAME
> , C_LAST_NAME
> , C_PREFERRED_CUST_FLAG
> , C_BIRTH_COUNTRY
> , C_LOGIN
> , C_EMAIL_ADDRESS
> {code}
> {code}
> HiveProject(c_customer_id=[$1])
>   HiveAggregate(group=[{0, 1}])
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$5], 
> $f6=[$6], $f7=[$7])
>   HiveJoin(condition=[=($0, $8)], joinType=[inner], algorithm=[none], 
> cost=[not available])
> HiveProject(c_customer_sk=[$0], c_customer_id=[$1], 
> c_first_name=[$8], c_last_name=[$9], c_preferred_cust_flag=[$10], 
> c_birth_country=[$14], c_login=[$15], c_email_address=[$16])
>   HiveTableScan(table=[[default, customer]], table:alias=[customer])
> HiveProject(ss_customer_sk=[$3])
>   HiveFilter(condition=[IS NOT NULL($3)])
> HiveTableScan(table=[[default, store_sales]], 
> table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23908) Rewrite plan to join back tables: handle root input is an Aggregate

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23908?focusedWorklogId=462472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462472
 ]

ASF GitHub Bot logged work on HIVE-23908:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:46
Start Date: 23/Jul/20 09:46
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1302:
URL: https://github.com/apache/hive/pull/1302


   Testing done:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=constraints_optimization.q -pl 
itests/qtest -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462472)
Remaining Estimate: 0h
Time Spent: 10m

> Rewrite plan to join back tables: handle root input is an Aggregate
> ---
>
> Key: HIVE-23908
> URL: https://issues.apache.org/jira/browse/HIVE-23908
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> EXPLAIN   CBO
> SELECT
>   C_CUSTOMER_ID
> FROM
>   CUSTOMER
> , STORE_SALES
> WHERE
>   C_CUSTOMER_SK   =   SS_CUSTOMER_SK
> GROUP BY
>   C_CUSTOMER_SK
> , C_CUSTOMER_ID
> , C_FIRST_NAME
> , C_LAST_NAME
> , C_PREFERRED_CUST_FLAG
> , C_BIRTH_COUNTRY
> , C_LOGIN
> , C_EMAIL_ADDRESS
> {code}
> {code}
> HiveProject(c_customer_id=[$1])
>   HiveAggregate(group=[{0, 1}])
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$5], 
> $f6=[$6], $f7=[$7])
>   HiveJoin(condition=[=($0, $8)], joinType=[inner], algorithm=[none], 
> cost=[not available])
> HiveProject(c_customer_sk=[$0], c_customer_id=[$1], 
> c_first_name=[$8], c_last_name=[$9], c_preferred_cust_flag=[$10], 
> c_birth_country=[$14], c_login=[$15], c_email_address=[$16])
>   HiveTableScan(table=[[default, customer]], table:alias=[customer])
> HiveProject(ss_customer_sk=[$3])
>   HiveFilter(condition=[IS NOT NULL($3)])
> HiveTableScan(table=[[default, store_sales]], 
> table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462465
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:36
Start Date: 23/Jul/20 09:36
Worklog Time Spent: 10m 
  Work Description: dlavati commented on pull request #1207:
URL: https://github.com/apache/hive/pull/1207#issuecomment-662911600


   I can't repro these failures locally. These tests look flaky:
  
   
   -  Testing / split-19 / Archive / 
org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnMr.testMmMinorCompactionWithSchemaEvolutionAndBuckets
   - Testing / split-16 / Archive / 
org.apache.hadoop.hive.ql.TestWarehouseExternalDir.testExternalDefaultPaths
   - Testing / split-16 / Archive / 
org.apache.hadoop.hive.ql.TestWarehouseExternalDir.
   - Testing / split-14 / Archive / 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_dynamic_partition]



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462465)
Time Spent: 2h  (was: 1h 50m)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> It is used to read thrift data files. AFAIK no one uses thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462466
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:36
Start Date: 23/Jul/20 09:36
Worklog Time Spent: 10m 
  Work Description: dlavati closed pull request #1207:
URL: https://github.com/apache/hive/pull/1207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462466)
Time Spent: 2h 10m  (was: 2h)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> It is used to read thrift data files. AFAIK no one uses thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23483) Remove DynamicSerDe

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23483?focusedWorklogId=462467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462467
 ]

ASF GitHub Bot logged work on HIVE-23483:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:36
Start Date: 23/Jul/20 09:36
Worklog Time Spent: 10m 
  Work Description: dlavati opened a new pull request #1207:
URL: https://github.com/apache/hive/pull/1207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462467)
Time Spent: 2h 20m  (was: 2h 10m)

> Remove DynamicSerDe
> ---
>
> Key: HIVE-23483
> URL: https://issues.apache.org/jira/browse/HIVE-23483
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> It is used to read thrift data files. AFAIK no one uses thrift for data 
> serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23877) Hive on Spark incorrect partition pruning ANALYZE TABLE

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23877?focusedWorklogId=462463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462463
 ]

ASF GitHub Bot logged work on HIVE-23877:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:31
Start Date: 23/Jul/20 09:31
Worklog Time Spent: 10m 
  Work Description: fornaix commented on pull request #1278:
URL: https://github.com/apache/hive/pull/1278#issuecomment-662909851


   @belugabehr  Thanks for your feedback. Could you help review this pr and let 
me know if you have any questions?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462463)
Time Spent: 0.5h  (was: 20m)

> Hive on Spark incorrect partition pruning ANALYZE TABLE
> ---
>
> Key: HIVE-23877
> URL: https://issues.apache.org/jira/browse/HIVE-23877
> Project: Hive
>  Issue Type: Bug
>Reporter: Han
>Assignee: Han
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Partitions are pruned based on the partition specification in the ANALYZE TABLE 
> command and cached in TableSpec.
> When compiling, it's unnecessary to use PartitionPruner.prune() to get the 
> partitions again. Also, PartitionPruner cannot prune partitions for the ANALYZE 
> TABLE command, so it would fetch all partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23877) Hive on Spark incorrect partition pruning ANALYZE TABLE

2020-07-23 Thread Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han updated HIVE-23877:
---
Target Version/s: 3.1.2

> Hive on Spark incorrect partition pruning ANALYZE TABLE
> ---
>
> Key: HIVE-23877
> URL: https://issues.apache.org/jira/browse/HIVE-23877
> Project: Hive
>  Issue Type: Bug
>Reporter: Han
>Assignee: Han
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Partitions are pruned based on the partition specification in the ANALYZE TABLE 
> command and cached in TableSpec.
> When compiling, it's unnecessary to use PartitionPruner.prune() to get the 
> partitions again. Also, PartitionPruner cannot prune partitions for the ANALYZE 
> TABLE command, so it would fetch all partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462452
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:03
Start Date: 23/Jul/20 09:03
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459310372



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
##
@@ -74,7 +78,14 @@ public HiveJoinAddNotNullRule(Class clazz,
   @Override
   public void onMatch(RelOptRuleCall call) {
 Join join = call.rel(0);
-if (join.getJoinType() == JoinRelType.FULL || 
join.getCondition().isAlwaysTrue()) {
+
+// For anti join case add the not null on right side if the condition is

Review comment:
   Thanks! Makes sense now





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 462452)
Time Spent: 7h 40m  (was: 7.5h)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects the redundant columns 
> from the right side, and filtering is then needed to remove the redundant 
> rows. This can be avoided with an anti join, which projects 
> only the required columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records can be dropped at the 
> child node instead of being moved to the join node. This can reduce a significant 
> amount of data movement if the number of distinct rows (join keys) is significant.
>  # Extra memory usage — For a map-based anti join, a hash set is 
> sufficient, as only the key is needed to check whether a record matches the 
> join condition. For a left join, we need the key and the non-key columns 
> as well, so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table in a typical 
> 10TB TPCDS setup is just 10% of the total records. So when we convert this query 
> to an anti join, instead of 7 billion rows, only 600 million rows are moved to 
> the join node.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are converted first to filter + left-join and then to an anti 
> join. Queries with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized execution 
> are supported for anti join.
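To make the trade-offs in the quoted description concrete, here is a minimal, hypothetical Java sketch (not Hive code; the table and column names are borrowed from the example query purely for illustration). A true anti join needs only a hash set of the distinct build-side keys, while the left-outer-join rewrite would carry right-side columns and filter afterwards.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AntiJoinSketch {
    public static void main(String[] args) {
        // web_returns.wr_order_number values (probe side).
        List<Long> webReturns = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        // web_sales.ws_order_number values (build side); duplicates are
        // irrelevant, so a hash *set* of distinct keys is all the join needs.
        List<Long> webSales = Arrays.asList(2L, 2L, 4L, 4L, 4L);

        Set<Long> buildKeys = new HashSet<>(webSales); // keys only, no payload columns

        // Anti join: keep probe rows whose key is absent from the build side.
        List<Long> antiJoin = webReturns.stream()
                .filter(k -> !buildKeys.contains(k))
                .collect(Collectors.toList());

        System.out.println(antiJoin); // [1, 3, 5]
    }
}
```

Collapsing duplicate build keys into the set means only distinct keys are held in memory, which is the point made in item 3 of the description.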





[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462450
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 09:00
Start Date: 23/Jul/20 09:00
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459308986



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
##
@@ -339,6 +339,12 @@ String getFuncText(String funcText, final int srcPos) {
   vector.add(right, left);
   break;
 case JoinDesc.LEFT_OUTER_JOIN:
+case JoinDesc.ANTI_JOIN:
+//TODO : In case of anti join, bloom filter can be created on left 
side also ("IN (keylist right table)").
+// But the filter should be "not-in" ("NOT IN (keylist right table)") 
as we want to select the records from
+// left side which are not present in the right side. But it may cause 
wrong result as
+// bloom filter may have false positive and thus simply adding not is 
not correct,
+// special handling is required for "NOT IN".

Review comment:
   Thanks Mahesh! Had this in the back of my head for a while -- this will 
be useful for a bunch of cases including anti-joins

##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
##
@@ -339,6 +339,12 @@ String getFuncText(String funcText, final int srcPos) {
   vector.add(right, left);
   break;
 case JoinDesc.LEFT_OUTER_JOIN:
+case JoinDesc.ANTI_JOIN:
+//TODO : In case of anti join, bloom filter can be created on left 
side also ("IN (keylist right table)").
+// But the filter should be "not-in" ("NOT IN (keylist right table)") 
as we want to select the records from
+// left side which are not present in the right side. But it may cause 
wrong result as
+// bloom filter may have false positive and thus simply adding not is 
not correct,
+// special handling is required for "NOT IN".

Review comment:
   Thanks Mahesh! Had this in the back of my head for a while -- this will 
be useful for a bunch of cases including anti-joins
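The caution in the TODO above can be demonstrated with a toy filter. This is a hypothetical sketch, not Hive's BloomFilter implementation: once a small filter saturates, it reports possible membership for keys that were never inserted, so negating the test ("NOT IN") would wrongly drop left-side rows.

```java
import java.util.BitSet;

public class BloomNotInSketch {
    static final int BITS = 61;                 // small prime-sized bit array
    static final BitSet bits = new BitSet(BITS);

    // Two cheap hash functions for illustration; real filters use better ones.
    static int h1(long k) { return (int) Math.floorMod(k * 31 + 7, (long) BITS); }
    static int h2(long k) { return (int) Math.floorMod(k * 17 + 3, (long) BITS); }

    static void add(long k) { bits.set(h1(k)); bits.set(h2(k)); }

    static boolean mightContain(long k) { return bits.get(h1(k)) && bits.get(h2(k)); }

    public static void main(String[] args) {
        // Saturate the tiny filter with 60 even keys; odd keys are never inserted.
        for (long k = 0; k < 60; k++) {
            add(k * 2);
        }
        long falsePositives = 0;
        for (long k = 1; k < 2000; k += 2) {    // probe only odd (absent) keys
            if (mightContain(k)) {
                falsePositives++;               // a NOT IN built from the filter would drop these rows
            }
        }
        System.out.println(falsePositives > 0); // true
    }
}
```

Because `mightContain` can return true for absent keys, `IN (bloom)` only over-selects (safe to re-check later), while `NOT IN (bloom)` silently loses correct rows, which is why special handling is needed.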







Issue Time Tracking
---

Worklog Id: (was: 462450)
Time Spent: 7.5h  (was: 7h 20m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects the redundant columns 
> from the right side, and filtering is then needed to remove the redundant 
> rows. This can be avoided with an anti join, which projects 
> only the required columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records can be dropped at the 
> child node instead of being moved to the join node. This can reduce a significant 
> amount of data movement if the number of distinct rows (join keys) is significant.
>  # Extra memory usage — For a map-based anti join, a hash set is 
> sufficient, as only the key is needed to check whether a record matches the 
> join condition. For a left join, we need the key and the non-key columns 
> as well, so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table in a typical 
> 10TB TPCDS setup is just 10% of the total records. So when we convert this query 
> to an anti join, instead of 7 billion rows, only 600 million rows are moved to 
> the join node.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462448
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 08:58
Start Date: 23/Jul/20 08:58
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459307921



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinAntiJoinLongOperator.java
##
@@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.mapjoin;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.JoinUtil;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinLongHashSet;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.VectorDesc;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+
+// TODO : Duplicate codes need to merge with semi join.
+// Single-Column Long hash table import.
+// Single-Column Long specific imports.
+
+/*
+ * Specialized class for doing a vectorized map join that is an anti join on a 
Single-Column Long
+ * using a hash set.
+ */
+public class VectorMapJoinAntiJoinLongOperator extends 
VectorMapJoinAntiJoinGenerateResultOperator {
+
+  private static final long serialVersionUID = 1L;
+  private static final String CLASS_NAME = 
VectorMapJoinAntiJoinLongOperator.class.getName();
+  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  protected String getLoggingPrefix() {
+return super.getLoggingPrefix(CLASS_NAME);
+  }
+
+  // The above members are initialized by the constructor and must not be
+  // transient.
+
+  // The hash map for this specialized class.
+  private transient VectorMapJoinLongHashSet hashSet;
+
+  // Single-Column Long specific members.
+  // For integers, we have optional min/max filtering.
+  private transient boolean useMinMax;
+  private transient long min;
+  private transient long max;
+
+  // The column number for this one column join specialization.
+  private transient int singleJoinColumn;
+
+  // Pass-thru constructors.
+  /** Kryo ctor. */
+  protected VectorMapJoinAntiJoinLongOperator() {
+super();
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx) {
+super(ctx);
+  }
+
+  public VectorMapJoinAntiJoinLongOperator(CompilationOpContext ctx, 
OperatorDesc conf,
+   VectorizationContext vContext, 
VectorDesc vectorDesc) throws HiveException {
+super(ctx, conf, vContext, vectorDesc);
+  }
+
+  // Process Single-Column Long Anti Join on a vectorized row batch.
+  @Override
+  protected void commonSetup() throws HiveException {
+super.commonSetup();
+
+// Initialize Single-Column Long members for this specialized class.
+singleJoinColumn = bigTableKeyColumnMap[0];
+  }
+
+  @Override
+  public void hashTableSetup() throws HiveException {
+super.hashTableSetup();
+
+// Get our Single-Column Long hash set information for this specialized 
class.
+hashSet = (VectorMapJoinLongHashSet) vectorMapJoinHashTable;
+useMinMax = hashSet.useMinMax();
+if (useMinMax) {
+  min = hashSet.min();
+  max = hashSet.max();
+}
+  }
+
+  @Override
+  public void processBatch(VectorizedRowBatch batch) throws HiveException {
+
+try {
+  // (Currently none)
+  // antiPerBatchSetup(batch);
+
+  // For anti joins, we may apply the filter(s) now.
+  for(VectorExpression ve : bigTableFilterExpressions) {
+ve.evaluate(batch);
+  }
+
+

[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462446
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 08:57
Start Date: 23/Jul/20 08:57
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459307547



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
##
@@ -523,11 +533,19 @@ private boolean createForwardJoinObject(boolean[] skip) 
throws HiveException {
 forward = true;
   }
 }
+return forward;
+  }
+
+  // returns whether a record was forwarded
+  private boolean createForwardJoinObject(boolean[] skip, boolean antiJoin) 
throws HiveException {
+boolean forward = fillFwdCache(skip);
 if (forward) {
   if (needsPostEvaluation) {
 forward = !JoinUtil.isFiltered(forwardCache, residualJoinFilters, 
residualJoinFiltersOIs);
   }
-  if (forward) {
+
+  // For anti join, check all right side and if nothing is matched then 
only forward.

Review comment:
   OK, makes sense now -- so maybe we should just mention that for anti-join 
we don't forward at this point







Issue Time Tracking
---

Worklog Id: (was: 462446)
Time Spent: 7h 10m  (was: 7h)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects the redundant columns 
> from the right side, and filtering is then needed to remove the redundant 
> rows. This can be avoided with an anti join, which projects 
> only the required columns and rows from the left-side table.
>  # Extra shuffle — With an anti join, duplicate records can be dropped at the 
> child node instead of being moved to the join node. This can reduce a significant 
> amount of data movement if the number of distinct rows (join keys) is significant.
>  # Extra memory usage — For a map-based anti join, a hash set is 
> sufficient, as only the key is needed to check whether a record matches the 
> join condition. For a left join, we need the key and the non-key columns 
> as well, so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table in a typical 
> 10TB TPCDS setup is just 10% of the total records. So when we convert this query 
> to an anti join, instead of 7 billion rows, only 600 million rows are moved to 
> the join node.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are converted first to filter + left-join and then to an anti 
> join. Queries with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized execution 
> are supported for anti join.





[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462439
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 08:38
Start Date: 23/Jul/20 08:38
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459296602



##
File path: 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestCacheContentsTracker.java
##
@@ -124,7 +125,7 @@ public void testCacheTagComparison() {
 
   @Test
   public void testEncodingDecoding() throws Exception {
-Map partDescs = new HashMap<>();
+LinkedHashMap partDescs = new LinkedHashMap<>();

Review comment:
   Could you please explain why we need LinkedHashMap?
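As background to the question, HashMap makes no iteration-order guarantee, while LinkedHashMap iterates in insertion order, which matters when an encode/decode round-trip is checked against an ordered expectation. A generic illustration (deliberately unrelated to the Hive test's actual types):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderSketch {
    public static void main(String[] args) {
        Map<String, String> partDescs = new LinkedHashMap<>();
        partDescs.put("p1", "2020-07-01");
        partDescs.put("p0", "2020-06-01");
        partDescs.put("p2", "2020-08-01");

        // LinkedHashMap guarantees iteration in insertion order, so any
        // encoding of the entries yields a deterministic sequence.
        List<String> keys = new ArrayList<>(partDescs.keySet());
        System.out.println(keys); // [p1, p0, p2]
    }
}
```

With a plain HashMap the printed order would depend on hashing internals, so a test comparing an encoded string against a fixed expected value could fail intermittently.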







Issue Time Tracking
---

Worklog Id: (was: 462439)
Time Spent: 40m  (was: 0.5h)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.





[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462437
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 08:34
Start Date: 23/Jul/20 08:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459294566



##
File path: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
##
@@ -229,12 +230,59 @@ public String getSingleDbName() {
 }
 
 /**
- * Match a CacheTag to this eviction request.
+ * Match a CacheTag to this eviction request. Must only be used on LLAP 
side only, where the received request may

Review comment:
   I do not get this comment. "request may only contain one information for 
one DB".
   Could you please elaborate?







Issue Time Tracking
---

Worklog Id: (was: 462437)
Time Spent: 0.5h  (was: 20m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.





[jira] [Work logged] (HIVE-23198) Add matching logic between CacheTags and proactive eviction requests

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23198?focusedWorklogId=462436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462436
 ]

ASF GitHub Bot logged work on HIVE-23198:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 08:34
Start Date: 23/Jul/20 08:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1288:
URL: https://github.com/apache/hive/pull/1288#discussion_r459294566



##
File path: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
##
@@ -229,12 +230,59 @@ public String getSingleDbName() {
 }
 
 /**
- * Match a CacheTag to this eviction request.
+ * Match a CacheTag to this eviction request. Must only be used on LLAP 
side only, where the received request may

Review comment:
   I do not get this comment. "request may only contain one information for 
one DB".
   Could you please elaborate?







Issue Time Tracking
---

Worklog Id: (was: 462436)
Time Spent: 20m  (was: 10m)

> Add matching logic between CacheTags and proactive eviction requests
> 
>
> Key: HIVE-23198
> URL: https://issues.apache.org/jira/browse/HIVE-23198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement ProactiveEviction$Request#isTagMatch so that LLAP can evict buffers 
> based on their tags matching incoming eviction requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23908) Rewrite plan to join back tables: handle root input is an Aggregate

2020-07-23 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-23908:
-


> Rewrite plan to join back tables: handle root input is an Aggregate
> ---
>
> Key: HIVE-23908
> URL: https://issues.apache.org/jira/browse/HIVE-23908
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> EXPLAIN   CBO
> SELECT
>   C_CUSTOMER_ID
> FROM
>   CUSTOMER
> , STORE_SALES
> WHERE
>   C_CUSTOMER_SK   =   SS_CUSTOMER_SK
> GROUP BY
>   C_CUSTOMER_SK
> , C_CUSTOMER_ID
> , C_FIRST_NAME
> , C_LAST_NAME
> , C_PREFERRED_CUST_FLAG
> , C_BIRTH_COUNTRY
> , C_LOGIN
> , C_EMAIL_ADDRESS
> {code}
> {code}
> HiveProject(c_customer_id=[$1])
>   HiveAggregate(group=[{0, 1}])
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$5], 
> $f6=[$6], $f7=[$7])
>   HiveJoin(condition=[=($0, $8)], joinType=[inner], algorithm=[none], 
> cost=[not available])
> HiveProject(c_customer_sk=[$0], c_customer_id=[$1], 
> c_first_name=[$8], c_last_name=[$9], c_preferred_cust_flag=[$10], 
> c_birth_country=[$14], c_login=[$15], c_email_address=[$16])
>   HiveTableScan(table=[[default, customer]], table:alias=[customer])
> HiveProject(ss_customer_sk=[$3])
>   HiveFilter(condition=[IS NOT NULL($3)])
> HiveTableScan(table=[[default, store_sales]], 
> table:alias=[store_sales])
> {code}





[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=462419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462419
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 07:13
Start Date: 23/Jul/20 07:13
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r459254666



##
File path: standalone-metastore/metastore-server/pom.xml
##
@@ -204,6 +204,11 @@
       <artifactId>hive-storage-api</artifactId>
       <version>${storage-api.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-serde</artifactId>

Review comment:
   Fixed!







Issue Time Tracking
---

Worklog Id: (was: 462419)
Time Spent: 1h 10m  (was: 1h)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), while dropping a partition we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is 

[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=462417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462417
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 23/Jul/20 07:09
Start Date: 23/Jul/20 07:09
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r459253392



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
##
@@ -747,7 +747,7 @@ public static RewritablePKFKJoinInfo 
isRewritablePKFKJoin(Join join,
 final RelNode nonFkInput = leftInputPotentialFK ? join.getRight() : 
join.getLeft();
 final RewritablePKFKJoinInfo nonRewritable = 
RewritablePKFKJoinInfo.of(false, null);
 
-if (joinType != JoinRelType.INNER && !join.isSemiJoin()) {
+if (joinType != JoinRelType.INNER && !join.isSemiJoin() && joinType != 
JoinRelType.ANTI) {

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23906







Issue Time Tracking
---

Worklog Id: (was: 462417)
Time Spent: 7h  (was: 6h 50m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Currently hive does not support Anti join. The query for anti join is 
> converted to left outer join and null filter on right side join key is added 
> to get the desired result. This is causing
>  # Extra computation — The left outer join projects the redundant columns 
> from right side. Along with that, filtering is done to remove the redundant 
> rows. This is can be avoided in case of anti join as anti join will project 
> only the required columns and rows from the left side table.
>  # Extra shuffle — In case of anti join the duplicate records moved to join 
> node can be avoided from the child node. This can reduce significant amount 
> of data movement if the number of distinct rows( join keys) is significant.
>  # Extra Memory Usage - In case of map based anti join , hash set is 
> sufficient as just the key is required to check  if the records matches the 
> join condition. In case of left join, we need the key and the non key columns 
> also and thus a hash table will be required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when this 
> query is converted to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause, which are first converted to 
> filter + left-join and then converted to anti join. Queries with “not in” 
> are not handled in the current patch.
> On the execution side, both merge join and map join are supported for anti 
> join, with vectorized execution.
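The left-outer-join rewrite and the direct anti join described above can be contrasted in a small, self-contained sketch (illustrative Python only, not Hive's actual implementation; the row data and column names are invented):

```python
def anti_join_left_outer(left, right, key):
    # Left-outer-join rewrite: build a hash table on the join key, project
    # the right-side columns too, then filter rows whose match is NULL.
    table = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        matches = table.get(row[key])
        if matches:
            for m in matches:
                joined.append({**row, **{"r_" + k: v for k, v in m.items()}})
        else:
            joined.append({**row, "r_" + key: None})
    # Redundant post-join filter: keep only rows with no right-side match,
    # then drop the projected right-side columns again.
    return [{k: v for k, v in r.items() if not k.startswith("r_")}
            for r in joined if r["r_" + key] is None]


def anti_join_direct(left, right, key):
    # Direct anti join: a hash *set* of join keys is enough -- no right-side
    # columns are projected and no post-join filter is needed.
    keys = {row[key] for row in right}
    return [row for row in left if row[key] not in keys]
```

Both functions return the same rows, but the direct version builds only a set of distinct keys and skips the projection and filter steps, which is the saving the points above describe.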



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23907) Hash table type should be considered for calculating the Map join table size

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23907:
---
Description: 
For some joins, like anti join and semi join, a hash set is used instead of 
a hash table. This is done because these joins do not emit the right-side 
columns, so an existence check is enough. However, when the table size is 
checked during map join conversion, this is not taken into account. The hash 
structure for these joins is considerably smaller, so the build side of a 
bigger table can still fit into memory.

 

  was:
For an anti join, we emit the records if the join condition is not satisfied. 
In the case of the PK-FK rule, we have to explore whether this can be 
exploited to speed up anti join processing.

 


> Hash table type should be considered for calculating the Map join table size
> 
>
> Key: HIVE-23907
> URL: https://issues.apache.org/jira/browse/HIVE-23907
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For some joins, like anti join and semi join, a hash set is used instead of 
> a hash table. This is done because these joins do not emit the right-side 
> columns, so an existence check is enough. However, when the table size is 
> checked during map join conversion, this is not taken into account. The 
> hash structure for these joins is considerably smaller, so the build side 
> of a bigger table can still fit into memory.
>  
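As a rough back-of-the-envelope for the size difference (purely illustrative Python, not Hive's actual size estimation logic; the per-field byte count is an assumption):

```python
def estimate_join_structure_bytes(n_rows, n_value_cols, bytes_per_field=8):
    # A hash set for anti/semi join stores only the join key, while a left
    # join's hash table must also carry the non-key (value) columns.
    hash_set_bytes = n_rows * bytes_per_field
    hash_table_bytes = n_rows * bytes_per_field * (1 + n_value_cols)
    return hash_set_bytes, hash_table_bytes
```

For a build side of one million rows with four value columns this gives roughly 8 MB for the set versus 40 MB for the table, which is why ignoring the hash structure type during map join conversion can reject map joins that would actually fit in memory.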




