[jira] [Work logged] (HIVE-23998) Upgrade Guava to 27 for Hive 2.3

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=468147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468147
 ]

ASF GitHub Bot logged work on HIVE-23998:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Aug/20 02:04
Start Date: 08/Aug/20 02:04
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1365:
URL: https://github.com/apache/hive/pull/1365#issuecomment-670807908


   It seems the patch approach also doesn't work. Ping @pvary @kgyrtkirk: do 
you know a way to trigger tests? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468147)
Time Spent: 2h 10m  (was: 2h)

> Upgrade Guava to 27 for Hive 2.3
> --------------------------------
>
> Key: HIVE-23998
> URL: https://issues.apache.org/jira/browse/HIVE-23998
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23998.01.branch-2.3.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Try to upgrade Guava to 27.0-jre for Hive 2.3 branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23716) Support Anti Join in Hive

2020-08-07 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-23716:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master. This is great, thanks [~maheshk114]!

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join plus a null filter on the right-side join key 
> to get the desired result. This causes:
>  # Extra computation - the left outer join projects redundant columns from 
> the right side, and extra filtering is needed to remove the redundant rows. 
> An anti join avoids this, since it projects only the required columns and 
> rows from the left-side table.
>  # Extra shuffle - with an anti join, duplicate records can be dropped at the 
> child node before they are moved to the join node. This can reduce data 
> movement significantly when there are many duplicate join keys.
>  # Extra memory usage - for a map-based anti join, a hash set is sufficient, 
> as only the key is needed to check whether a record matches the join 
> condition. For a left join, both the key and the non-key columns are needed, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when this 
> query is converted to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> The current patch performs just one conversion: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a "not exists" clause, which are first converted to 
> filter + left-join and then to anti join. Queries with "not in" are not 
> handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.
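
A minimal sketch of the rewrite, reusing the tables from the example above; the "after" shape is shown as plan pseudocode in comments, since the anti join is produced by the optimizer rather than written by hand:

{code:java}
-- NOT EXISTS form of the same question: returns with no matching sale.
SELECT wr_order_number
FROM web_returns
WHERE NOT EXISTS (SELECT 1
                  FROM web_sales
                  WHERE ws_order_number = wr_order_number);

-- Conceptual plan before the patch:
--   project(wr_order_number)
--     filter(ws_order_number IS NULL)
--       left-join(web_returns, web_sales)
-- Conceptual plan after the patch:
--   project(wr_order_number)
--     anti-join(web_returns, web_sales)
{code}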



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=468139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468139
 ]

ASF GitHub Bot logged work on HIVE-23716:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Aug/20 01:22
Start Date: 08/Aug/20 01:22
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1147:
URL: https://github.com/apache/hive/pull/1147


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468139)
Time Spent: 16h 50m  (was: 16h 40m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join plus a null filter on the right-side join key 
> to get the desired result. This causes:
>  # Extra computation - the left outer join projects redundant columns from 
> the right side, and extra filtering is needed to remove the redundant rows. 
> An anti join avoids this, since it projects only the required columns and 
> rows from the left-side table.
>  # Extra shuffle - with an anti join, duplicate records can be dropped at the 
> child node before they are moved to the join node. This can reduce data 
> movement significantly when there are many duplicate join keys.
>  # Extra memory usage - for a map-based anti join, a hash set is sufficient, 
> as only the key is needed to check whether a record matches the join 
> condition. For a left join, both the key and the non-key columns are needed, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when this 
> query is converted to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> The current patch performs just one conversion: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a "not exists" clause, which are first converted to 
> filter + left-join and then to anti join. Queries with "not in" are not 
> handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=468138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468138
 ]

ASF GitHub Bot logged work on HIVE-23716:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Aug/20 01:19
Start Date: 08/Aug/20 01:19
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r467343664



##
File path: 
ql/src/test/results/clientpositive/perf/tez/cbo_query16_anti_join.q.out
##
@@ -0,0 +1,99 @@
+PREHOOK: query: explain cbo
+select
+   count(distinct cs_order_number) as `order count`
+  ,sum(cs_ext_ship_cost) as `total shipping cost`
+  ,sum(cs_net_profit) as `total net profit`
+from
+   catalog_sales cs1
+  ,date_dim
+  ,customer_address
+  ,call_center
+where
+d_date between '2001-4-01' and
+   (cast('2001-4-01' as date) + 60 days)
+and cs1.cs_ship_date_sk = d_date_sk
+and cs1.cs_ship_addr_sk = ca_address_sk
+and ca_state = 'NY'
+and cs1.cs_call_center_sk = cc_call_center_sk
+and cc_county in ('Ziebach County','Levy County','Huron County','Franklin 
Parish',
+  'Daviess County'
+)
+and exists (select *
+from catalog_sales cs2
+where cs1.cs_order_number = cs2.cs_order_number
+  and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
+and not exists(select *
+   from catalog_returns cr1
+   where cs1.cs_order_number = cr1.cr_order_number)
+order by count(distinct cs_order_number)
+limit 100
+PREHOOK: type: QUERY
+PREHOOK: Input: default@call_center
+PREHOOK: Input: default@catalog_returns
+PREHOOK: Input: default@catalog_sales
+PREHOOK: Input: default@customer_address
+PREHOOK: Input: default@date_dim
+PREHOOK: Output: hdfs://### HDFS PATH ###
+POSTHOOK: query: explain cbo
+select
+   count(distinct cs_order_number) as `order count`
+  ,sum(cs_ext_ship_cost) as `total shipping cost`
+  ,sum(cs_net_profit) as `total net profit`
+from
+   catalog_sales cs1
+  ,date_dim
+  ,customer_address
+  ,call_center
+where
+d_date between '2001-4-01' and
+   (cast('2001-4-01' as date) + 60 days)
+and cs1.cs_ship_date_sk = d_date_sk
+and cs1.cs_ship_addr_sk = ca_address_sk
+and ca_state = 'NY'
+and cs1.cs_call_center_sk = cc_call_center_sk
+and cc_county in ('Ziebach County','Levy County','Huron County','Franklin 
Parish',
+  'Daviess County'
+)
+and exists (select *
+from catalog_sales cs2
+where cs1.cs_order_number = cs2.cs_order_number
+  and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
+and not exists(select *
+   from catalog_returns cr1
+   where cs1.cs_order_number = cr1.cr_order_number)
+order by count(distinct cs_order_number)
+limit 100
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@call_center
+POSTHOOK: Input: default@catalog_returns
+POSTHOOK: Input: default@catalog_sales
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Input: default@date_dim
+POSTHOOK: Output: hdfs://### HDFS PATH ###
+CBO PLAN:
+HiveAggregate(group=[{}], agg#0=[count(DISTINCT $4)], agg#1=[sum($5)], 
agg#2=[sum($6)])
+  HiveJoin(condition=[=($4, $14)], joinType=[anti], algorithm=[none], 
cost=[not available])

Review comment:
   Do we have a JIRA to explore this optimization?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468138)
Time Spent: 16h 40m  (was: 16.5h)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join plus a null filter on the right-side join key 
> to get the desired result. This causes:
>  # Extra computation - the left outer join projects redundant columns from 
> the right side, and extra filtering is needed to remove the redundant rows. 
> An anti join avoids this, since it projects only the required columns and 
> rows from the left-side table.
>  # Extra shuffle - with an anti join, duplicate records can be dropped at the 
> child node before they are moved to the join node. This 

[jira] [Commented] (HIVE-23716) Support Anti Join in Hive

2020-08-07 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173532#comment-17173532
 ] 

Jesus Camacho Rodriguez commented on HIVE-23716:


+1

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join plus a null filter on the right-side join key 
> to get the desired result. This causes:
>  # Extra computation - the left outer join projects redundant columns from 
> the right side, and extra filtering is needed to remove the redundant rows. 
> An anti join avoids this, since it projects only the required columns and 
> rows from the left-side table.
>  # Extra shuffle - with an anti join, duplicate records can be dropped at the 
> child node before they are moved to the join node. This can reduce data 
> movement significantly when there are many duplicate join keys.
>  # Extra memory usage - for a map-based anti join, a hash set is sufficient, 
> as only the key is needed to check whether a record matches the join 
> condition. For a left join, both the key and the non-key columns are needed, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> the number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when this 
> query is converted to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> The current patch performs just one conversion: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a "not exists" clause, which are first converted to 
> filter + left-join and then to anti join. Queries with "not in" are not 
> handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23514) Add Atlas metadata replication metrics

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23514?focusedWorklogId=468128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468128
 ]

ASF GitHub Bot logged work on HIVE-23514:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Aug/20 00:37
Start Date: 08/Aug/20 00:37
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1040:
URL: https://github.com/apache/hive/pull/1040


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468128)
Time Spent: 1h 20m  (was: 1h 10m)

> Add Atlas metadata replication metrics
> --
>
> Key: HIVE-23514
> URL: https://issues.apache.org/jira/browse/HIVE-23514
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23514.01.patch, HIVE-23514.02.patch, 
> HIVE-23514.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21196) Support semijoin reduction on multiple column join

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21196?focusedWorklogId=468103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468103
 ]

ASF GitHub Bot logged work on HIVE-21196:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 22:56
Start Date: 07/Aug/20 22:56
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r467319212



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws 
SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = 
parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : 
map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like 

[jira] [Work logged] (HIVE-21196) Support semijoin reduction on multiple column join

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21196?focusedWorklogId=468101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468101
 ]

ASF GitHub Bot logged work on HIVE-21196:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 22:55
Start Date: 07/Aug/20 22:55
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r467318976



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws 
SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = 
parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : 
map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like 

[jira] [Work logged] (HIVE-21196) Support semijoin reduction on multiple column join

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21196?focusedWorklogId=468100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468100
 ]

ASF GitHub Bot logged work on HIVE-21196:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 22:53
Start Date: 07/Aug/20 22:53
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r467318625



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws 
SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = 
parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : 
map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like 

[jira] [Work logged] (HIVE-21196) Support semijoin reduction on multiple column join

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21196?focusedWorklogId=468099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468099
 ]

ASF GitHub Bot logged work on HIVE-21196:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 22:52
Start Date: 07/Aug/20 22:52
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1325:
URL: https://github.com/apache/hive/pull/1325#discussion_r467318270



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SemiJoinReductionMerge.java
##
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.FilterOperator;
+import org.apache.hadoop.hive.ql.exec.GroupByOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.OperatorUtils;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.parse.GenTezUtils;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.RuntimeValuesInfo;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.AggregationDesc;
+import org.apache.hadoop.hive.ql.plan.DynamicValue;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDynamicValueDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFInBloomFilter;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFMurmurHash;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd;
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Deque;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+public class SemiJoinReductionMerge extends Transform {
+
+  public ParseContext transform(ParseContext parseContext) throws 
SemanticException {
+    Map<ReduceSinkOperator, SemiJoinBranchInfo> map = 
parseContext.getRsToSemiJoinBranchInfo();
+    if (map.isEmpty()) {
+      return parseContext;
+    }
+    HiveConf hiveConf = parseContext.getConf();
+
+    // Order does not really matter but it is necessary to keep plans stable
+    SortedMap<SJSourceTarget, List<ReduceSinkOperator>> sameTableSJ =
+        new TreeMap<>(Comparator.comparing(SJSourceTarget::toString));
+    for (Map.Entry<ReduceSinkOperator, SemiJoinBranchInfo> smjEntry : 
map.entrySet()) {
+      TableScanOperator ts = smjEntry.getValue().getTsOp();
+      // Semijoin optimization branch should look like 

[jira] [Assigned] (HIVE-24018) Review necessity of AggregationDesc#setGenericUDAFWritableEvaluator for bloom filter aggregations

2020-08-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24018:
------------------------------------------


> Review necessity of AggregationDesc#setGenericUDAFWritableEvaluator for bloom 
> filter aggregations
> -
>
> Key: HIVE-24018
> URL: https://issues.apache.org/jira/browse/HIVE-24018
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Minor
>
> A few places in the code have the following pattern:
> {code:java}
> GenericUDAFBloomFilterEvaluator bloomFilterEval = new 
> GenericUDAFBloomFilterEvaluator();
> ...
> AggregationDesc bloom = new AggregationDesc("bloom_filter", bloomFilterEval, 
> p, false, mode);
> bloom.setGenericUDAFWritableEvaluator(bloomFilterEval);
> {code}
> where the bloom filter evaluator is passed in the constructor of the 
> aggregation and directly afterwards via a setter. The use of the setter is 
> necessary, otherwise there are runtime failures of the query; however, the 
> pattern is a bit confusing.
> Investigate if there is a way to avoid passing the evaluator twice.
> To reproduce the failure, remove the setter and run the following test:
> {noformat}
> mvn test -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile=vectorized_dynamic_semijoin_reduction.q -Dtest.output.overwrite 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23821) Send tableId in request for all the new HMS get_partition APIs

2020-08-07 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das resolved HIVE-23821.
-------------------------------
Resolution: Fixed

> Send tableId in request for all the new HMS get_partition APIs
> --
>
> Key: HIVE-23821
> URL: https://issues.apache.org/jira/browse/HIVE-23821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> Table Id is needed on the HMS side to distinguish tables which have been 
> dropped [renamed] and recreated with the same name. In such a case the 
> tableId is different, and it is used to avoid serving the response from a 
> stale cached object.
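
As a hedged illustration of the idea (the types and names below are invented, not the real HMS classes or APIs), the cache check amounts to:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: serve a cached table object solely when its id matches
// the tableId sent in the request; a drop-and-recreate under the same name
// yields a new id, so the stale entry is bypassed and replaced.
public final class TableCacheSketch {
  static final class TableObj {
    final long id;
    final String name;
    TableObj(long id, String name) { this.id = id; this.name = name; }
  }

  private final Map<String, TableObj> cache = new ConcurrentHashMap<>();

  TableObj resolve(String name, long requestTableId, TableObj fresh) {
    TableObj cached = cache.get(name);
    if (cached != null && cached.id == requestTableId) {
      return cached; // same incarnation of the table: cache hit is safe
    }
    cache.put(name, fresh); // id mismatch: table was recreated, replace entry
    return fresh;
  }
}
{code}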



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24011) Flaky test AsyncResponseHandlerTest

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24011?focusedWorklogId=468027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468027
 ]

ASF GitHub Bot logged work on HIVE-24011:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 21:13
Start Date: 07/Aug/20 21:13
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1381:
URL: https://github.com/apache/hive/pull/1381


   The timeout is too low. Also, the retry logic could cause 
"java.lang.IllegalArgumentException: timeout value is negative".
   
   Change-Id: I3b40dad06889e0e6582d61a8f031fbaba6d06cc3
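
For context, a hedged sketch of this failure mode (not the actual AsyncResponseHandler code): a deadline-based retry loop must clamp the remaining wait, because Object.wait(long) rejects negative timeouts with exactly this exception.

{code:java}
// Illustrative only: waiting in a loop until a deadline. Without the
// "remaining <= 0" guard, a late wakeup makes "remaining" negative and
// lock.wait(remaining) throws IllegalArgumentException.
public final class DeadlineWaitSketch {
  private final Object lock = new Object();
  private boolean done;

  public boolean awaitDone(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (lock) {
      while (!done) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false; // timed out; guards against a negative wait()
        }
        lock.wait(remaining);
      }
    }
    return true;
  }

  public void markDone() {
    synchronized (lock) {
      done = true;
      lock.notifyAll();
    }
  }
}
{code}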
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468027)
Remaining Estimate: 0h
Time Spent: 10m

> Flaky test AsyncResponseHandlerTest
> ---
>
> Key: HIVE-24011
> URL: https://issues.apache.org/jira/browse/HIVE-24011
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Mustafa Iman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/PR-1343/6/testReport/junit/org.apache.hadoop.hive.llap/AsyncResponseHandlerTest/Testing___split_01___Archive___testStress/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24011) Flaky test AsyncResponseHandlerTest

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24011:
----------------------------------
Labels: pull-request-available  (was: )

> Flaky test AsyncResponseHandlerTest
> ---
>
> Key: HIVE-24011
> URL: https://issues.apache.org/jira/browse/HIVE-24011
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/PR-1343/6/testReport/junit/org.apache.hadoop.hive.llap/AsyncResponseHandlerTest/Testing___split_01___Archive___testStress/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24011) Flaky test AsyncResponseHandlerTest

2020-08-07 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24011 started by Mustafa Iman.
-------------------------------------------
> Flaky test AsyncResponseHandlerTest
> ---
>
> Key: HIVE-24011
> URL: https://issues.apache.org/jira/browse/HIVE-24011
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Mustafa Iman
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/PR-1343/6/testReport/junit/org.apache.hadoop.hive.llap/AsyncResponseHandlerTest/Testing___split_01___Archive___testStress/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24012) Support for rewriting with materialized views containing grouping sets

2020-08-07 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173468#comment-17173468
 ] 

Vineet Garg commented on HIVE-24012:


Left a minor comment. +1.

> Support for rewriting with materialized views containing grouping sets
> --
>
> Key: HIVE-24012
> URL: https://issues.apache.org/jira/browse/HIVE-24012
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Rewriting is not triggered for materialized views containing grouping sets. 
> This issue implements an extension on the Hive side to trigger additional 
> rewrites for materialized views containing grouping sets.
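
For context, a small hypothetical example of such a materialized view (the table and column names are invented):

{code:java}
-- A materialized view whose definition contains grouping sets; before this
-- change, queries over the sales table would not be rewritten to use it.
CREATE MATERIALIZED VIEW mv_sales_rollup AS
SELECT state, city, SUM(amount) AS total_amount
FROM sales
GROUP BY state, city GROUPING SETS ((state, city), (state), ());
{code}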



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23821) Send tableId in request for all the new HMS get_partition APIs

2020-08-07 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-23821:
---------------------------------------
Summary: Send tableId in request for all the new HMS get_partition APIs  
(was: Send tableId in request for all the new HMS get_parition APIs)

> Send tableId in request for all the new HMS get_partition APIs
> --
>
> Key: HIVE-23821
> URL: https://issues.apache.org/jira/browse/HIVE-23821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> Table Id is needed on the HMS side to distinguish tables which have been 
> dropped [renamed] and recreated with the same name. In such a case the 
> tableId is different, and it is used to avoid serving the response from a 
> stale cached object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24012) Support for rewriting with materialized views containing grouping sets

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24012?focusedWorklogId=468020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468020
 ]

ASF GitHub Bot logged work on HIVE-24012:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 21:00
Start Date: 07/Aug/20 21:00
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1374:
URL: https://github.com/apache/hive/pull/1374#discussion_r467263500



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveMaterializedViewUtils.java
##
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import com.google.common.collect.ImmutableList;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import org.apache.calcite.adapter.druid.DruidQuery;
+import org.apache.calcite.interpreter.BindableConvention;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptMaterialization;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.hep.HepPlanner;
+import org.apache.calcite.plan.hep.HepProgramBuilder;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelVisitor;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.Aggregate.Group;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.apache.hadoop.hive.common.ValidTxnWriteIdList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CreationMetadata;
+import org.apache.hadoop.hive.ql.lockmgr.LockException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveGroupingID;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveRelNode;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+import org.apache.hadoop.hive.ql.parse.DruidSqlOperatorConverter;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hive.common.util.TxnIdUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static 
org.apache.hadoop.hive.conf.Constants.MATERIALIZED_VIEW_REWRITING_TIME_WINDOW;
+
+public class HiveMaterializedViewUtils {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveMaterializedViewUtils.class);
+
+
+  private HiveMaterializedViewUtils() {}
+
+  public static Table extractTable(RelOptMaterialization materialization) {
+    RelOptHiveTable cachedMaterializedViewTable;
+    if (materialization.tableRel instanceof Project) {
+      // There is a Project on top (due to nullability)
+      cachedMaterializedViewTable = (RelOptHiveTable) 
materialization.tableRel.getInput(0).getTable();
+    } else {
+      cachedMaterializedViewTable = (RelOptHiveTable) 
materialization.tableRel.getTable();
+    }
+    return cachedMaterializedViewTable.getHiveTableMD();
+  }
+
+  /**
+   * Utility method that returns whether a materialized view is outdated 
(true), not outdated
+   * (false), or it cannot be determined (null). The latter case may happen 
e.g. when the
+   * materialized view definition uses external tables.
+   */
+  public static Boolean 

[jira] [Updated] (HIVE-23821) Send tableId in request for all the new HMS get_parition APIs

2020-08-07 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-23821:
---------------------------------------
Summary: Send tableId in request for all the new HMS get_parition APIs  
(was: [HS2] Send tableId in request for all the new HMS get_parition_* APIs 
that are in request/response form)

> Send tableId in request for all the new HMS get_parition APIs
> -
>
> Key: HIVE-23821
> URL: https://issues.apache.org/jira/browse/HIVE-23821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23821) Send tableId in request for all the new HMS get_parition APIs

2020-08-07 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-23821:
---------------------------------------
Description: Table Id is needed on the HMS side to distinguish tables which 
have been dropped [renamed] and recreated with the same name. In such a case 
the tableId is different, and it is used to avoid serving the response from a 
stale cached object.

> Send tableId in request for all the new HMS get_parition APIs
> -
>
> Key: HIVE-23821
> URL: https://issues.apache.org/jira/browse/HIVE-23821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> Table Id is needed on the HMS side to distinguish tables which have been 
> dropped [renamed] and recreated with the same name. In such a case the 
> tableId is different, and it is used to avoid serving the response from a 
> stale cached object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24011) Flaky test AsyncResponseHandlerTest

2020-08-07 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-24011:
-----------------------------------

Assignee: Mustafa Iman

> Flaky test AsyncResponseHandlerTest
> ---
>
> Key: HIVE-24011
> URL: https://issues.apache.org/jira/browse/HIVE-24011
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Mustafa Iman
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/PR-1343/6/testReport/junit/org.apache.hadoop.hive.llap/AsyncResponseHandlerTest/Testing___split_01___Archive___testStress/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24012) Support for rewriting with materialized views containing grouping sets

2020-08-07 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173414#comment-17173414
 ] 

Jesus Camacho Rodriguez commented on HIVE-24012:


[~vgarg], [~kkasa], could you review? Thanks

> Support for rewriting with materialized views containing grouping sets
> --
>
> Key: HIVE-24012
> URL: https://issues.apache.org/jira/browse/HIVE-24012
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rewriting is not triggered for materialized views containing grouping sets. 
> This issue implements an extension on the Hive side to trigger additional 
> rewrites for materialized views containing grouping sets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23935) Fetching primaryKey through beeline fails with NPE

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23935?focusedWorklogId=467914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467914
 ]

ASF GitHub Bot logged work on HIVE-23935:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 16:35
Start Date: 07/Aug/20 16:35
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1321:
URL: https://github.com/apache/hive/pull/1321#issuecomment-670603579


   @kgyrtkirk @belugabehr, can you help review?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467914)
Time Spent: 0.5h  (was: 20m)

> Fetching primaryKey through beeline fails with NPE
> --
>
> Key: HIVE-23935
> URL: https://issues.apache.org/jira/browse/HIVE-23935
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Fetching PrimaryKey of a table through Beeline !primarykey fails with NPE
> {noformat}
> 0: jdbc:hive2://localhost:1> !primarykeys Persons
> Error: MetaException(message:java.lang.NullPointerException) (state=,code=0)
> org.apache.hive.service.cli.HiveSQLException: 
> MetaException(message:java.lang.NullPointerException)
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:360)
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:351)
>   at 
> org.apache.hive.jdbc.HiveDatabaseMetaData.getPrimaryKeys(HiveDatabaseMetaData.java:573)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hive.beeline.Reflector.invoke(Reflector.java:89)
>   at org.apache.hive.beeline.Commands.metadata(Commands.java:125)
>   at org.apache.hive.beeline.Commands.primarykeys(Commands.java:231)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:57)
>   at 
> org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1465)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1364)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23958?focusedWorklogId=467907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467907
 ]

ASF GitHub Bot logged work on HIVE-23958:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Aug/20 15:52
Start Date: 07/Aug/20 15:52
Worklog Time Spent: 10m 
  Work Description: risdenk commented on pull request #1342:
URL: https://github.com/apache/hive/pull/1342#issuecomment-670584707


   > The HttpServer has a Builder class that does not have support for setting 
the keystore type. Should we add a method to the builder to be able to build 
with a KS type and/or automatically set the default keystore when the 
KeystorePath/KeystorePassword is set?
   
   This could be a separate change - I don't want to change it here since it 
didn't seem to have any benefit. The builder isn't used in this code path and 
instead it is hardcoded to `JKS` inside of Jetty and Thrift. This change passes 
the JDK preferred keystore type instead of relying on `JKS` being hardcoded by 
libraries.
   
   I didn't feel it was necessary to expose this as another config option to 
add to hive-site.xml - since the JDK already has a way to configure this with 
the `keystore.type` config in the JDK. Since it hasn't come up previously, I'm 
assuming that no one has tried to change the keystore type in HS2 and so it 
doesn't need a Hive specific config today. This change doesn't stop someone 
from adding a config down the line if necessary.
   
   > HiveServer2 class has some SSL settings for WebUI stuff. Should the 
keystore type also be set here?
   
   The HiveServer2 class eventually falls back to HttpServer to build the Jetty 
server - so this change covers both the WebUI and other usages by Hive. 
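
   A small sketch of the JDK mechanism referenced above (nothing Hive-specific 
is assumed): `KeyStore.getDefaultType()` resolves the `keystore.type` entry 
from the JDK's java.security configuration, which is the value this change 
passes instead of a hardcoded "JKS".

{code:java}
import java.security.KeyStore;
import java.security.Security;

public class DefaultKeystoreType {
  public static void main(String[] args) throws Exception {
    // Resolves "keystore.type" from the JDK's java.security file:
    // "pkcs12" on modern JDKs, "jks" on older ones.
    String type = KeyStore.getDefaultType();
    System.out.println("JDK default keystore type: " + type);

    // The same entry can be read (or overridden) via the Security API.
    System.out.println("keystore.type = " + Security.getProperty("keystore.type"));

    // Load a keystore of the default type instead of hardcoding "JKS".
    KeyStore ks = KeyStore.getInstance(type);
    ks.load(null, null); // empty keystore, just to show the call shape
  }
}
{code}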



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467907)
Time Spent: 0.5h  (was: 20m)

> HiveServer2 should support additional keystore/truststores types besides JKS
> 
>
> Key: HIVE-23958
> URL: https://issues.apache.org/jira/browse/HIVE-23958
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and 
> PKCS12 based on JDK fallback) keystore/truststore types. There are additional 
> keystore/truststore types used for different applications like for FIPS 
> crypto algorithms. HS2 should support the default keystore type specified for 
> the JDK and not always use JKS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23963) UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule

2020-08-07 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23963.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~kkasa]!

> UnsupportedOperationException in queries 74 and 84 while applying 
> HiveCardinalityPreservingJoinRule
> ---
>
> Key: HIVE-23963
> URL: https://issues.apache.org/jira/browse/HIVE-23963
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: cbo_query74_stacktrace.txt, cbo_query84_stacktrace.txt
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The following TPC-DS queries: 
> * cbo_query74.q
> * cbo_query84.q 
> * query74.q 
> * query84.q 
> fail on the metastore with the partitioned TPC-DS 30TB dataset.
> The stacktraces for cbo_query74 and cbo_query84 show that the problem 
> originates while applying HiveCardinalityPreservingJoinRule.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23963) UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23963?focusedWorklogId=467881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467881
 ]

ASF GitHub Bot logged work on HIVE-23963:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 15:09
Start Date: 07/Aug/20 15:09
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1357:
URL: https://github.com/apache/hive/pull/1357


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467881)
Time Spent: 1h 10m  (was: 1h)

> UnsupportedOperationException in queries 74 and 84 while applying 
> HiveCardinalityPreservingJoinRule
> ---
>
> Key: HIVE-23963
> URL: https://issues.apache.org/jira/browse/HIVE-23963
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: cbo_query74_stacktrace.txt, cbo_query84_stacktrace.txt
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The following TPC-DS queries: 
> * cbo_query74.q
> * cbo_query84.q 
> * query74.q 
> * query84.q 
> fail on the metastore with the partitioned TPC-DS 30TB dataset.
> The stacktraces for cbo_query74 and cbo_query84 show that the problem 
> originates while applying HiveCardinalityPreservingJoinRule.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23408:
--
Labels: pull-request-available  (was: )

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form of 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands the _HOST placeholder 
> with the FQDN of the host where it is running; this leads to an 
> authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.
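
For illustration, a sketch of the _HOST substitution using Hadoop's 
SecurityUtil (the hostnames and realm are made up; this shows how the same 
principal config "unfolds" differently on each host):

{code:java}
import java.io.IOException;
import org.apache.hadoop.security.SecurityUtil;

public class HostPlaceholderDemo {
  public static void main(String[] args) throws IOException {
    String configured = "hive/_HOST@EXAMPLE.COM";
    // Hadoop substitutes _HOST with the supplied (or local) hostname, so the
    // same config yields a different principal on every NodeManager host.
    String onHs2 = SecurityUtil.getServerPrincipal(configured, "hs2.example.com");
    String onNm  = SecurityUtil.getServerPrincipal(configured, "nm42.example.com");
    System.out.println(onHs2); // hive/hs2.example.com@EXAMPLE.COM
    System.out.println(onNm);  // hive/nm42.example.com@EXAMPLE.COM
    // A Tez task on nm42 would try to authenticate as a principal that has
    // no matching keytab entry, causing the failure described above.
  }
}
{code}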



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS

2020-08-07 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173185#comment-17173185
 ] 

Naveen Gangam commented on HIVE-23958:
--

[~krisden] Just posted a couple of comments to the PR. Could you please review 
them? Thanks

> HiveServer2 should support additional keystore/truststores types besides JKS
> 
>
> Key: HIVE-23958
> URL: https://issues.apache.org/jira/browse/HIVE-23958
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and 
> PKCS12 based on JDK fallback) keystore/truststore types. There are additional 
> keystore/truststore types used for different applications like for FIPS 
> crypto algorithms. HS2 should support the default keystore type specified for 
> the JDK and not always use JKS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=467874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467874
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 14:54
Start Date: 07/Aug/20 14:54
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1379:
URL: https://github.com/apache/hive/pull/1379


   …ronment
   
   Change-Id: I486e9169279765b8a695d27264b770c8db92128f
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467874)
Remaining Estimate: 0h
Time Spent: 10m

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form of 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands the _HOST placeholder 
> with the FQDN of the host where it is running; this leads to an 
> authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-08-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23408 started by László Bodor.
---
> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>
> hive.server2.authentication.kerberos.principal is set in the form of 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands the _HOST placeholder 
> with the FQDN of the host where it is running; this leads to an 
> authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?focusedWorklogId=467863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467863
 ]

ASF GitHub Bot logged work on HIVE-24014:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 14:03
Start Date: 07/Aug/20 14:03
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1378:
URL: https://github.com/apache/hive/pull/1378


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467863)
Time Spent: 0.5h  (was: 20m)

> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24014.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the newer implementation, every dump operation cleans up the 
> dump directory previously consumed by the load operation. Hence, for a 
> policy, at most one dump directory will exist at a time. Also, the dump 
> directory base location config is now a policy-level config, so this 
> DumpDirCleanerTask will no longer be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467862
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 13:59
Start Date: 07/Aug/20 13:59
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467058365



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {

Review comment:
   This makes sense as well. I'll add an enum, FileMetadataType, which this 
GetFileListResponse (I'll probably rename it) will contain as a field.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467862)
Time Spent: 2h 10m  (was: 2h)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object
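
For illustration, a sketch of how a caller might use the proposed endpoint, 
assuming Java bindings generated from the IDL above (GetFileListRequest and 
GetFileListResponse do not exist yet; the names mirror the proposal):

{code:java}
import java.util.Arrays;

public class FileListCall {
  // "client" is any ThriftHiveMetastore.Iface handle; the table name and
  // partition values are illustrative.
  static byte[] fetchFileList(ThriftHiveMetastore.Iface client) throws Exception {
    GetFileListRequest req = new GetFileListRequest();
    req.setDbName("default");
    req.setTableName("student_part_acid");
    req.setPartVals(Arrays.asList("20110924")); // values for one partition
    GetFileListResponse resp = client.get_file_list(req);
    // The payload is FlatBuffer-serialized; decoding would use the readers
    // generated from the (not yet published) FlatBuffers schema.
    return resp.getFileListData();
  }
}
{code}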



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?focusedWorklogId=467849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467849
 ]

ASF GitHub Bot logged work on HIVE-24014:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 13:04
Start Date: 07/Aug/20 13:04
Worklog Time Spent: 10m 
  Work Description: ArkoSharma closed pull request #1377:
URL: https://github.com/apache/hive/pull/1377


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467849)
Time Spent: 20m  (was: 10m)

> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24014.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With the newer implementation, every dump operation cleans up the 
> dump directory previously consumed by the load operation. Hence, for a 
> policy, at most one dump directory will exist at a time. Also, the dump 
> directory base location config is now a policy-level config, so this 
> DumpDirCleanerTask will no longer be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause

2020-08-07 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173124#comment-17173124
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

Steps to reproduce:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

when *hive.optimize.sort.dynamic.partition=true* we expect the keys to be 
sorted on the reducer side so that reducers can keep only one record writer 
open at any time, thereby reducing the memory pressure on the reducers 
(HIVE-6455). But from the query execution it looks like the property of the 
*distribute by* clause (which does not guarantee clustering or sorting 
properties on the distributed keys) takes precedence, and hence the keys at 
the reducer side are not in sorted order and the query fails with a null 
pointer exception.

[~prasanth_j] [~jcamachorodriguez] [~vikram.dixit] any thoughts on this?
*Note*: This issue is not seen when vectorization is enabled
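
For illustration, a simplified model of the single-open-writer invariant the 
optimization relies on (plain Java, not actual FileSinkOperator code):

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SortedPartitionSink {
  // With sorted partition keys the sink can finalize the previous "writer"
  // whenever the key changes; a key that reappears later finds its writer
  // already removed, which is the state that surfaces as the NPE above.
  static void sink(List<String> partitionKeys) {
    String open = null;                     // currently open "writer"
    Set<String> closed = new HashSet<>();   // partitions already finalized
    for (String key : partitionKeys) {
      if (!key.equals(open)) {
        if (open != null) closed.add(open); // close the previous writer
        if (closed.contains(key)) {
          throw new IllegalStateException("partition " + key + " reappeared");
        }
        open = key;                         // open writer for the new partition
      }
    }
  }

  public static void main(String[] args) {
    sink(Arrays.asList("1", "1", "2"));     // sorted input: completes fine
    try {
      sink(Arrays.asList("1", "2", "1"));   // unsorted (distribute by): fails
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}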


> NPE when inserting data with 'distribute by' clause
> ---
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Lynch Lee
>Priority: Major
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed Distribute By or 
> used a Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we 
> use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> 

[jira] [Comment Edited] (HIVE-18284) NPE when inserting data with 'distribute by' clause

2020-08-07 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173124#comment-17173124
 ] 

Syed Shameerur Rahman edited comment on HIVE-18284 at 8/7/20, 1:01 PM:
---

Steps to reproduce:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

when *hive.optimize.sort.dynamic.partition=true* we expect the keys to be 
sorted on the reducer side so that reducers can keep only one record writer 
open at any time, thereby reducing the memory pressure on the reducers 
(HIVE-6455). But from the query execution it looks like the property of the 
*distribute by* clause (which does not guarantee clustering or sorting 
properties on the distributed keys) takes precedence, and hence the keys at 
the reducer side are not in sorted order and the query fails with a null 
pointer exception.

[~prasanth_j] [~jcamachorodriguez] [~vikram.dixit] any thoughts on this?
*Note*: This issue is not seen when vectorization is enabled



was (Author: srahman):
Steps to reproduce:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

when * hive.optimize.sort.dynamic.partition=true* we expect the keys to be 
sorted in the reducer side so that reducers can keep only one record writer 
open at any time thereby reducing the memory pressure on the reducers 
(HIVE-6455) , But from the query execution it looks like the property of 
*distribute by clause* (does not guarantee clustering or sorting properties on 
the distributed keys) takes precedence and hence the keys at the reducer side 
are not in sorted order and fails with null pointer exception.

[~prasanth_j] [~jcamachorodriguez] [~vikram.dixit] any thoughts on this?
*Note*: This issue is not seen when vectorization is enabled


> NPE when inserting data with 'distribute by' clause
> ---
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Lynch Lee
>Priority: Major
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed Distribute By or 
> used a Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we 
> use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> 

[jira] [Commented] (HIVE-24016) Share bloom filter construction branch in multi column semijoin reducers

2020-08-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173114#comment-17173114
 ] 

Stamatis Zampetakis commented on HIVE-24016:


Note that I didn't test whether the transformation described above is applied 
by {{SharedWorkOptimizer}}, but it is certainly not covered by 
{{SemiJoinReductionMerge}} at the moment.

> Share bloom filter construction branch in multi column semijoin reducers
> 
>
> Key: HIVE-24016
> URL: https://issues.apache.org/jira/browse/HIVE-24016
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> In HIVE-21196, we add a transformation capable of merging single column 
> semijoin reducers to multi column semijoin reducer.
> Currently it transforms the subplan SB0 to subplan SB1.
> +SB0+
> {noformat}
>   / RS -> TS_1[Editor] 
>  / SEL[fname] - GB - RS - GB -  RS -> TS_0[Author] 
>  SOURCE 
>  \ SEL[lname] - GB - RS - GB -  RS -> TS_0[Author]
> \ RS -> TS_1[Editor]
> TS_0[Author] - FIL[in_bloom(fname) ^ in_bloom(lname)]
> TS_1[Editor] - FIL[in_bloom(fname) ^ in_bloom(lname)]  
> {noformat}
> +SB1+
> {noformat}
>  / SEL[fname,lname] - GB - RS - GB - RS -> TS[Author] - 
> FIL[in_bloom(hash(fname,lname))]
>  SOURCE  
>  \ SEL[fname,lname] - GB - RS - GB - RS -> TS[Editor] - 
> FIL[in_bloom(hash(fname,lname))]
> {noformat}
> Observe that in SB1 we could share the common path that creates the bloom 
> filter (SEL - GB - RS -GB) to obtain a plan like SB2.
> +SB2+
> {noformat}
>  / RS -> TS[Author] - 
> FIL[in_bloom(hash(fname,lname))]
>  SOURCE - SEL[fname,lname] - GB - RS - GB -
>  \ RS -> TS[Editor] - 
> FIL[in_bloom(hash(fname,lname))]
> {noformat}
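
For illustration, the composite-key hashing that makes the shared branch 
possible, sketched with Guava's BloomFilter as a stand-in for Hive's own 
bloom filter implementation (column values are made up):

{code:java}
import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class MultiColumnBloomDemo {
  // Stand-in for hash(fname, lname): any stable composite encoding works,
  // as long as the build and probe sides agree on it.
  static String composite(String fname, String lname) {
    return fname + '\u0000' + lname;
  }

  public static void main(String[] args) {
    BloomFilter<CharSequence> bloom = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000, 0.01);

    // Build side (the shared SEL - GB - RS - GB branch): insert the combined
    // key once per row instead of maintaining one filter per column.
    bloom.put(composite("Ada", "Lovelace"));

    // Probe side (FIL[in_bloom(hash(fname,lname))] on each TS branch):
    System.out.println(bloom.mightContain(composite("Ada", "Lovelace"))); // true
    System.out.println(bloom.mightContain(composite("Alan", "Turing")));  // false (w.h.p.)
  }
}
{code}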



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24016) Share bloom filter construction branch in multi column semijoin reducers

2020-08-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24016:
--


> Share bloom filter construction branch in multi column semijoin reducers
> 
>
> Key: HIVE-24016
> URL: https://issues.apache.org/jira/browse/HIVE-24016
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> In HIVE-21196, we add a transformation capable of merging single column 
> semijoin reducers to multi column semijoin reducer.
> Currently it transforms the subplan SB0 to subplan SB1.
> +SB0+
> {noformat}
>   / RS -> TS_1[Editor] 
>  / SEL[fname] - GB - RS - GB -  RS -> TS_0[Author] 
>  SOURCE 
>  \ SEL[lname] - GB - RS - GB -  RS -> TS_0[Author]
> \ RS -> TS_1[Editor]
> TS_0[Author] - FIL[in_bloom(fname) ^ in_bloom(lname)]
> TS_1[Editor] - FIL[in_bloom(fname) ^ in_bloom(lname)]  
> {noformat}
> +SB1+
> {noformat}
>  / SEL[fname,lname] - GB - RS - GB - RS -> TS[Author] - 
> FIL[in_bloom(hash(fname,lname))]
>  SOURCE  
>  \ SEL[fname,lname] - GB - RS - GB - RS -> TS[Editor] - 
> FIL[in_bloom(hash(fname,lname))]
> {noformat}
> Observe that in SB1 we could share the common path that creates the bloom 
> filter (SEL - GB - RS -GB) to obtain a plan like SB2.
> +SB2+
> {noformat}
>  / RS -> TS[Author] - 
> FIL[in_bloom(hash(fname,lname))]
>  SOURCE - SEL[fname,lname] - GB - RS - GB -
>  \ RS -> TS[Editor] - 
> FIL[in_bloom(hash(fname,lname))]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467839
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:36
Start Date: 07/Aug/20 12:36
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467012433



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
   }
 }
 
+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List<String> partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+  + ", partitions: " + partitions.toString());
+
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {
+Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+Path path = new Path(p.getSd().getLocation());
+
+FileSystem fs = path.getFileSystem(conf);
+RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+while (itr.hasNext()) {
+  FileStatus fStatus = itr.next();
+  Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));
+  boolean isRawFormat  = 
!CollectionUtils.isEqualCollection(reader.getSchema().getFieldNames(), 
ALL_ACID_ROW_NAMES);
+  int fileFormat = isRawFormat ? 0 : 2;
+  response.addToFileListData(createFileStatus(fStatus, 
p.getSd().getLocation(), fileFormat));
+}
+success = true;
+  } catch (Exception e) {
+ex = e;
+LOG.error("Failed to list files with error: " + e.getMessage());
+throw new MetaException(e.getMessage());
+  } finally {
+endFunction("get_file_list", success, ex);
+  }
+  return response;
+}
+
+private ByteBuffer createFileStatus(FileStatus fileStatus, String 
basePath, int fileFormat) {

Review comment:
   Noted, I'll add javadoc here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467839)
Time Spent: 2h  (was: 1h 50m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467837
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:34
Start Date: 07/Aug/20 12:34
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467011860



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
   }
 }
 
+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List<String> partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+  + ", partitions: " + partitions.toString());
+
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {
+Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+Path path = new Path(p.getSd().getLocation());
+
+FileSystem fs = path.getFileSystem(conf);
+RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+while (itr.hasNext()) {
+  FileStatus fStatus = itr.next();
+  Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));
+  boolean isRawFormat  = 
!CollectionUtils.isEqualCollection(reader.getSchema().getFieldNames(), 
ALL_ACID_ROW_NAMES);
+  int fileFormat = isRawFormat ? 0 : 2;

Review comment:
   My first thought was that it should be a boolean. But after some 
discussion: if we want to add new versions or file formats later (like an 
ACID v3 release), it'd make sense not to store it as a boolean in the first 
place. But you're right, for more readable code we can use enums.
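
   For illustration, one possible shape of such an enum (the names and values 
are assumptions, not committed API):

{code:java}
// Purely illustrative; names and values are guesses at the discussed design.
public enum FileMetadataType {
  RAW_FORMAT, // plain (non-ACID) file layout
  ACID_V2     // files carrying the full ACID row schema
}
{code}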





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467837)
Time Spent: 1h 50m  (was: 1h 40m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467835
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:31
Start Date: 07/Aug/20 12:31
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467010382



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
   }
 }
 
+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List<String> partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+  + ", partitions: " + partitions.toString());
+
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {
+Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+Path path = new Path(p.getSd().getLocation());
+
+FileSystem fs = path.getFileSystem(conf);
+RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+while (itr.hasNext()) {
+  FileStatus fStatus = itr.next();
+  Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));

Review comment:
   Currently yes. But I think it makes sense to check if it's ORC first. If 
not, just leave the fileFormat field empty. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467835)
Time Spent: 1h 40m  (was: 1.5h)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467833
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:29
Start Date: 07/Aug/20 12:29
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467009533



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -240,6 +247,20 @@
   private static ZooKeeperHiveHelper zooKeeperHelper = null;
   private static String msHost = null;
 
+  static final String OPERATION_FIELD_NAME = "operation";

Review comment:
   Noted, I'll add some comments





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467833)
Time Spent: 1.5h  (was: 1h 20m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467832
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:27
Start Date: 07/Aug/20 12:27
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467008407



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2454,10 +2467,14 @@ PartitionsResponse 
get_partitions_req(1:PartitionsRequest req)
   // partition keys in new_part should be the same as those in old partition.
  void rename_partition(1:string db_name, 2:string tbl_name, 3:list<string> 
part_vals, 4:Partition new_part)
throws (1:InvalidOperationException o1, 2:MetaException 
o2)
-  
+
   RenamePartitionResponse rename_partition_req(1:RenamePartitionRequest req)
throws (1:InvalidOperationException o1, 2:MetaException 
o2)
 
+  // Returns a file list using FlatBuffers as serialization
+  GetFileListResponse get_file_list(1:GetFileListRequest req)
+throws(1:MetaException o1)

Review comment:
   You're right, I forgot to add it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467832)
Time Spent: 1h 20m  (was: 1h 10m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467831
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:25
Start Date: 07/Aug/20 12:25
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467007215



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {
+  1: optional list<binary> fileListData,
+  2: optional i32 fbVersionNumber

Review comment:
   I agree. Here I wanted to express that the version represents the 
fields in the flatbuffer file. In the first version, we'd like to store 
relative_path, length, last_modification_time, and file_format, but in future 
versions we might want to add more fields. But you're right, this naming is 
confusing. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467831)
Time Spent: 1h 10m  (was: 1h)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467830
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 12:22
Start Date: 07/Aug/20 12:22
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r467005926



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {

Review comment:
   I agree with the naming. The existing getFileMetadata (AFAIK) returns 
file metadata based on fileId. It would not be good for our use-case (replacing 
recursive listing during getAcidState). 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467830)
Time Spent: 1h  (was: 50m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23960) Partition with no column statistics leads to unbalanced calls to openTransaction/commitTransaction error during get_partitions_by_names

2020-08-07 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173105#comment-17173105
 ] 

Aasha Medhi commented on HIVE-23960:


+1

> Partition with no column statistics leads to unbalanced calls to 
> openTransaction/commitTransaction error during get_partitions_by_names
> ---
>
> Key: HIVE-23960
> URL: https://issues.apache.org/jira/browse/HIVE-23960
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23960.01.patch, HIVE-23960.02.patch, 
> HIVE-23960.03.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {color:#172b4d}Creating a partition with data and adding another partition 
> leads to unbalanced calls to open/commit transaction during the 
> get_partitions_by_names call.{color}
> {color:#172b4d}Issue was discovered during REPL DUMP operation which uses  
> this HMS call to get the metadata of partition. This error occurs when there 
> is a partition with no column statistics.{color}
> {color:#172b4d}To reproduce:{color}
> {code:java}
> CREATE TABLE student_part_acid(name string, age int, gpa double) PARTITIONED 
> BY (ds string) STORED AS orc;
> LOAD DATA INPATH '/user/hive/partDir/student_part_acid/ds=20110924' INTO 
> TABLE student_part_acid partition(ds=20110924);
> ALTER TABLE student_part_acid ADD PARTITION (ds=20110925);
> Now if we try to perform REPL DUMP it fails with the error "Unbalanced 
> calls to open/commit transaction" on the HS2 side. 
> {code}
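
For context, the invariant behind the error message is plain open/commit 
pairing, sketched here in minimal form (illustrative, not the actual 
ObjectStore logic):

{code:java}
public class TxnNesting {
  private int depth = 0;

  int openTransaction() { return ++depth; }

  void commitTransaction() {
    if (depth <= 0) {
      // A code path that skipped an open (e.g. returning early for a
      // partition with no column statistics) or committed twice trips this.
      throw new RuntimeException("Unbalanced calls to open/commit transaction");
    }
    depth--;
  }

  public static void main(String[] args) {
    TxnNesting txn = new TxnNesting();
    txn.openTransaction();
    txn.commitTransaction(); // balanced: fine
    txn.commitTransaction(); // one commit too many: throws the error above
  }
}
{code}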



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=467822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467822
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 11:55
Start Date: 07/Aug/20 11:55
Worklog Time Spent: 10m 
  Work Description: bmaidics commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r466993241



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,

Review comment:
   Yes, it retrieves files for one partition at a time. It can handle 
multi-level partitions as well. Do you think it would make sense to support 
returning file metadata for a list of partition values? It could lead to a huge 
amount of data sent over the network, and the cache could be filled easily with 
just a few entries.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467822)
Time Spent: 50m  (was: 40m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24000.
---
Resolution: Fixed

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173064#comment-17173064
 ] 

Denys Kuzmenko commented on HIVE-24000:
---

Pushed to master.
[~pvarga], thank you for the review!

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-24000:
-

Assignee: Denys Kuzmenko

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=467800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467800
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 10:35
Start Date: 07/Aug/20 10:35
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r466819149



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAntiSemiJoinRule.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAntiJoin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveAntiSemiJoinRule extends RelOptRule {
+  protected static final Logger LOG = LoggerFactory.getLogger(HiveAntiSemiJoinRule.class);
+  public static final HiveAntiSemiJoinRule INSTANCE = new HiveAntiSemiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //    HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveAntiSemiJoinRule() {
+    super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any()))),
+        "HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+    final Project project = call.rel(0);
+    final Filter filter = call.rel(1);
+    final Join join = call.rel(2);
+    perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) {
+    LOG.debug("Start Matching HiveAntiJoinRule");
+
+    //TODO : Need to support this scenario.
+    if (join.getCondition().isAlwaysTrue()) {
+      return;
+    }
+
+    //We support conversion from left outer join only.
+    if (join.getJoinType() != JoinRelType.LEFT) {
+      return;
+    }
+
+    assert (filter != null);
+
+    // If null filter is not present from right side then we can not convert to anti join.
+    List<RexNode> aboveFilters = RelOptUtil.conjunctions(filter.getCondition());
+    Stream<RexNode> nullFilters = aboveFilters.stream().filter(filterNode -> filterNode.getKind() == SqlKind.IS_NULL);
+    boolean hasNullFilter = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, nullFilters.collect(Collectors.toList()));
+    if (!hasNullFilter) {
+      return;
+    }
+
+    // If any projection is there from right side, then we can not convert to anti join.
+    boolean hasProjection = HiveCalciteUtil.hasAnyExpressionFromRightSide(join, project.getProjects());
+    if (hasProjection) {
+      return;
+    }
+
+    LOG.debug("Matched HiveAntiJoinRule");
+
+    // Build anti join with same left, right child and condition as original left outer join.
+    Join anti = HiveAntiJoin.getAntiJoin(join.getLeft().getCluster(), join.getLeft().getTraitSet(),
+        join.getLeft(), join.getRight(), join.getCondition());
+    RelNode newProject = project.copy(project.getTraitSet(), anti, project.getProjects(), project.getRowType());
+    call.transformTo(newProject);
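
For reference, a minimal query shape that matches the pattern this rule rewrites (an illustration with made-up table and column names, mirroring the plan comment at the top of the class):

{code:java}
SELECT t1.fld
FROM t1
LEFT OUTER JOIN t2 ON t1.fld = t2.fld
WHERE t2.fld IS NULL;
{code}

The rule bails out unless the join is a LEFT join, the IS NULL filter references the right-hand side, and no right-side column survives the projection, exactly as checked in perform() above.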

Review comment:
   

[jira] [Commented] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173043#comment-17173043
 ] 

Stamatis Zampetakis commented on HIVE-24000:


Shouldn't the status of the JIRA change when the PR is merged to master?

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24001) Don't cache MapWork in tez/ObjectCache during query-based compaction

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24001?focusedWorklogId=467791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467791
 ]

ASF GitHub Bot logged work on HIVE-24001:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 09:59
Start Date: 07/Aug/20 09:59
Worklog Time Spent: 10m 
  Work Description: klcopp closed pull request #1368:
URL: https://github.com/apache/hive/pull/1368


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467791)
Time Spent: 0.5h  (was: 20m)

> Don't cache MapWork in tez/ObjectCache during query-based compaction
> 
>
> Key: HIVE-24001
> URL: https://issues.apache.org/jira/browse/HIVE-24001
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Query-based major compaction can fail intermittently with the following issue:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: One writer is 
> supposed to handle only one bucket. We saw these 2 different buckets: 1 and 6
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFValidateAcidSortOrder.evaluate(GenericUDFValidateAcidSortOrder.java:77)
> {code}
> This is consistently preceded in the application log with:
> {code:java}
>  [INFO] [TezChild] |tez.ObjectCache|: Found 
> hive_20200804185133_f04cca69-fa30-4f1b-a5fe-80fc2d749f48_Map 1__MAP_PLAN__ in 
> cache with value: org.apache.hadoop.hive.ql.plan.MapWork@74652101
> {code}
> Alternatively, when MapRecordProcessor doesn't find mapWork in 
> tez/ObjectCache (but instead caches mapWork), major compaction succeeds.
> The failure happens because, if MapWork is reused, 
> GenericUDFValidateAcidSortOrder (which is called during compaction) is also 
> reused on splits belonging to two different buckets, which produces an error.
> The solution is to avoid storing MapWork in the ObjectCache during
> query-based compaction.
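> A minimal sketch of that approach (illustrative only, not the actual patch;
> the compaction flag below is a made-up name):
> {code:java}
> // Bypass the Tez ObjectCache for compaction queries so each task builds its
> // own MapWork instead of sharing a cached (and stateful) instance.
> MapWork mapWork;
> if (conf.getBoolean("hive.query.based.compaction", false)) {  // hypothetical flag
>   mapWork = Utilities.getMapWork(conf);  // always deserialize a fresh plan
> } else {
>   mapWork = cache.retrieve(cacheKey, () -> Utilities.getMapWork(conf));
> }
> {code}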



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24001) Don't cache MapWork in tez/ObjectCache during query-based compaction

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24001?focusedWorklogId=467790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467790
 ]

ASF GitHub Bot logged work on HIVE-24001:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 09:59
Start Date: 07/Aug/20 09:59
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1368:
URL: https://github.com/apache/hive/pull/1368


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467790)
Time Spent: 20m  (was: 10m)

> Don't cache MapWork in tez/ObjectCache during query-based compaction
> 
>
> Key: HIVE-24001
> URL: https://issues.apache.org/jira/browse/HIVE-24001
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Query-based major compaction can fail intermittently with the following issue:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: One writer is 
> supposed to handle only one bucket. We saw these 2 different buckets: 1 and 6
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFValidateAcidSortOrder.evaluate(GenericUDFValidateAcidSortOrder.java:77)
> {code}
> This is consistently preceded in the application log with:
> {code:java}
>  [INFO] [TezChild] |tez.ObjectCache|: Found 
> hive_20200804185133_f04cca69-fa30-4f1b-a5fe-80fc2d749f48_Map 1__MAP_PLAN__ in 
> cache with value: org.apache.hadoop.hive.ql.plan.MapWork@74652101
> {code}
> Alternatively, when MapRecordProcessor doesn't find mapWork in 
> tez/ObjectCache (but instead caches mapWork), major compaction succeeds.
> The failure happens because, if MapWork is reused, 
> GenericUDFValidateAcidSortOrder (which is called during compaction) is also 
> reused on splits belonging to two different buckets, which produces an error.
> The solution is to avoid storing MapWork in the ObjectCache during
> query-based compaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true

2020-08-07 Thread Eugene Chung (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173023#comment-17173023
 ] 

Eugene Chung commented on HIVE-23954:
-

[^HIVE-23954.01.patch] I'm trying to skip the reducer deduplication for the 
mixed case.

> count(*) with count(distinct) gives wrong results with 
> hive.optimize.countdistinct=true
> ---
>
> Key: HIVE-23954
> URL: https://issues.apache.org/jira/browse/HIVE-23954
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
> Attachments: HIVE-23954.01.patch
>
>
> {code:java}
> select count(*), count(distinct mid) from db1.table1 where partitioned_column 
> = '...'{code}
>  
> is not working properly when hive.optimize.countdistinct is true. By default, 
> it's true for all 3.x versions.
> In the two plans below, the aggregations part in the Output of Group By 
> Operator of Map 1 are different.
>  
> - hive.optimize.countdistinct=false
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_5] (rows=1 width=24) |
> |   
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
> KEY._col0:0._col0)"] |
> | <-Map 1 [SIMPLE_EDGE]  |
> |   SHUFFLE [RS_4]   |
> | Group By Operator [GBY_3] (rows=343640771 width=4160) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT 
> mid)"],keys:mid |
> |   Select Operator [SEL_2] (rows=343640771 width=4160) |
> | Output:["mid"] |
> | TableScan [TS_0] (rows=343640771 width=4160) |
> |   db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++{code}
>  
> - hive.optimize.countdistinct=true
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_14] (rows=1 width=16) |
> |   
> Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] |
> |   Group By Operator [GBY_11] (rows=343640771 width=4160) |
> | 
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
> |   <-Map 1 [SIMPLE_EDGE]|
> | SHUFFLE [RS_10]|
> |   PartitionCols:_col0  |
> |   Group By Operator [GBY_9] (rows=343640771 width=4160) |
> | Output:["_col0","_col1"],aggregations:["count()"],keys:mid |
> | Select Operator [SEL_2] (rows=343640771 width=4160) |
> |   Output:["mid"]   |
> |   TableScan [TS_0] (rows=343640771 width=4160) |
> | db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++
> {code}
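> A toy illustration of the discrepancy (hypothetical data, my reading of the
> plans above): suppose mid holds the values a, a, b, so the correct answer is
> count(*) = 3 and count(distinct mid) = 2.
> {code:java}
> // map-side GBY_9 (keys: mid):  (a, count()=2), (b, count()=1)
> // final GBY_14:                count(_col1) = 2  // wrong: counts the grouped rows;
> //                                                // count(*) needs sum(_col1) = 3
> //                              count(_col0) = 2  // distinct count: correct
> {code}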



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true

2020-08-07 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung reassigned HIVE-23954:
---

Assignee: Eugene Chung

> count(*) with count(distinct) gives wrong results with 
> hive.optimize.countdistinct=true
> ---
>
> Key: HIVE-23954
> URL: https://issues.apache.org/jira/browse/HIVE-23954
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
> Attachments: HIVE-23954.01.patch
>
>
> {code:java}
> select count(*), count(distinct mid) from db1.table1 where partitioned_column 
> = '...'{code}
>  
> is not working properly when hive.optimize.countdistinct is true. By default, 
> it's true for all 3.x versions.
> In the two plans below, the aggregations part in the Output of Group By 
> Operator of Map 1 are different.
>  
> - hive.optimize.countdistinct=false
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_5] (rows=1 width=24) |
> |   
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
> KEY._col0:0._col0)"] |
> | <-Map 1 [SIMPLE_EDGE]  |
> |   SHUFFLE [RS_4]   |
> | Group By Operator [GBY_3] (rows=343640771 width=4160) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT 
> mid)"],keys:mid |
> |   Select Operator [SEL_2] (rows=343640771 width=4160) |
> | Output:["mid"] |
> | TableScan [TS_0] (rows=343640771 width=4160) |
> |   db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++{code}
>  
> - hive.optimize.countdistinct=true
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_14] (rows=1 width=16) |
> |   
> Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] |
> |   Group By Operator [GBY_11] (rows=343640771 width=4160) |
> | 
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
> |   <-Map 1 [SIMPLE_EDGE]|
> | SHUFFLE [RS_10]|
> |   PartitionCols:_col0  |
> |   Group By Operator [GBY_9] (rows=343640771 width=4160) |
> | Output:["_col0","_col1"],aggregations:["count()"],keys:mid |
> | Select Operator [SEL_2] (rows=343640771 width=4160) |
> |   Output:["mid"]   |
> |   TableScan [TS_0] (rows=343640771 width=4160) |
> | db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true

2020-08-07 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-23954:

Attachment: HIVE-23954.01.patch
Status: Patch Available  (was: Open)

> count(*) with count(distinct) gives wrong results with 
> hive.optimize.countdistinct=true
> ---
>
> Key: HIVE-23954
> URL: https://issues.apache.org/jira/browse/HIVE-23954
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 3.1.0, 3.0.0
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
> Attachments: HIVE-23954.01.patch
>
>
> {code:java}
> select count(*), count(distinct mid) from db1.table1 where partitioned_column 
> = '...'{code}
>  
> is not working properly when hive.optimize.countdistinct is true. By default, 
> it's true for all 3.x versions.
> In the two plans below, the aggregations part in the Output of Group By 
> Operator of Map 1 are different.
>  
> - hive.optimize.countdistinct=false
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_5] (rows=1 width=24) |
> |   
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
> KEY._col0:0._col0)"] |
> | <-Map 1 [SIMPLE_EDGE]  |
> |   SHUFFLE [RS_4]   |
> | Group By Operator [GBY_3] (rows=343640771 width=4160) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT 
> mid)"],keys:mid |
> |   Select Operator [SEL_2] (rows=343640771 width=4160) |
> | Output:["mid"] |
> | TableScan [TS_0] (rows=343640771 width=4160) |
> |   db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++{code}
>  
> - hive.optimize.countdistinct=true
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_14] (rows=1 width=16) |
> |   
> Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] |
> |   Group By Operator [GBY_11] (rows=343640771 width=4160) |
> | 
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
> |   <-Map 1 [SIMPLE_EDGE]|
> | SHUFFLE [RS_10]|
> |   PartitionCols:_col0  |
> |   Group By Operator [GBY_9] (rows=343640771 width=4160) |
> | Output:["_col0","_col1"],aggregations:["count()"],keys:mid |
> | Select Operator [SEL_2] (rows=343640771 width=4160) |
> |   Output:["mid"]   |
> |   TableScan [TS_0] (rows=343640771 width=4160) |
> | db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true

2020-08-07 Thread Eugene Chung (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173010#comment-17173010
 ] 

Eugene Chung commented on HIVE-23954:
-

How about skipping the reducer deduplication, or the count-distinct transformer 
itself, for cases like this where count(*) and count(distinct) are mixed? I saw 
that multiple count(distinct) usages already make the count-distinct transformer skip.

> count(*) with count(distinct) gives wrong results with 
> hive.optimize.countdistinct=true
> ---
>
> Key: HIVE-23954
> URL: https://issues.apache.org/jira/browse/HIVE-23954
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eugene Chung
>Priority: Major
>
> {code:java}
> select count(*), count(distinct mid) from db1.table1 where partitioned_column 
> = '...'{code}
>  
> is not working properly when hive.optimize.countdistinct is true. By default, 
> it's true for all 3.x versions.
> In the two plans below, the aggregations part in the Output of Group By 
> Operator of Map 1 are different.
>  
> - hive.optimize.countdistinct=false
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_5] (rows=1 width=24) |
> |   
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
> KEY._col0:0._col0)"] |
> | <-Map 1 [SIMPLE_EDGE]  |
> |   SHUFFLE [RS_4]   |
> | Group By Operator [GBY_3] (rows=343640771 width=4160) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT 
> mid)"],keys:mid |
> |   Select Operator [SEL_2] (rows=343640771 width=4160) |
> | Output:["mid"] |
> | TableScan [TS_0] (rows=343640771 width=4160) |
> |   db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++{code}
>  
> - hive.optimize.countdistinct=true
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_14] (rows=1 width=16) |
> |   
> Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] |
> |   Group By Operator [GBY_11] (rows=343640771 width=4160) |
> | 
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
> |   <-Map 1 [SIMPLE_EDGE]|
> | SHUFFLE [RS_10]|
> |   PartitionCols:_col0  |
> |   Group By Operator [GBY_9] (rows=343640771 width=4160) |
> | Output:["_col0","_col1"],aggregations:["count()"],keys:mid |
> | Select Operator [SEL_2] (rows=343640771 width=4160) |
> |   Output:["mid"]   |
> |   TableScan [TS_0] (rows=343640771 width=4160) |
> | db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?focusedWorklogId=467773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467773
 ]

ASF GitHub Bot logged work on HIVE-24014:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 08:50
Start Date: 07/Aug/20 08:50
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1377:
URL: https://github.com/apache/hive/pull/1377


   ([HIVE-24014: Need to delete DumpDirectoryCleanerTask](https://issues.apache.org/jira/browse/HIVE-24014))
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467773)
Remaining Estimate: 0h
Time Spent: 10m

> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
> Attachments: HIVE-24014.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the newer implementation, every dump operation cleans up the dump
> directory previously consumed by the load operation. Hence, for a policy, at
> most one dump directory will exist. Also, the dump-directory base location is
> now a policy-level config, so this DumpDirCleanerTask will not be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24014:
--
Labels: pull-request-available  (was: )

> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24014.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the newer implementation, every dump operation cleans up the dump
> directory previously consumed by the load operation. Hence, for a policy, at
> most one dump directory will exist. Also, the dump-directory base location is
> now a policy-level config, so this DumpDirCleanerTask will not be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23963) UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23963?focusedWorklogId=467771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467771
 ]

ASF GitHub Bot logged work on HIVE-23963:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 08:47
Start Date: 07/Aug/20 08:47
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1357:
URL: https://github.com/apache/hive/pull/1357#discussion_r466908433



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java
##########
@@ -81,8 +84,16 @@ public RelDistribution apply(TargetMapping mapping) {
   return this;
 }
     List<Integer> newKeys = new ArrayList<>(keys.size());
+
+    // Instead of using a HashMap for lookup, newKeys.add(mapping.getTargetOpt(key)) should be called, but not all
+    // mappings support that. See HIVE-23963. Replace this when this is fixed in calcite.

Review comment:
   Created: https://issues.apache.org/jira/browse/CALCITE-4166





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467771)
Time Spent: 1h  (was: 50m)

> UnsupportedOperationException in queries 74 and 84 while applying 
> HiveCardinalityPreservingJoinRule
> ---
>
> Key: HIVE-23963
> URL: https://issues.apache.org/jira/browse/HIVE-23963
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: cbo_query74_stacktrace.txt, cbo_query84_stacktrace.txt
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following TPC-DS queries: 
> * cbo_query74.q
> * cbo_query84.q 
> * query74.q 
> * query84.q 
> fail on the metastore with the partitioned TPC-DS 30TB dataset.
> The stacktraces for cbo_query74 and cbo_query84 show that the problem 
> originates while applying HiveCardinalityPreservingJoinRule.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23916) Fix Atlas client dependency version

2020-08-07 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23916:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

merged to master, Thanks for the patch [~pkumarsinha] and review [~aasha]

> Fix Atlas client dependency version
> ---
>
> Key: HIVE-23916
> URL: https://issues.apache.org/jira/browse/HIVE-23916
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23916.01.patch, HIVE-23916.02.patch, 
> HIVE-23916.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24000?focusedWorklogId=467757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467757
 ]

ASF GitHub Bot logged work on HIVE-24000:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 07:54
Start Date: 07/Aug/20 07:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1371:
URL: https://github.com/apache/hive/pull/1371


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467757)
Time Spent: 1h 20m  (was: 1h 10m)

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24000?focusedWorklogId=467754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467754
 ]

ASF GitHub Bot logged work on HIVE-24000:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 07:52
Start Date: 07/Aug/20 07:52
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1371:
URL: https://github.com/apache/hive/pull/1371#issuecomment-670383718


   LGTM. +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467754)
Time Spent: 1h 10m  (was: 1h)

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24000?focusedWorklogId=467753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467753
 ]

ASF GitHub Bot logged work on HIVE-24000:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 07:52
Start Date: 07/Aug/20 07:52
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #1371:
URL: https://github.com/apache/hive/pull/1371#issuecomment-670383666


   @pvargacl could you please review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467753)
Time Spent: 1h  (was: 50m)

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24014:
---
Attachment: HIVE-24014.01.patch
Status: Patch Available  (was: Open)

> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
> Attachments: HIVE-24014.01.patch
>
>
> With the newer implementation, every dump operation cleans up the  
> dump-directory previously consumed by load operation. Hence, for a policy, at 
> most only one dump directory will be there. Also, now dump directory base 
> location config is policy level config and hence this DumpDirCleanerTask will 
> not be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24015) Disable query-based compaction on MR execution engine

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24015:
--
Labels: pull-request-available  (was: )

> Disable query-based compaction on MR execution engine
> -
>
> Key: HIVE-24015
> URL: https://issues.apache.org/jira/browse/HIVE-24015
> Project: Hive
>  Issue Type: Task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Major compaction can be run when the execution engine is MR. This can cause 
> data loss a la HIVE-23703 (the fix for data loss when the execution engine is 
> MR was reverted by HIVE-23763).
> Currently minor compaction can only be run when the execution engine is Tez, 
> otherwise it falls back to MR (non-query-based) compaction. We should extend 
> this functionality to major compaction as well.
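> A minimal sketch of the kind of guard this implies (illustrative only, not
> the actual patch; hive.execution.engine is a real config, the rest of the
> snippet is made up):
> {code:java}
> boolean isTez = "tez".equalsIgnoreCase(conf.get("hive.execution.engine"));
> // Run query-based (major or minor) compaction only on Tez; otherwise fall
> // back to MR (non-query-based) compaction, as minor compaction already does.
> boolean runQueryBased = queryBasedRequested && isTez;
> {code}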



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24015) Disable query-based compaction on MR execution engine

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24015?focusedWorklogId=467750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467750
 ]

ASF GitHub Bot logged work on HIVE-24015:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 07:43
Start Date: 07/Aug/20 07:43
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1375:
URL: https://github.com/apache/hive/pull/1375


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467750)
Remaining Estimate: 0h
Time Spent: 10m

> Disable query-based compaction on MR execution engine
> -
>
> Key: HIVE-24015
> URL: https://issues.apache.org/jira/browse/HIVE-24015
> Project: Hive
>  Issue Type: Task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Major compaction can be run when the execution engine is MR. This can cause 
> data loss a la HIVE-23703 (the fix for data loss when the execution engine is 
> MR was reverted by HIVE-23763).
> Currently minor compaction can only be run when the execution engine is Tez, 
> otherwise it falls back to MR (non-query-based) compaction. We should extend 
> this functionality to major compaction as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24015) Disable query-based compaction on MR execution engine

2020-08-07 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24015:



> Disable query-based compaction on MR execution engine
> -
>
> Key: HIVE-24015
> URL: https://issues.apache.org/jira/browse/HIVE-24015
> Project: Hive
>  Issue Type: Task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> Major compaction can be run when the execution engine is MR. This can cause 
> data loss a la HIVE-23703 (the fix for data loss when the execution engine is 
> MR was reverted by HIVE-23763).
> Currently minor compaction can only be run when the execution engine is Tez, 
> otherwise it falls back to MR (non-query-based) compaction. We should extend 
> this functionality to major compaction as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma reassigned HIVE-24014:
--


> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>
> With the newer implementation, every dump operation cleans up the dump
> directory previously consumed by the load operation. Hence, for a policy, at
> most one dump directory will exist. Also, the dump-directory base location is
> now a policy-level config, so this DumpDirCleanerTask will not be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-18325) Config to do case unaware schema evolution to ORC reader.

2020-08-07 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172914#comment-17172914
 ] 

zhaolong edited comment on HIVE-18325 at 8/7/20, 6:08 AM:
--

In OrcFile.java's readerOptions method, adding

conf.set("orc.schema.evolution.case.sensitive", "false");

can fix this problem.
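
For illustration, the equivalent with the standalone ORC reader API (the file path below is made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

Configuration conf = new Configuration();
// Make column-name matching during schema evolution case-insensitive.
conf.set("orc.schema.evolution.case.sensitive", "false");
Reader reader = OrcFile.createReader(new Path("/warehouse/t/file.orc"),
    OrcFile.readerOptions(conf));
{code}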


was (Author: fsilent):
!image-2020-08-07-14-07-04-676.png!   this param can fix this problem

> Config to do case unaware schema evolution to ORC reader.
> -
>
> Key: HIVE-18325
> URL: https://issues.apache.org/jira/browse/HIVE-18325
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: piyush mukati
>Priority: Critical
>
> In the case of the ORC data reader, the schema passed by Hive is all
> lowercase, so if a column name stored in the file has any uppercase
> characters, the reader returns null values for that column even if the data
> is present in the file.
> Column name matching during schema evolution should be case-unaware, so we
> need to pass a config for this from Hive. The ORC config
> (orc.schema.evolution.case.sensitive) will be exposed by
> https://issues.apache.org/jira/browse/ORC-264
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18325) Config to do case unaware schema evolution to ORC reader.

2020-08-07 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172914#comment-17172914
 ] 

zhaolong commented on HIVE-18325:
-

!image-2020-08-07-14-07-04-676.png!   this param can fix this problem

> Config to do case unaware schema evolution to ORC reader.
> -
>
> Key: HIVE-18325
> URL: https://issues.apache.org/jira/browse/HIVE-18325
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: piyush mukati
>Priority: Critical
>
> In the case of the ORC data reader, the schema passed by Hive is all
> lowercase, so if a column name stored in the file has any uppercase
> characters, the reader returns null values for that column even if the data
> is present in the file.
> Column name matching during schema evolution should be case-unaware, so we
> need to pass a config for this from Hive. The ORC config
> (orc.schema.evolution.case.sensitive) will be exposed by
> https://issues.apache.org/jira/browse/ORC-264
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)