[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506534 ]

ASF GitHub Bot logged work on HIVE-18284:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 05:18
Start Date: 30/Oct/20 05:18
Worklog Time Spent: 10m
Work Description: shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r514873327

## File path: itests/src/test/resources/testconfiguration.properties
##

@@ -6,6 +6,7 @@ minimr.query.files=\
 # Queries ran by both MiniLlapLocal and MiniTez
 minitez.query.files.shared=\
+  dynpart_sort_optimization_distribute_by.q,\

Review comment:
For some reason, the issue is not reproducible with LLAP, hence running this with MiniTez.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506534)
Time Spent: 2.5h (was: 2h 20m)

> NPE when inserting data with 'distribute by' clause with dynpart sort
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2
> Reporter: Aki Tanaka
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' clause.
> The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey;
> {code}
> I can run the insert query without the error if I remove the Distribute By or use a Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
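The failure mode described above can be sketched in a few lines: the dynpart sort optimization assumes each reducer sees partition keys clustered, so it keeps only one writer open and removes the previous one on every key change; Distribute By alone can hand back a key that was already removed. The class below is an illustrative model with invented names, not Hive's actual FileSinkOperator.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model of the dynpart sort optimization's assumption: only one
// writer is kept open, and the previous one is removed on every key change.
// With clustered (sorted) keys each partition is opened exactly once; if a
// removed key recurs, its state is gone, mirroring the NPE in FileSinkOperator.
class ClusteredWriterModel {
  private String currentKey;
  private final Set<String> removed = new HashSet<>();

  void process(String partitionKey) {
    if (partitionKey.equals(currentKey)) {
      return; // still writing to the currently open partition
    }
    if (currentKey != null) {
      removed.add(currentKey); // previous fsp is removed, never reopened
    }
    if (removed.contains(partitionKey)) {
      throw new NullPointerException(
          "writer for partition " + partitionKey + " was already removed");
    }
    currentKey = partitionKey;
  }

  public static void main(String[] args) {
    ClusteredWriterModel ok = new ClusteredWriterModel();
    for (String k : new String[]{"1", "1", "2"}) ok.process(k); // clustered: fine

    ClusteredWriterModel bad = new ClusteredWriterModel();
    try {
      for (String k : new String[]{"1", "2", "1"}) bad.process(k); // key "1" recurs
    } catch (NullPointerException e) {
      System.out.println("unclustered keys: " + e.getMessage());
    }
  }
}
```

Cluster By avoids the failure in this model because it both distributes and sorts, so each key's rows arrive contiguously.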
[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=506529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506529 ]

ASF GitHub Bot logged work on HIVE-24259:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 04:21
Start Date: 30/Oct/20 04:21
Worklog Time Spent: 10m
Work Description: ashish-kumar-sharma commented on a change in pull request #1610:
URL: https://github.com/apache/hive/pull/1610#discussion_r514829244

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##

@@ -2836,14 +2836,32 @@ long getPartsFound() {
 @Override
 public SQLAllTableConstraints getAllTableConstraints(String catName, String dbName, String tblName)
     throws MetaException, NoSuchObjectException {
-  SQLAllTableConstraints sqlAllTableConstraints = new SQLAllTableConstraints();
-  sqlAllTableConstraints.setPrimaryKeys(getPrimaryKeys(catName, dbName, tblName));
-  sqlAllTableConstraints.setForeignKeys(getForeignKeys(catName, null, null, dbName, tblName));
-  sqlAllTableConstraints.setUniqueConstraints(getUniqueConstraints(catName, dbName, tblName));
-  sqlAllTableConstraints.setDefaultConstraints(getDefaultConstraints(catName, dbName, tblName));
-  sqlAllTableConstraints.setCheckConstraints(getCheckConstraints(catName, dbName, tblName));
-  sqlAllTableConstraints.setNotNullConstraints(getNotNullConstraints(catName, dbName, tblName));
-  return sqlAllTableConstraints;
+
+  catName = StringUtils.normalizeIdentifier(catName);
+  dbName = StringUtils.normalizeIdentifier(dbName);
+  tblName = StringUtils.normalizeIdentifier(tblName);
+  if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+    return rawStore.getAllTableConstraints(catName, dbName, tblName);
+  }
+
+  Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+  if (tbl == null) {
+    // The table containing the constraints is not yet loaded in cache
+    return rawStore.getAllTableConstraints(catName,
dbName, tblName);
+  }
+  SQLAllTableConstraints constraints = sharedCache.listCachedAllTableConstraints(catName, dbName, tblName);
+
+  // If any constraint value is missing, partial constraints may be stored in the cache,
+  // so fall back to the raw store for correct values.
+  if (constraints != null && CollectionUtils.isNotEmpty(constraints.getPrimaryKeys()) && CollectionUtils

Review comment:
Adding a flag would require a bigger change, so for now I am reducing the scope of this PR to optimise the following:
1. Check only once whether the table exists in the cached store.
2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506529)
Time Spent: 1h (was: 50m)

> [CachedStore] Optimise get constraints call by removing redundant table check
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Description -
> Problem -
> 1. Redundant check whether the table is present or not.
> 2. Currently, in order to get all constraints from the cached store, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store.
>
> DOD
> 1. Check only once whether the table exists in the cached store.
> 2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
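The two points in the review comment (one existence check, one all-constraints fetch, raw-store fallback on incomplete data) boil down to a simple cache-with-fallback pattern. The sketch below uses invented, simplified interfaces rather than Hive's MetaStore API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the cache-with-fallback pattern: a single lookup
// against the cache, falling back to the raw store whenever the cached entry
// is absent or incomplete, so callers never see a partial constraint set.
class ConstraintCache {
  interface Store {
    List<String> getAllTableConstraints(String table);
  }

  private final Map<String, List<String>> cached = new HashMap<>();
  private final Store rawStore;

  ConstraintCache(Store rawStore) {
    this.rawStore = rawStore;
  }

  void prime(String table, List<String> constraints) {
    cached.put(table, constraints);
  }

  List<String> getAllTableConstraints(String table) {
    List<String> entry = cached.get(table); // single table check
    if (entry == null || entry.isEmpty()) {
      // Table not yet loaded, or only partially cached: use the raw store.
      return rawStore.getAllTableConstraints(table);
    }
    return entry;
  }
}
```

A primed table is then served from the cache in one call, while an unknown or incomplete entry costs exactly one raw-store round trip instead of six.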
[jira] [Updated] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated HIVE-24259:
-

Description:
Description -
Problem -
1. Redundant check whether the table is present or not.
2. Currently, in order to get all constraints from the cached store, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store.

DOD
1. Check only once whether the table exists in the cached store.
2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

was:
Description -
Currently, in order to get all constraints from the cached store, 6 different calls are made to the store. Instead, combine those 6 calls into 1.

> [CachedStore] Optimise get constraints call by removing redundant table check
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Description -
> Problem -
> 1. Redundant check whether the table is present or not.
> 2. Currently, in order to get all constraints from the cached store, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store.
>
> DOD
> 1. Check only once whether the table exists in the cached store.
> 2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated HIVE-24259:
-

Summary: [CachedStore] Optimise get constraints call by removing redundant table check (was: [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1)

> [CachedStore] Optimise get constraints call by removing redundant table check
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently, in order to get all constraints from the cached store, 6 different calls are made to the store. Instead, combine those 6 calls into 1.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step
[ https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=506526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506526 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 04:09
Start Date: 30/Oct/20 04:09
Worklog Time Spent: 10m
Work Description: jcamachor commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r514803487

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##

@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.hive.ql.exec.AppMasterEventOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule.DiGraph;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.DynamicPruningEventDesc;
+
+import com.google.common.collect.Sets;
+
+public class OperatorGraph {
+
+  /**
+   * A directed graph extended with support to check dag property.
+   */
+  static class DagGraph extends DiGraph {

Review comment:
In the meantime, maybe DiGraph could be made a top-level class.

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##

@@ -189,115 +189,121 @@ public RexNode analyzeRexNode(RexBuilder rexBuilder, RexNode condition) {
   return newCondition;
 }
-  /**
-   * Transforms inequality candidates into [NOT] BETWEEN calls.
-   */
-  protected static class RexTransformIntoBetween extends RexShuttle {
-    private final RexBuilder rexBuilder;
+  public static class DiGraph {

Review comment:
Left the comment in the other class, but I was thinking that it may be a good idea to promote this to a top-level class (at least until we replace it with any other library version, as we were discussing).
## File path: ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out
##

@@ -4317,7 +4301,7 @@ STAGE PLANS:
   outputColumnNames: ds
   Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
   Group By Operator
-    aggregations: max(ds)
+    aggregations: min(ds)

Review comment:
Any idea why this is happening?

## File path: ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out
##

@@ -4277,37 +4277,21 @@ STAGE PLANS:
   alias: srcpart
   filterExpr: ds is not null (type: boolean)
   Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
-  Filter Operator
-    predicate: ds is not null (type: boolean)

Review comment:
Note that the filter operator is removed. We need to be careful here because not all input formats guarantee that the filter expression is being applied / does not return false positives. I would expect the Filter remains but only a single time?

## File path: ql/src/test/results/clientpositive/perf/tez/query95.q.out
##

@@ -128,7 +128,7 @@ Stage-0
   Select Operator [SEL_235] (rows=144002668 width=7) Output:["_col0","_col1"]
     Filter Operator [FIL_234] (rows=144002668 width=7)
-      predicate:(ws_order_number is not null and (ws_order_number is not null or ws_order_number is not null))
+
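The DagGraph/DiGraph being discussed above is, at its core, a directed graph with an acyclicity check. A minimal standalone version of such a class (an illustration of the idea, not the code in HivePointLookupOptimizerRule) could look like:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Minimal standalone directed graph with a DAG check via depth-first search,
// sketching what a promoted top-level DiGraph class might provide.
class SimpleDiGraph<V> {
  private final Map<V, Set<V>> successors = new HashMap<>();

  void putEdge(V from, V to) {
    successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    successors.computeIfAbsent(to, k -> new LinkedHashSet<>());
  }

  /** Returns true if the graph contains no directed cycle. */
  boolean isDag() {
    Set<V> done = new HashSet<>();
    Set<V> onPath = new HashSet<>();
    for (V v : successors.keySet()) {
      if (hasCycle(v, done, onPath)) {
        return false;
      }
    }
    return true;
  }

  private boolean hasCycle(V v, Set<V> done, Set<V> onPath) {
    if (done.contains(v)) {
      return false;          // already fully explored, no cycle through v
    }
    if (!onPath.add(v)) {
      return true;           // v revisited on the current DFS path: back edge
    }
    for (V next : successors.get(v)) {
      if (hasCycle(next, done, onPath)) {
        return true;
      }
    }
    onPath.remove(v);
    done.add(v);
    return false;
  }
}
```

Keeping such a class top-level (rather than nested in an optimizer rule) lets both the point-lookup rule and OperatorGraph's DagGraph reuse it, which is the promotion the reviewer suggests.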
[jira] [Updated] (HIVE-24331) Add Jenkinsfile for branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24331:
-

Labels: pull-request-available (was: )

> Add Jenkinsfile for branch-3.1
> --
>
> Key: HIVE-24331
> URL: https://issues.apache.org/jira/browse/HIVE-24331
> Project: Hive
> Issue Type: Improvement
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should add a Jenkinsfile for branch-3.1 so that people can file PRs against it.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24331) Add Jenkinsfile for branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24331?focusedWorklogId=506521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506521 ]

ASF GitHub Bot logged work on HIVE-24331:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 03:54
Start Date: 30/Oct/20 03:54
Worklog Time Spent: 10m
Work Description: sunchao commented on pull request #1626:
URL: https://github.com/apache/hive/pull/1626#issuecomment-719156094

It seems the `TestMiniDruidKafkaCliDriver` is timing out and we need https://issues.apache.org/jira/browse/HIVE-19170

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506521)
Remaining Estimate: 0h
Time Spent: 10m

> Add Jenkinsfile for branch-3.1
> --
>
> Key: HIVE-24331
> URL: https://issues.apache.org/jira/browse/HIVE-24331
> Project: Hive
> Issue Type: Improvement
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should add a Jenkinsfile for branch-3.1 so that people can file PRs against it.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-23802) “merge files” job was submitted to default queue when set hive.merge.tezfiles to true
[ https://issues.apache.org/jira/browse/HIVE-23802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaozhan ding updated HIVE-23802:
-

Resolution: Duplicate
Status: Resolved (was: Patch Available)

> “merge files” job was submitted to the default queue when hive.merge.tezfiles is set to true
> --
>
> Key: HIVE-23802
> URL: https://issues.apache.org/jira/browse/HIVE-23802
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 3.1.0
> Reporter: gaozhan ding
> Assignee: gaozhan ding
> Priority: Major
> Labels: pull-request-available
> Attachments: 15940042679272.png, HIVE-23802.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We use Tez as the query engine. When hive.merge.tezfiles is set to true, the merge files task, which follows the original task, will be submitted to the default queue rather than the same queue as the original task.
> I studied this issue for days and found that, every time a container is started, "tez.queue.name" will be unset in the current session. The code is as below:
> {code:java}
> // TezSessionState.startSessionAndContainers()
> // sessionState.getQueueName() comes from cluster wide configured queue names.
> // sessionState.getConf().get("tez.queue.name") is explicitly set by user in a session.
> // TezSessionPoolManager sets tez.queue.name if user has specified one or use the one from
> // cluster wide queue names.
> // There is no way to differentiate how this was set (user vs system).
> // Unset this after opening the session so that reopening of session uses the correct queue
> // names i.e, if client has not died and if the user has explicitly set a queue name
> // then reopened session will use user specified queue name else default cluster queue names.
> conf.unset(TezConfiguration.TEZ_QUEUE_NAME);
> {code}
> So after the original task was submitted to YARN, "tez.queue.name" will be unset.
> While starting the merge file task, it will try to use the same session as the original job, but gets false because tez.queue.name was unset. It seems we cannot unset this property.
> {code:java}
> // TezSessionPoolManager.canWorkWithSameSession()
> if (!session.isDefault()) {
>   String queueName = session.getQueueName();
>   String confQueueName = conf.get(TezConfiguration.TEZ_QUEUE_NAME);
>   LOG.info("Current queue name is " + queueName + " incoming queue name is " + confQueueName);
>   return (queueName == null) ? confQueueName == null : queueName.equals(confQueueName);
> } else {
>   // this session should never be a default session unless something has messed up.
>   throw new HiveException("The pool session " + session + " should have been returned to the pool");
> }
> {code}
> !15940042679272.png!

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
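The reuse check quoted above reduces to a null-tolerant queue-name comparison, which is exactly where the unset tez.queue.name bites: the pooled session still remembers its queue while the incoming configuration carries null. A minimal model of that comparison (simplified names, not the TezSessionPoolManager API):

```java
// Minimal model of the queue-name comparison in canWorkWithSameSession():
// once tez.queue.name has been unset, the incoming value is null while the
// session still carries its original queue, so reuse is refused and the
// merge-files task lands on the default queue.
class SessionQueueCheck {
  static boolean canReuse(String sessionQueue, String incomingQueue) {
    return (sessionQueue == null) ? incomingQueue == null
                                  : sessionQueue.equals(incomingQueue);
  }
}
```

In the bug scenario the session queue is, say, "etl" while the incoming queue is null, so `canReuse` returns false even though both tasks belong to the same query.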
[jira] [Updated] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24333: -- Labels: pull-request-available (was: ) > Cut long methods in Driver to smaller, more manageable pieces > - > > Key: HIVE-24333 > URL: https://issues.apache.org/jira/browse/HIVE-24333 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Some methods in Driver are too long to be easily understandable. They should > be cut into pieces to make them easier to understand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=506499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506499 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 02:15
Start Date: 30/Oct/20 02:15
Worklog Time Spent: 10m
Work Description: miklosgergely opened a new pull request #1629:
URL: https://github.com/apache/hive/pull/1629

### What changes were proposed in this pull request?
The Driver class has some very long methods; they now have a more manageable size. Also, some minor checkstyle errors were fixed in some Driver-associated classes.

### Why are the changes needed?
To make the Driver class more understandable.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
All the tests are still running.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506499)
Remaining Estimate: 0h
Time Spent: 10m

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Miklos Gergely
> Assignee: Miklos Gergely
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should be cut into pieces to make them easier to understand.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reassigned HIVE-24333: - > Cut long methods in Driver to smaller, more manageable pieces > - > > Key: HIVE-24333 > URL: https://issues.apache.org/jira/browse/HIVE-24333 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > > Some methods in Driver are too long to be easily understandable. They should > be cut into pieces to make them easier to understand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-24325: --- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. > {code} > org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no > target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, > 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]] > at > org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1570) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:549) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at >
[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506476 ] ASF GitHub Bot logged work on HIVE-24325: - Author: ASF GitHub Bot Created on: 30/Oct/20 01:01 Start Date: 30/Oct/20 01:01 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1622: URL: https://github.com/apache/hive/pull/1622 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506476) Time Spent: 50m (was: 40m) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. 
> {code} > org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no > target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, > 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]] > at > org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at >
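The failure above comes down to a partial column mapping: field trimming builds a source-to-target index map, and a column that backtracks to a constant never receives an entry, so the subsequent lookup throws. A minimal Java sketch of that failure mode and a defensive lookup; `PartialMapping` is an illustrative stand-in, not the Calcite `Mappings` class:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative stand-in for a partial source->target mapping like the one in
// the stack trace: a source column with no entry makes getTarget() fail.
class PartialMapping {
    private final Map<Integer, Integer> elements = new HashMap<>();

    void put(int source, int target) {
        elements.put(source, target);
    }

    // Mirrors the strict lookup that produced NoElementException above.
    int getTarget(int source) {
        Integer target = elements.get(source);
        if (target == null) {
            throw new IllegalStateException(
                "source #" + source + " has no target in mapping");
        }
        return target;
    }

    // The fix direction the issue describes: detect the missing entry and
    // handle it (e.g. by materializing the constant) instead of failing.
    Optional<Integer> findTarget(int source) {
        return Optional.ofNullable(elements.get(source));
    }
}
```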
[jira] [Updated] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-24332: -- Description: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove functionality that is not commonly used. Remove deprecated methods that were deprecated in 3.x (or maybe even older). Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. was: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove functionality that is not commonly used. Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. Remove deprecated methods that were > deprecated in 3.x (or maybe even older). > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
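The `ByteChannel` analogy in the description above can be sketched as follows: one abstract type implements both the read-side and write-side interfaces, so a concrete SerDe only fills in the two core methods. This is an illustrative design sketch under the issue's stated direction, not Hive's actual API; the simplified interfaces and the `IdentitySerDe` class are hypothetical stand-ins:

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-ins for Hive's Serializer/Deserializer contracts.
interface Serializer {
    byte[] serialize(Object row);
}

interface Deserializer {
    Object deserialize(byte[] blob);
}

// Like java.nio.channels.ByteChannel (which extends both ReadableByteChannel
// and WritableByteChannel): one abstract type covering both directions, where
// shared, consolidated functionality would live.
abstract class AbstractSerDe implements Serializer, Deserializer {
}

// A concrete SerDe then only implements the two core methods.
class IdentitySerDe extends AbstractSerDe {
    @Override
    public byte[] serialize(Object row) {
        return row.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Object deserialize(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8);
    }
}
```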
[jira] [Updated] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24332: -- Labels: pull-request-available (was: ) > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=506443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506443 ] ASF GitHub Bot logged work on HIVE-24332: - Author: ASF GitHub Bot Created on: 29/Oct/20 22:38 Start Date: 29/Oct/20 22:38 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1628: URL: https://github.com/apache/hive/pull/1628 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506443) Remaining Estimate: 0h Time Spent: 10m > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-24332: -- Description: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove functionality that is not commonly used. Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. was: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24332: - > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24262) Optimise NullScanTaskDispatcher for cloud storage
[ https://issues.apache.org/jira/browse/HIVE-24262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman updated HIVE-24262: Status: Patch Available (was: Open) > Optimise NullScanTaskDispatcher for cloud storage > - > > Key: HIVE-24262 > URL: https://issues.apache.org/jira/browse/HIVE-24262 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Mustafa Iman >Priority: Major > > {noformat} > select count(DISTINCT ss_sold_date_sk) from store_sales; > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED 1 100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 5.55 s > -- > INFO : Status: DAG finished successfully in 5.44 seconds > INFO : > INFO : Query Execution Summary > INFO : > -- > INFO : OPERATIONDURATION > INFO : > -- > INFO : Compile Query 102.02s > INFO : Prepare Plan0.51s > INFO : Get Query Coordinator (AM) 0.01s > INFO : Submit Plan 0.33s > INFO : Start DAG 0.56s > INFO : Run DAG 5.44s > INFO : > -- > {noformat} > Reason for "102 seconds" compilation time is that, it ends up doing > "isEmptyPath" check for every partition path and takes lot of time in > compilation phase. > If the parent directory of all paths belong to the same path, we could just > do a recursive listing just once (instead of listing each directory one at a > time sequentially) in cloud storage systems. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L158 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L121 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L101 > With a temp hacky fix, it comes down to 2 seconds from 100+ seconds. 
> {noformat} > INFO : Dag name: select count(DISTINCT ss_sold_...store_sales (Stage-1) > INFO : Status: Running (Executing on YARN cluster with App id > application_1602500203747_0003) > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED 1 100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 1.23 s > -- > INFO : Status: DAG finished successfully in 1.20 seconds > INFO : > INFO : Query Execution Summary > INFO : > -- > INFO : OPERATIONDURATION > INFO : > -- > INFO : Compile Query 0.85s > INFO : Prepare Plan0.17s > INFO : Get Query Coordinator (AM) 0.00s > INFO : Submit Plan 0.03s > INFO : Start DAG 0.03s > INFO : Run DAG 1.20s > INFO : > -- > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
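The optimisation described above, one recursive listing of the common parent instead of a sequential `isEmptyPath` round trip per partition path, can be sketched as follows. This is an illustrative sketch, not the Hive patch: `java.nio.file` stands in for the Hadoop `FileSystem` API, and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class EmptyPathChecker {

    // One recursive listing of the common parent: collect every directory
    // that directly contains a regular file. Partition directories are
    // leaves, so this is exactly the set of non-empty partitions.
    static Set<Path> nonEmptyDirs(Path parent) throws IOException {
        try (Stream<Path> files = Files.walk(parent)) {
            return files.filter(Files::isRegularFile)
                        .map(Path::getParent)
                        .collect(Collectors.toSet());
        }
    }

    // Answered from the in-memory result: O(1), no per-partition storage call.
    static boolean isEmptyPath(Set<Path> nonEmpty, Path partitionDir) {
        return !nonEmpty.contains(partitionDir);
    }

    // Demo helper: a parent with one non-empty and one empty partition.
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("parts");
            Path p1 = Files.createDirectories(root.resolve("datekey=1"));
            Path p2 = Files.createDirectories(root.resolve("datekey=2"));
            Files.write(p1.resolve("000000_0"), new byte[] {1});
            Set<Path> nonEmpty = nonEmptyDirs(root);
            return new boolean[] {
                isEmptyPath(nonEmpty, p1),   // false: has a file
                isEmptyPath(nonEmpty, p2)    // true: empty partition
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On cloud object stores the win comes from replacing N sequential list calls with a single recursive listing, which is the same shape as `FileSystem.listFiles(path, true)` in the Hadoop API.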
[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background
[ https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=506427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506427 ] ASF GitHub Bot logged work on HIVE-24270: - Author: ASF GitHub Bot Created on: 29/Oct/20 21:48 Start Date: 29/Oct/20 21:48 Worklog Time Spent: 10m Work Description: mustafaiman opened a new pull request #1627: URL: https://github.com/apache/hive/pull/1627 …e cases ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506427) Time Spent: 2.5h (was: 2h 20m) > Move scratchdir cleanup to background > - > > Key: HIVE-24270 > URL: https://issues.apache.org/jira/browse/HIVE-24270 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > In cloud environment, scratchdir cleaning at the end of the query may take > long time. This causes client to hang up to 1 minute even after the results > were streamed back. During this time client just waits for cleanup to finish. > Cleanup can take place in the background in HiveServer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
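The proposal above, taking the slow scratch-dir delete off the client's critical path, can be sketched with a background executor: results return immediately and the (possibly slow, e.g. cloud storage) cleanup runs asynchronously in the server. The names below (`ScratchDirCleaner`, `cleanupAsync`) are hypothetical, not the actual HiveServer2 change:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ScratchDirCleaner {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    final AtomicInteger cleaned = new AtomicInteger();

    // Returns immediately; the delete no longer blocks the client that is
    // waiting for query results.
    void cleanupAsync(String scratchDir) {
        pool.submit(() -> {
            // A real implementation would delete scratchDir here.
            cleaned.incrementAndGet();
        });
    }

    // On server shutdown, drain pending cleanups.
    void shutdownAndWait() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```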
[jira] [Comment Edited] (HIVE-24294) TezSessionPool sessions can throw AssertionError
[ https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223178#comment-17223178 ] Naresh P R edited comment on HIVE-24294 at 10/29/20, 7:48 PM: -- Thanks for the review & commit [~lpinter] & [~mustafaiman] was (Author: nareshpr): Thanks for the review & commit [~lpinter] > TezSessionPool sessions can throw AssertionError > > > Key: HIVE-24294 > URL: https://issues.apache.org/jira/browse/HIVE-24294 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Whenever default TezSessionPool sessions are reopened for some reason, we are > setting dagResources to null before close & setting it back in openWhenever > default TezSessionPool sessions are reopened for some reason, we are setting > dagResources to null before close & setting it back in open > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503 > If there is an exception in sessionState.close(), we are not restoring the > dagResource but moving the session back to TezSessionPool.eg., exception > trace when sessionState.close() failed > {code:java} > 2020-10-15T09:20:28,749 INFO [HiveServer2-Background-Pool: Thread-25451]: > client.TezClient (:()) - Failed to shutdown Tez Session via proxy > org.apache.tez.dag.api.SessionNotRunning: Application not running, > applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, > finalApplicationStatus=SUCCEEDED, > trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, > diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, > sessionTimeoutInterval=60 ms > Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 > at > org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) > at 
org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) > at org.apache.tez.client.TezClient.stop(TezClient.java:743) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code} > Because of this, all new queries using this corrupted sessions are failing > with below exception > {code:java} > Caused by: java.lang.AssertionError: Ensure called on an unitialized (or > closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: > java.lang.AssertionError: Ensure called on an unitialized (or closed) session > 41774265-b7da-4d58-84a8-1bedfd597aec at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
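The fix direction the issue describes, restoring `dagResources` even when `close()` throws so a failed reopen cannot return a corrupted session to the pool, is a try/finally pattern. A minimal sketch with hypothetical names, not the actual `TezSessionPoolManager` code:

```java
class Session {
    Object dagResources = new Object();
    boolean closeThrows;   // simulates sessionState.close() failing

    void close() {
        if (closeThrows) {
            throw new IllegalStateException("close failed");
        }
    }

    void reopen() {
        Object saved = dagResources;
        dagResources = null;   // cleared before close, as in reopenInternal
        try {
            close();
        } finally {
            // Restored on both success and failure, so the session never
            // goes back to the pool with null dagResources.
            dagResources = saved;
        }
    }
}
```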
[jira] [Resolved] (HIVE-24294) TezSessionPool sessions can throw AssertionError
[ https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R resolved HIVE-24294. --- Fix Version/s: 4.0.0 Resolution: Fixed > TezSessionPool sessions can throw AssertionError > > > Key: HIVE-24294 > URL: https://issues.apache.org/jira/browse/HIVE-24294 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Whenever default TezSessionPool sessions are reopened for some reason, we are > setting dagResources to null before close & setting it back in openWhenever > default TezSessionPool sessions are reopened for some reason, we are setting > dagResources to null before close & setting it back in open > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503 > If there is an exception in sessionState.close(), we are not restoring the > dagResource but moving the session back to TezSessionPool.eg., exception > trace when sessionState.close() failed > {code:java} > 2020-10-15T09:20:28,749 INFO [HiveServer2-Background-Pool: Thread-25451]: > client.TezClient (:()) - Failed to shutdown Tez Session via proxy > org.apache.tez.dag.api.SessionNotRunning: Application not running, > applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, > finalApplicationStatus=SUCCEEDED, > trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, > diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, > sessionTimeoutInterval=60 ms > Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 > at > org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) > at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) > at org.apache.tez.client.TezClient.stop(TezClient.java:743) > at > 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code} > Because of this, all new queries using this corrupted sessions are failing > with below exception > {code:java} > Caused by: java.lang.AssertionError: Ensure called on an unitialized (or > closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: > java.lang.AssertionError: Ensure called on an unitialized (or closed) session > 41774265-b7da-4d58-84a8-1bedfd597aec at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24294) TezSessionPool sessions can throw AssertionError
[ https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223178#comment-17223178 ] Naresh P R commented on HIVE-24294: --- Thanks for the review & commit [~lpinter] > TezSessionPool sessions can throw AssertionError > > > Key: HIVE-24294 > URL: https://issues.apache.org/jira/browse/HIVE-24294 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Whenever default TezSessionPool sessions are reopened for some reason, we are > setting dagResources to null before close & setting it back in openWhenever > default TezSessionPool sessions are reopened for some reason, we are setting > dagResources to null before close & setting it back in open > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503 > If there is an exception in sessionState.close(), we are not restoring the > dagResource but moving the session back to TezSessionPool.eg., exception > trace when sessionState.close() failed > {code:java} > 2020-10-15T09:20:28,749 INFO [HiveServer2-Background-Pool: Thread-25451]: > client.TezClient (:()) - Failed to shutdown Tez Session via proxy > org.apache.tez.dag.api.SessionNotRunning: Application not running, > applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, > finalApplicationStatus=SUCCEEDED, > trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, > diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, > sessionTimeoutInterval=60 ms > Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 > at > org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) > at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) > at org.apache.tez.client.TezClient.stop(TezClient.java:743) > at > 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code} > Because of this, all new queries using this corrupted sessions are failing > with below exception > {code:java} > Caused by: java.lang.AssertionError: Ensure called on an unitialized (or > closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: > java.lang.AssertionError: Ensure called on an unitialized (or closed) session > 41774265-b7da-4d58-84a8-1bedfd597aec at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506364 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:41 Start Date: 29/Oct/20 19:41 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1624: URL: https://github.com/apache/hive/pull/1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506364) Time Spent: 4h (was: 3h 50m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506362 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:41 Start Date: 29/Oct/20 19:41 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1624: URL: https://github.com/apache/hive/pull/1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506362) Time Spent: 3h 50m (was: 3h 40m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506361 ] ASF GitHub Bot logged work on HIVE-24325: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:36 Start Date: 29/Oct/20 19:36 Worklog Time Spent: 10m Work Description: kasakrisz commented on pull request #1622: URL: https://github.com/apache/hive/pull/1622#issuecomment-718975783 :+1: new approach This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506361) Time Spent: 40m (was: 0.5h) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. 
> {code} > org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no > target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, > 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]] > at > org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) >
[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506347 ] ASF GitHub Bot logged work on HIVE-24325: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:02 Start Date: 29/Oct/20 19:02 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #1622: URL: https://github.com/apache/hive/pull/1622#issuecomment-718958588 @kasakrisz , I have changed the approach slightly (I think new code is better). Could you take another quick look? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506347) Time Spent: 0.5h (was: 20m) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. 
> {code}
> org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]]
> at org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
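The failure mode in the trace above can be illustrated with a toy stand-in for Calcite's partial mapping: a subset of source field indexes maps to target indexes, and asking for an unmapped source (here, the column that was folded to a constant) fails. This is a minimal sketch, not Calcite's actual `Mappings` implementation; the class and exception are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a partial source-to-target field mapping. Only some
// source indexes have targets; requesting a missing one fails, mirroring the
// Mappings$NoElementException in the stack trace above.
class PartialMappingDemo {
    private final Map<Integer, Integer> sourceToTarget = new HashMap<>();

    void set(int source, int target) {
        sourceToTarget.put(source, target);
    }

    int getTarget(int source) {
        Integer t = sourceToTarget.get(source);
        if (t == null) {
            throw new IllegalStateException(
                "source #" + source + " has no target in mapping " + sourceToTarget);
        }
        return t;
    }

    public static void main(String[] args) {
        PartialMappingDemo m = new PartialMappingDemo();
        // Same element pairs as in the trace: sources 0-4, 9, 11-13 are kept.
        int[] kept = {0, 1, 2, 3, 4, 9, 11, 12, 13};
        for (int i = 0; i < kept.length; i++) {
            m.set(kept[i], i);
        }
        System.out.println(m.getTarget(9)); // mapped source: returns its target
        try {
            m.getTarget(8); // source #8 backtracked to a constant: no mapping entry
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```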
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=506327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506327 ] ASF GitHub Bot logged work on HIVE-24222: - Author: ASF GitHub Bot Created on: 29/Oct/20 18:11 Start Date: 29/Oct/20 18:11 Worklog Time Spent: 10m Work Description: dongjoon-hyun closed pull request #1615: URL: https://github.com/apache/hive/pull/1615 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506327) Time Spent: 2h 20m (was: 2h 10m) > Upgrade ORC to 1.5.12 > - > > Key: HIVE-24222 > URL: https://issues.apache.org/jira/browse/HIVE-24222 > Project: Hive > Issue Type: Improvement > Components: ORC >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506314 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:48 Start Date: 29/Oct/20 17:48 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718917847 Thank you so much, @sunchao! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506314) Time Spent: 1h 50m (was: 1h 40m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506310 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:34 Start Date: 29/Oct/20 17:34 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718909196 opened #1626 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506310) Time Spent: 1h 40m (was: 1.5h) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24331) Add Jenkinsfile for branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HIVE-24331: --- > Add Jenkinsfile for branch-3.1 > -- > > Key: HIVE-24331 > URL: https://issues.apache.org/jira/browse/HIVE-24331 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > We should add Jenkinsfile for branch-3.1 so that ppl can file PR against it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506307 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:25 Start Date: 29/Oct/20 17:25 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718904044 yeah I can help on that - I think it won't be too difficult after going through the process for branch-2.3. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506307) Time Spent: 1.5h (was: 1h 20m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506301 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:14 Start Date: 29/Oct/20 17:14 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718898121 Do you think you can do that for the Apache Hive community? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506301) Time Spent: 1h 20m (was: 1h 10m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506295=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506295 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:07 Start Date: 29/Oct/20 17:07 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718893556 No. There is no jenkins file in branch-3.1 so there's no way to run CI at the moment. We'd have to do something similar to #1398 to enable that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506295) Time Spent: 1h 10m (was: 1h) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506292 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:04 Start Date: 29/Oct/20 17:04 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718891495 Hi, @sunchao . There is no way to trigger the real CI until now? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506292) Time Spent: 1h (was: 50m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24217. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Attila Magyar! > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=506280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506280 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 29/Oct/20 16:44 Start Date: 29/Oct/20 16:44 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1542: URL: https://github.com/apache/hive/pull/1542 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506280) Time Spent: 4h 50m (was: 4h 40m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?focusedWorklogId=506249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506249 ] ASF GitHub Bot logged work on HIVE-24314: - Author: ASF GitHub Bot Created on: 29/Oct/20 15:45 Start Date: 29/Oct/20 15:45 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1613: URL: https://github.com/apache/hive/pull/1613#discussion_r514363631 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -201,16 +201,17 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB) throws MetaException { LOG.debug("Cleaning based on writeIdList: " + validWriteIdList); } + final boolean[] removedFiles = new boolean[1]; Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506249) Time Spent: 0.5h (was: 20m) > compactor.Cleaner should not set state "mark cleaned" if it didn't remove any > files > --- > > Key: HIVE-24314 > URL: https://issues.apache.org/jira/browse/HIVE-24314 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > If the Cleaner didn't remove any files, don't mark the compaction queue entry > as "succeeded" but instead leave it in "ready for cleaning" state for later > cleaning. If it removed at least one file, then the compaction queue entry as > "succeeded". This is a partial fix, HIVE-24291 is the complete fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
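The `final boolean[] removedFiles = new boolean[1]` line under review above uses a common Java idiom: lambdas and anonymous classes may only capture effectively-final locals, so a single-element array serves as a mutable holder for a flag set inside the callback. A minimal sketch (names are illustrative, not the Cleaner's actual API):

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates the single-element-array holder: the array reference is final
// (so the lambda may capture it), while its one slot remains mutable.
class FlagHolderDemo {
    static boolean sawDelta(List<String> files) {
        final boolean[] removed = new boolean[1]; // holder; reference is final
        files.forEach(f -> {
            if (f.startsWith("delta_")) {
                removed[0] = true; // mutate the contents, not the variable
            }
        });
        return removed[0];
    }

    public static void main(String[] args) {
        System.out.println(sawDelta(Arrays.asList("base_5", "delta_6_7"))); // true
        System.out.println(sawDelta(Arrays.asList("base_5")));              // false
    }
}
```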
[jira] [Work logged] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas
[ https://issues.apache.org/jira/browse/HIVE-24291?focusedWorklogId=506248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506248 ] ASF GitHub Bot logged work on HIVE-24291: - Author: ASF GitHub Bot Created on: 29/Oct/20 15:44 Start Date: 29/Oct/20 15:44 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1592: URL: https://github.com/apache/hive/pull/1592#discussion_r514362309 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -281,9 +280,14 @@ public void markCompacted(CompactionInfo info) throws MetaException { try { dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); stmt = dbConn.createStatement(); -String s = "SELECT \"CQ_ID\", \"CQ_DATABASE\", \"CQ_TABLE\", \"CQ_PARTITION\", " + -"\"CQ_TYPE\", \"CQ_RUN_AS\", \"CQ_HIGHEST_WRITE_ID\" FROM \"COMPACTION_QUEUE\" " + -"WHERE \"CQ_STATE\" = '" + READY_FOR_CLEANING + "'"; +/* + * By filtering on minOpenTxnWaterMark, we will only cleanup after every transaction is committed, that could see + * the uncompacted deltas. This way the cleaner can clean up everything that was made obsolete by this compaction. + */ +long minOpenTxnWaterMark = getMinOpenTxnIdWaterMark(dbConn); Review comment: Cleaner already knows this value, Cleaner#run calls CompactionTxnHandler#findMinOpenTxnIdForCleaner first, then findReadyToClean, so you can just pass it into findReadyToClean. (Btw findMinOpenTxnIdForCleaner doesn't filter out timed out txns like getMinOpenTxnIdWaterMark does, might want to change that? (AcidHouseKeeperService should take care of that, but who knows if it's on... 
on the other hand that's another query and would take longer)) ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -1526,6 +1529,10 @@ private void updateWSCommitIdAndCleanUpMetadata(Statement stmt, long txnid, TxnT if (txnType == TxnType.MATER_VIEW_REBUILD) { queryBatch.add("DELETE FROM \"MATERIALIZATION_REBUILD_LOCKS\" WHERE \"MRL_TXN_ID\" = " + txnid); } +if (txnType == TxnType.COMPACTION) { Review comment: It's not the end of the world to add the CQ_TXN_ID column, but we can avoid that and keep things more straightforward (i.e. keep compaction stuff out of generic TxnHandler and limit it to CompactionTxnHandler which was made specifically for updating compaction-related tables) by updating CQ_NEXT_TXN_ID in CompactionTxnHandler instead, and calling it straight from Worker, maybe right between commitTxn and markCompacted. It would be so much simpler. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506248) Time Spent: 40m (was: 0.5h) > Compaction Cleaner prematurely cleans up deltas > --- > > Key: HIVE-24291 > URL: https://issues.apache.org/jira/browse/HIVE-24291 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Since HIVE-23107 the cleaner can clean up deltas that are still used by > running queries. 
> Example: > * TxnId 1-5 writes to a partition, all commits > * Compactor starts with txnId=6 > * Long running query starts with txnId=7, it sees txnId=6 as open in its > snapshot > * Compaction commits > * Cleaner runs > Previously min_history_level table would have prevented the Cleaner to delete > the deltas1-5 until txnId=7 is open, but now they will be deleted and the > long running query may fail if its tries to access the files. > Solution could be to not run the cleaner until any txn is open that was > opened before the compaction was committed (CQ_NEXT_TXN_ID) -- This message was sent by Atlassian Jira (v8.3.4#803005)
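The fix proposed in the description — do not clean while any transaction opened before the compaction committed (CQ_NEXT_TXN_ID) is still open — reduces to a single comparison against the open-transaction watermark. The sketch below uses illustrative names, not Hive's actual cleaner code:

```java
// Hedged sketch of the cleaner gate: a compaction-queue entry is safe to
// clean only once the lowest open txn id has passed the txn id recorded when
// the compaction committed, so no reader can still need the old deltas.
class CleanerGateDemo {
    /**
     * @param minOpenTxnId lowest currently-open transaction id (watermark)
     * @param cqNextTxnId  next txn id at the time the compaction committed
     */
    static boolean safeToClean(long minOpenTxnId, long cqNextTxnId) {
        return minOpenTxnId >= cqNextTxnId;
    }

    public static void main(String[] args) {
        // Scenario from the JIRA: compactor commits, long-running reader (txn 7)
        // opened before that commit and still sees the pre-compaction deltas.
        System.out.println(safeToClean(7, 8)); // false: must wait for txn 7
        System.out.println(safeToClean(9, 8)); // true: every pre-commit reader is gone
    }
}
```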
[jira] [Assigned] (HIVE-24330) Automate setAcl on cmRoot directories.
[ https://issues.apache.org/jira/browse/HIVE-24330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma reassigned HIVE-24330: -- > Automate setAcl on cmRoot directories. > -- > > Key: HIVE-24330 > URL: https://issues.apache.org/jira/browse/HIVE-24330 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506216 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 13:29 Start Date: 29/Oct/20 13:29 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1624: URL: https://github.com/apache/hive/pull/1624#issuecomment-718752892 ``` [2020-10-29T13:26:53.569Z] [ERROR] Nullcheck of out at line 78 of value previously dereferenced in org.apache.hadoop.hive.common.AcidMetaDataFile.writeToFile(FileSystem, Path, AcidMetaDataFile$DataFormat) [org.apache.hadoop.hive.common.AcidMetaDataFile, org.apache.hadoop.hive.common.AcidMetaDataFile] At AcidMetaDataFile.java:[line 78]Redundant null check at AcidMetaDataFile.java:[line 82] RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE ``` I don't think that's related to this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506216) Time Spent: 3h 40m (was: 3.5h) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
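The SpotBugs finding quoted above (RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE) fires when a value is dereferenced first and null-checked only afterwards: either the check is redundant or the earlier dereference would already have thrown. A minimal illustration of that shape — not the actual `AcidMetaDataFile` code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// The pattern SpotBugs flags: `out` is dereferenced on the first line, so the
// later `out != null` check can never be the thing that prevents an NPE.
class NullCheckOrderDemo {
    static void writeToStream(OutputStream out) throws IOException {
        out.write(42);       // dereference: NPE would happen here if out == null
        if (out != null) {   // redundant null check, flagged by SpotBugs
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeToStream(out);
        System.out.println(out.size()); // one byte was written
    }
}
```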
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506213 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 13:22 Start Date: 29/Oct/20 13:22 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1241: URL: https://github.com/apache/hive/pull/1241#issuecomment-718748712 @abstractdog Re-opened PR at #1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506213) Time Spent: 3.5h (was: 3h 20m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506212 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 13:21 Start Date: 29/Oct/20 13:21 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1624: URL: https://github.com/apache/hive/pull/1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506212) Time Spent: 3h 20m (was: 3h 10m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506181 ] ASF GitHub Bot logged work on HIVE-18284: - Author: ASF GitHub Bot Created on: 29/Oct/20 11:45 Start Date: 29/Oct/20 11:45 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1400: URL: https://github.com/apache/hive/pull/1400#discussion_r514196451 ## File path: itests/src/test/resources/testconfiguration.properties ## @@ -6,6 +6,7 @@ minimr.query.files=\ # Queries ran by both MiniLlapLocal and MiniTez minitez.query.files.shared=\ + dynpart_sort_optimization_distribute_by.q,\ Review comment: do we need to run this test with minitez - or it may run with minillaplocal? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506181) Time Spent: 2h 20m (was: 2h 10m) > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. 
The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01,
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) :
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by:
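The cause described in the JIRA — dynamic-partition sort optimization assumes rows arrive grouped by partition key, so the writer for the previous key is closed on every key change, while DISTRIBUTE BY alone does not sort within a reducer — can be sketched as follows. Names and the counting logic are illustrative, not FileSinkOperator's actual API:

```java
import java.util.Arrays;
import java.util.List;

// Counts how many times a partition writer would be (re)opened when the
// operator closes the previous writer on every partition-key change. With
// unsorted input a key can reappear after its writer was already closed,
// which is the situation that led to the NPE above.
class GroupedWriterDemo {
    static int writersOpened(List<Integer> partitionKeys) {
        Integer current = null;
        int opened = 0;
        for (int key : partitionKeys) {
            if (current == null || key != current) {
                opened++;      // close previous writer, open one for `key`
                current = key;
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        // Sorted by datekey: one writer per partition, as the optimization assumes.
        System.out.println(writersOpened(Arrays.asList(1, 1, 2))); // 2
        // DISTRIBUTE BY only: datekey 1 reappears after its writer was closed.
        System.out.println(writersOpened(Arrays.asList(1, 2, 1))); // 3
    }
}
```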
[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506180 ] ASF GitHub Bot logged work on HIVE-18284: - Author: ASF GitHub Bot Created on: 29/Oct/20 11:44 Start Date: 29/Oct/20 11:44 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1400: URL: https://github.com/apache/hive/pull/1400#discussion_r514195755 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java ## @@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, ReduceSinkOperator cRS, ReduceSin TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList(), pRS .getConf().getOrder(), pRS.getConf().getNullOrder()); pRS.getConf().setKeySerializeInfo(keyTable); + } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) { +ArrayList keyColNames = Lists.newArrayList(); +for (ExprNodeDesc keyCol : pRS.getConf().getKeyCols()) { + String keyColName = keyCol.getExprString(); + keyColNames.add(keyColName); +} +List fields = PlanUtils.getFieldSchemasFromColumnList(pRS.getConf().getKeyCols(), +keyColNames, 0, ""); +TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(fields, pRS.getConf().getOrder(), +pRS.getConf().getNullOrder()); +ArrayList outputKeyCols = Lists.newArrayList(); +for (int i = 0; i < fields.size(); i++) { + outputKeyCols.add(fields.get(i).getName()); +} +pRS.getConf().setOutputKeyColumnNames(outputKeyCols); +pRS.getConf().setKeySerializeInfo(keyTable); + pRS.getConf().setNumDistributionKeys(cRS.getConf().getNumDistributionKeys()); } Review comment: yes; you are correct This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506180) Time Spent: 2h 10m (was: 2h) > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01,
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
> failure ) :
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at
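[Editor's note] The failure mode described in HIVE-18284 — an operator that discards the previous file-sink state ("fsp") whenever the key changes, even though DISTRIBUTE BY can deliver rows for the same partition key again later — can be illustrated with a small self-contained sketch. Everything below (the `GroupedSink` class and its methods) is hypothetical illustration code, not Hive's actual FileSinkOperator:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sink that assumes rows arrive grouped by partition
// key, so it closes and discards the previous partition's state whenever the
// key changes -- the same assumption FileSinkOperator makes when dynpart sort
// optimization is on.
public class GroupedSink {
    private String currentKey;
    private StringBuilder currentWriter;              // stand-in for a file writer
    private final Map<String, String> flushed = new HashMap<>();

    public void process(String key, String value) {
        if (!key.equals(currentKey)) {
            closeCurrent();                           // previous "fsp" is removed here
            currentKey = key;
            currentWriter = new StringBuilder();
        }
        currentWriter.append(value).append(';');
    }

    public void closeCurrent() {
        if (currentWriter != null) {
            flushed.merge(currentKey, currentWriter.toString(), String::concat);
        }
        currentWriter = null;
    }

    public Map<String, String> flushed() { return flushed; }

    public static void main(String[] args) {
        // DISTRIBUTE BY routes rows by hash but does not sort, so the same key
        // can reappear after an intervening key: 1, 2, 1 (as in the repro data).
        GroupedSink sink = new GroupedSink();
        sink.process("1", "ROW1");
        sink.process("2", "ROW2");
        sink.process("1", "ROW3");  // key 1 again: its earlier state is already gone
        sink.closeCurrent();
        System.out.println(sink.flushed());
    }
}
```

This sketch survives only because it re-creates fresh state on every key change; an operator that instead keeps a reference to the already-removed state for a returning key, as the linked FileSinkOperator code path does, dereferences null and fails exactly as in the stack trace above.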
[jira] [Assigned] (HIVE-24329) Add HMS notification for compaction commit
[ https://issues.apache.org/jira/browse/HIVE-24329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Varga reassigned HIVE-24329:
----------------------------------

> Add HMS notification for compaction commit
> ------------------------------------------
>
>                 Key: HIVE-24329
>                 URL: https://issues.apache.org/jira/browse/HIVE-24329
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>
> This could be used by file metadata caches, to invalidate the cache content

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24328) Run distcp in parallel for all file entries in repl load.
[ https://issues.apache.org/jira/browse/HIVE-24328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aasha Medhi reassigned HIVE-24328:
----------------------------------

> Run distcp in parallel for all file entries in repl load.
> ---------------------------------------------------------
>
>                 Key: HIVE-24328
>                 URL: https://issues.apache.org/jira/browse/HIVE-24328
>             Project: Hive
>          Issue Type: Task
>            Reporter: Aasha Medhi
>            Assignee: Aasha Medhi
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24318) When GlobalLimit is efficient, query will run twice with "Retry query with a different approach..."
[ https://issues.apache.org/jira/browse/HIVE-24318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

libo updated HIVE-24318:
------------------------
    Description:
hive.limit.optimize.enable=true
hive.limit.row.max.size=1000
hive.limit.optimize.fetch.max=1000
hive.fetch.task.conversion.threshold=256
hive.fetch.task.conversion=more

*sql eg:*
select db_name,concat(tb_name,'test') tbname from (select * from test1.t3 where dt='0909' limit 10)t1;
(only a partitioned table has this problem)

*console information:*
...
Kill Command = /appcom/hadoop/bin/hadoop job -kill job_1600683831691_837491
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
map = 0%, reduce = 0%
map = 100%, reduce = 0%, Cumulative CPU 6.33 sec
map = 100%, reduce = 100%, Cumulative CPU 13.69 sec
MapReduce Total cumulative CPU time: 13 seconds 690 msec
Ended Job = job_1600683831691_837491
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 13.69 sec  HDFS Read: 4389  HDFS Write: 4115 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 690 msec
OK
db_name tbname
...
Retry query with a different approach...
...
Kill Command = /appcom/hadoop/bin/hadoop job -kill job_1600683831691_837520
Hadoop job information for Stage-1: number of mappers: 176; number of reducers: 1
...
As we can see, the MR job runs twice: the first time the global limit is effective, and the second time it is not.

*exception stack:*
org.apache.hadoop.hive.ql.CommandNeedRetryException
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2022)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:317)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:475)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:855)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:794)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

    was:
hive.limit.optimize.enable=true
hive.limit.row.max.size=1000
hive.limit.optimize.fetch.max=1000
hive.fetch.task.conversion.threshold=256
hive.fetch.task.conversion=more

*sql eg:*
select db_name,concat(tb_name,'test') from (select * from test1.t3 where dt='0909' limit 10)t1;
(only partitioned table)

*console information:*
Retry query with a different approach...
*exception stack:*
org.apache.hadoop.hive.ql.CommandNeedRetryException
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2022)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:317)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:475)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:855)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:794)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

> When GlobalLimit is efficient, query will run twice with "Retry query with a
> different approach..."
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-24318
>                 URL: https://issues.apache.org/jira/browse/HIVE-24318
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.1
>         Environment: Hadoop 2.6.0
> Hive-2.0.1
>            Reporter: libo
>            Assignee: libo
>            Priority: Minor
>         Attachments: HIVE-24318.patch
>
> hive.limit.optimize.enable=true
> hive.limit.row.max.size=1000
> hive.limit.optimize.fetch.max=1000
> hive.fetch.task.conversion.threshold=256
> hive.fetch.task.conversion=more
>
> *sql eg:*
> select db_name,concat(tb_name,'test') tbname from (select * from test1.t3
> where dt='0909' limit 10)t1;
> (only partitioned table has this problem)
> *console
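[Editor's note] The console output in HIVE-24318 (first run with 1 mapper, then a second run with 176 mappers after "Retry query with a different approach...") follows a driver-side retry pattern that can be sketched as below. All names here (`RetryDemo`, `Planner`, `runWithRetry`) are invented for illustration; only `CommandNeedRetryException` mirrors the class in the stack trace, and the real logic lives in Hive's Driver and FetchTask:

```java
// Hypothetical sketch of the retry pattern: if fetching results fails with a
// "need retry" signal, the query is re-planned without the global-limit
// shortcut and run again from scratch -- hence the job appearing twice.
public class RetryDemo {
    static class CommandNeedRetryException extends Exception {}

    interface Planner {
        int run(boolean useGlobalLimit) throws CommandNeedRetryException;
    }

    static int runWithRetry(Planner planner) {
        try {
            return planner.run(true);            // first attempt: limit pushdown on
        } catch (CommandNeedRetryException e) {
            // "Retry query with a different approach...": full scan, no shortcut
            try {
                return planner.run(false);
            } catch (CommandNeedRetryException e2) {
                throw new IllegalStateException("retry also failed", e2);
            }
        }
    }

    public static void main(String[] args) {
        // A planner whose optimized first pass cannot satisfy the fetch.
        Planner p = useLimit -> {
            if (useLimit) throw new CommandNeedRetryException();
            return 176;                          // e.g. mapper count of the second run
        };
        System.out.println(runWithRetry(p));
    }
}
```

The cost the reporter is pointing at follows directly from this shape: when the optimized pass fails, all the work of the first job is thrown away and the full query runs again.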
[jira] [Updated] (HIVE-24307) Beeline with property-file and -e parameter is failing
[ https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anishek Agarwal updated HIVE-24307:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks for the patch, [~ayushtkn], and for the review, [~aasha]!

> Beeline with property-file and -e parameter is failing
> ------------------------------------------------------
>
>                 Key: HIVE-24307
>                 URL: https://issues.apache.org/jira/browse/HIVE-24307
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24307-01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with:
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24307) Beeline with property-file and -e parameter is failing
[ https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-24307:
--------------------------------
    Status: Patch Available  (was: Open)

> Beeline with property-file and -e parameter is failing
> ------------------------------------------------------
>
>                 Key: HIVE-24307
>                 URL: https://issues.apache.org/jira/browse/HIVE-24307
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24307-01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with:
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24307) Beeline with property-file and -e parameter is failing
[ https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-24307:
--------------------------------
    Attachment: HIVE-24307-01.patch

> Beeline with property-file and -e parameter is failing
> ------------------------------------------------------
>
>                 Key: HIVE-24307
>                 URL: https://issues.apache.org/jira/browse/HIVE-24307
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24307-01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with:
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) AtlasServer entity may not be present during first Atlas metadata dump
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Attachment: HIVE-24327.01.patch

> AtlasServer entity may not be present during first Atlas metadata dump
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24327.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?focusedWorklogId=506128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506128 ]

ASF GitHub Bot logged work on HIVE-24314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Oct/20 09:05
            Start Date: 29/Oct/20 09:05
    Worklog Time Spent: 10m
      Work Description: pvargacl commented on a change in pull request #1613:
URL: https://github.com/apache/hive/pull/1613#discussion_r514102648

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -201,16 +201,17 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB) throws MetaException {
         LOG.debug("Cleaning based on writeIdList: " + validWriteIdList);
       }
+      final boolean[] removedFiles = new boolean[1];

Review comment:
       You can use org.apache.hive.common.util.Ref for this

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 506128)
    Time Spent: 20m  (was: 10m)

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any
> files
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24314
>                 URL: https://issues.apache.org/jira/browse/HIVE-24314
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry
> as "succeeded" but instead leave it in "ready for cleaning" state for later
> cleaning. If it removed at least one file, then mark the compaction queue
> entry as "succeeded". This is a partial fix; HIVE-24291 is the complete fix.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
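[Editor's note] The review comment above suggests replacing the one-element boolean array with `org.apache.hive.common.util.Ref`. Both are standard ways to let a lambda or anonymous class mutate state it captures, since captured locals must be effectively final. The sketch below uses a minimal stand-in `Ref` class rather than the real Hive utility, so it stays self-contained:

```java
// Two equivalent ways to smuggle a mutable flag out of a lambda.
public class RefDemo {
    // Minimal stand-in for org.apache.hive.common.util.Ref (not the real class).
    static class Ref<T> {
        public T value;
        Ref(T value) { this.value = value; }
        static <T> Ref<T> from(T value) { return new Ref<>(value); }
    }

    public static void main(String[] args) {
        // Variant 1: one-element array, as in the original patch.
        final boolean[] removedFiles = new boolean[1];
        Runnable cleaner1 = () -> removedFiles[0] = true;  // mutates the captured array
        cleaner1.run();

        // Variant 2: Ref holder, as suggested in the review. Same effect, but
        // the intent ("a mutable reference") is explicit in the type.
        final Ref<Boolean> removed = Ref.from(false);
        Runnable cleaner2 = () -> removed.value = true;
        cleaner2.run();

        System.out.println(removedFiles[0] + " " + removed.value);
    }
}
```

Either variant lets the Cleaner record "did I actually delete anything?" from inside the callback, which is exactly the signal HIVE-24314 needs before marking the compaction entry as cleaned.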
[jira] [Updated] (HIVE-24327) AtlasServer entity may not be present during first Atlas metadata dump
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Summary: AtlasServer entity may not be present during first Atlas metadata dump  (was: During Atlas metadata replication, handle a case when AtlasServer entity is not present )

> AtlasServer entity may not be present during first Atlas metadata dump
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) During Atlas metadata replication, handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24327:
----------------------------------
    Labels: pull-request-available  (was: )

> During Atlas metadata replication, handle a case when AtlasServer entity is
> not present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24327) During Atlas metadata replication, handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?focusedWorklogId=506122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506122 ]

ASF GitHub Bot logged work on HIVE-24327:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Oct/20 08:54
            Start Date: 29/Oct/20 08:54
    Worklog Time Spent: 10m
      Work Description: pkumarsinha opened a new pull request #1623:
URL: https://github.com/apache/hive/pull/1623

   …asServer entity is not present

   ### What changes were proposed in this pull request?

   ### Why are the changes needed?

   ### Does this PR introduce _any_ user-facing change?

   ### How was this patch tested?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 506122)
    Remaining Estimate: 0h
            Time Spent: 10m

> During Atlas metadata replication, handle a case when AtlasServer entity is
> not present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) During Atlas metadata replication, handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Summary: During Atlas metadata replication, handle a case when AtlasServer entity is not present  (was: During Atlas metadata replication handle a case when AtlasServer entity is not present )

> During Atlas metadata replication, handle a case when AtlasServer entity is
> not present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) During Atlas metadata replication handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Summary: During Atlas metadata replication handle a case when AtlasServer entity is not present  (was: During Atlas metadata replication handle a case when AtlasServer entity not present )

> During Atlas metadata replication handle a case when AtlasServer entity is
> not present
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24327) During Atlas metadata replication handle a case when AtlasServer entity not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha reassigned HIVE-24327:
-----------------------------------

> During Atlas metadata replication handle a case when AtlasServer entity not
> present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24326) HiveServer memory leak
[ https://issues.apache.org/jira/browse/HIVE-24326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zengxl updated HIVE-24326:
--------------------------
    Attachment: QQ图片20201029161110.png

> HiveServer memory leak
> ----------------------
>
>                 Key: HIVE-24326
>                 URL: https://issues.apache.org/jira/browse/HIVE-24326
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.1.0
>            Reporter: zengxl
>            Priority: Major
>         Attachments: QQ图片20201029160447.png, QQ图片20201029161110.png
>
> After a while, the JVM heap of our production HiveServer fills up, leaving
> the HiveServer unresponsive.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner
[ https://issues.apache.org/jira/browse/HIVE-24275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Varga reassigned HIVE-24275:
----------------------------------
    Assignee: Peter Varga

> Configurations to delay the deletion of obsolete files by the Cleaner
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24275
>                 URL: https://issues.apache.org/jira/browse/HIVE-24275
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Kishen Das
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the Cleaner immediately deletes the older
> obsolete files. In certain cases it would be beneficial to retain these for a
> certain period: for example, if you are serving the file metadata from a
> cache and don't want to invalidate the cache during compaction for
> performance reasons.
> For this purpose we should introduce a configuration,
> hive.compactor.delayed.cleanup.enabled, which if enabled will delay the
> cleanup of obsolete files. There should be a separate configuration,
> CLEANER_RETENTION_TIME, to specify the duration for which these older
> obsolete files should be retained.
> It might be beneficial to have one more configuration to decide whether to
> retain files involved in an aborted transaction:
> hive.compactor.aborted.txn.delayed.cleanup.enabled.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
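[Editor's note] A minimal sketch of the retention check HIVE-24275 proposes. The configuration names (hive.compactor.delayed.cleanup.enabled, CLEANER_RETENTION_TIME) come from the ticket text, but the method and its logic below are invented for illustration; this is not Hive's actual Cleaner code:

```java
// Hypothetical retention gate: when delayed cleanup is enabled, obsolete files
// become eligible for deletion only after the retention window has elapsed
// since the compaction finished.
public class RetentionDemo {
    static boolean mayClean(boolean delayedCleanupEnabled, long retentionMs,
                            long compactedAtMs, long nowMs) {
        if (!delayedCleanupEnabled) {
            return true;                       // current behavior: clean immediately
        }
        // keep obsolete files until the retention window has passed
        return nowMs - compactedAtMs >= retentionMs;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        System.out.println(mayClean(true, hour, 0L, 30 * 60_000L));  // within window
        System.out.println(mayClean(true, hour, 0L, 2 * hour));      // window elapsed
        System.out.println(mayClean(false, hour, 0L, 0L));           // feature off
    }
}
```

A check of this shape would let a file-metadata cache keep serving the pre-compaction file listing for the retention window, which is the motivating use case in the ticket description.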