[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506534 ]

ASF GitHub Bot logged work on HIVE-18284:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 05:18
Start Date: 30/Oct/20 05:18
Worklog Time Spent: 10m
Work Description: shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r514873327

## File path: itests/src/test/resources/testconfiguration.properties
##

@@ -6,6 +6,7 @@ minimr.query.files=\
 # Queries ran by both MiniLlapLocal and MiniTez
 minitez.query.files.shared=\
+  dynpart_sort_optimization_distribute_by.q,\

Review comment:
For some reason, the issue is not reproducible with LLAP, hence running this with MiniTez.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506534)
Time Spent: 2.5h (was: 2h 20m)

> NPE when inserting data with 'distribute by' clause with dynpart sort
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2
> Reporter: Aki Tanaka
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' clause.
> The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey;
> {code}
> I can run the insert query without the error if I remove the Distribute By or use a Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
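The failure mode described above can be sketched in a few lines: the dynpart sort optimization assumes each reducer sees partition keys clustered, so it keeps only one writer open and removes the previous one on every key change; Distribute By alone can hand back a key that was already removed. The class below is an illustrative model with invented names, not Hive's actual FileSinkOperator.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model of the dynpart sort optimization's assumption: only one
// writer is kept open, and the previous one is removed on every key change.
// With clustered (sorted) keys each partition is opened exactly once; if a
// removed key recurs, its state is gone, mirroring the NPE in FileSinkOperator.
class ClusteredWriterModel {
  private String currentKey;
  private final Set<String> removed = new HashSet<>();

  void process(String partitionKey) {
    if (partitionKey.equals(currentKey)) {
      return; // still writing to the currently open partition
    }
    if (currentKey != null) {
      removed.add(currentKey); // previous fsp is removed, never reopened
    }
    if (removed.contains(partitionKey)) {
      throw new NullPointerException(
          "writer for partition " + partitionKey + " was already removed");
    }
    currentKey = partitionKey;
  }

  public static void main(String[] args) {
    ClusteredWriterModel ok = new ClusteredWriterModel();
    for (String k : new String[]{"1", "1", "2"}) ok.process(k); // clustered: fine

    ClusteredWriterModel bad = new ClusteredWriterModel();
    try {
      for (String k : new String[]{"1", "2", "1"}) bad.process(k); // key "1" recurs
    } catch (NullPointerException e) {
      System.out.println("unclustered keys: " + e.getMessage());
    }
  }
}
```

Cluster By avoids the failure in this model because it both distributes and sorts, so each key's rows arrive contiguously.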
[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=506529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506529 ]

ASF GitHub Bot logged work on HIVE-24259:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 04:21
Start Date: 30/Oct/20 04:21
Worklog Time Spent: 10m
Work Description: ashish-kumar-sharma commented on a change in pull request #1610:
URL: https://github.com/apache/hive/pull/1610#discussion_r514829244

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##

@@ -2836,14 +2836,32 @@ long getPartsFound() {
 @Override
 public SQLAllTableConstraints getAllTableConstraints(String catName, String dbName, String tblName)
     throws MetaException, NoSuchObjectException {
-  SQLAllTableConstraints sqlAllTableConstraints = new SQLAllTableConstraints();
-  sqlAllTableConstraints.setPrimaryKeys(getPrimaryKeys(catName, dbName, tblName));
-  sqlAllTableConstraints.setForeignKeys(getForeignKeys(catName, null, null, dbName, tblName));
-  sqlAllTableConstraints.setUniqueConstraints(getUniqueConstraints(catName, dbName, tblName));
-  sqlAllTableConstraints.setDefaultConstraints(getDefaultConstraints(catName, dbName, tblName));
-  sqlAllTableConstraints.setCheckConstraints(getCheckConstraints(catName, dbName, tblName));
-  sqlAllTableConstraints.setNotNullConstraints(getNotNullConstraints(catName, dbName, tblName));
-  return sqlAllTableConstraints;
+
+  catName = StringUtils.normalizeIdentifier(catName);
+  dbName = StringUtils.normalizeIdentifier(dbName);
+  tblName = StringUtils.normalizeIdentifier(tblName);
+  if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+    return rawStore.getAllTableConstraints(catName, dbName, tblName);
+  }
+
+  Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+  if (tbl == null) {
+    // The table containing the constraints is not yet loaded in cache
+    return rawStore.getAllTableConstraints(catName,
dbName, tblName);
+  }
+  SQLAllTableConstraints constraints = sharedCache.listCachedAllTableConstraints(catName, dbName, tblName);
+
+  // If any constraint value is missing, partial constraints may be stored in the cache,
+  // so fall back to the raw store for correct values.
+  if (constraints != null && CollectionUtils.isNotEmpty(constraints.getPrimaryKeys()) && CollectionUtils

Review comment:
Adding a flag would require a bigger change, so for now I am reducing the scope of this PR to optimise the following:
1. Check only once whether the table exists in the cached store.
2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506529)
Time Spent: 1h (was: 50m)

> [CachedStore] Optimise get constraints call by removing redundant table check
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Description -
> Problem -
> 1. Redundant check whether the table is present or not.
> 2. Currently, in order to get all constraints from the cached store, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store.
>
> DOD
> 1. Check only once whether the table exists in the cached store.
> 2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
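The two points in the review comment (one existence check, one all-constraints fetch, raw-store fallback on incomplete data) boil down to a simple cache-with-fallback pattern. The sketch below uses invented, simplified interfaces rather than Hive's MetaStore API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the cache-with-fallback pattern: a single lookup
// against the cache, falling back to the raw store whenever the cached entry
// is absent or incomplete, so callers never see a partial constraint set.
class ConstraintCache {
  interface Store {
    List<String> getAllTableConstraints(String table);
  }

  private final Map<String, List<String>> cached = new HashMap<>();
  private final Store rawStore;

  ConstraintCache(Store rawStore) {
    this.rawStore = rawStore;
  }

  void prime(String table, List<String> constraints) {
    cached.put(table, constraints);
  }

  List<String> getAllTableConstraints(String table) {
    List<String> entry = cached.get(table); // single table check
    if (entry == null || entry.isEmpty()) {
      // Table not yet loaded, or only partially cached: use the raw store.
      return rawStore.getAllTableConstraints(table);
    }
    return entry;
  }
}
```

A primed table is then served from the cache in one call, while an unknown or incomplete entry costs exactly one raw-store round trip instead of six.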
[jira] [Updated] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated HIVE-24259:
-

Description:
Description -
Problem -
1. Redundant check whether the table is present or not.
2. Currently, in order to get all constraints from the cached store, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store.

DOD
1. Check only once whether the table exists in the cached store.
2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

was:
Description -
Currently, in order to get all constraints from the cached store, 6 different calls are made to the store. Instead, combine those 6 calls into 1.

> [CachedStore] Optimise get constraints call by removing redundant table check
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Description -
> Problem -
> 1. Redundant check whether the table is present or not.
> 2. Currently, in order to get all constraints from the cached store, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store.
>
> DOD
> 1. Check only once whether the table exists in the cached store.
> 2. Instead of fetching each constraint individually from the cached store, add a method that returns all constraints at once and, if the data is not consistent, falls back to the raw store.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated HIVE-24259:
-

Summary: [CachedStore] Optimise get constraints call by removing redundant table check (was: [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1)

> [CachedStore] Optimise get constraints call by removing redundant table check
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently, in order to get all constraints from the cached store, 6 different calls are made to the store. Instead, combine those 6 calls into 1.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step
[ https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=506526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506526 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 04:09
Start Date: 30/Oct/20 04:09
Worklog Time Spent: 10m
Work Description: jcamachor commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r514803487

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##

@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.hive.ql.exec.AppMasterEventOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule.DiGraph;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.DynamicPruningEventDesc;
+
+import com.google.common.collect.Sets;
+
+public class OperatorGraph {
+
+  /**
+   * A directed graph extended with support to check dag property.
+   */
+  static class DagGraph extends DiGraph {

Review comment:
In the meantime, maybe DiGraph could be made a top-level class.

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
##

@@ -189,115 +189,121 @@ public RexNode analyzeRexNode(RexBuilder rexBuilder, RexNode condition) {
   return newCondition;
 }
-  /**
-   * Transforms inequality candidates into [NOT] BETWEEN calls.
-   */
-  protected static class RexTransformIntoBetween extends RexShuttle {
-    private final RexBuilder rexBuilder;
+  public static class DiGraph {

Review comment:
Left the comment in the other class, but I was thinking that it may be a good idea to promote this to a top-level class (at least until we replace it with any other library version, as we were discussing).
## File path: ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out
##

@@ -4317,7 +4301,7 @@ STAGE PLANS:
   outputColumnNames: ds
   Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
   Group By Operator
-    aggregations: max(ds)
+    aggregations: min(ds)

Review comment:
Any idea why this is happening?

## File path: ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out
##

@@ -4277,37 +4277,21 @@ STAGE PLANS:
   alias: srcpart
   filterExpr: ds is not null (type: boolean)
   Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
-  Filter Operator
-    predicate: ds is not null (type: boolean)

Review comment:
Note that the filter operator is removed. We need to be careful here because not all input formats guarantee that the filter expression is being applied / does not return false positives. I would expect the Filter remains but only a single time?

## File path: ql/src/test/results/clientpositive/perf/tez/query95.q.out
##

@@ -128,7 +128,7 @@ Stage-0
   Select Operator [SEL_235] (rows=144002668 width=7) Output:["_col0","_col1"]
     Filter Operator [FIL_234] (rows=144002668 width=7)
-      predicate:(ws_order_number is not null and (ws_order_number is not null or ws_order_number is not null))
+
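The DagGraph/DiGraph being discussed above is, at its core, a directed graph with an acyclicity check. A minimal standalone version of such a class (an illustration of the idea, not the code in HivePointLookupOptimizerRule) could look like:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Minimal standalone directed graph with a DAG check via depth-first search,
// sketching what a promoted top-level DiGraph class might provide.
class SimpleDiGraph<V> {
  private final Map<V, Set<V>> successors = new HashMap<>();

  void putEdge(V from, V to) {
    successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    successors.computeIfAbsent(to, k -> new LinkedHashSet<>());
  }

  /** Returns true if the graph contains no directed cycle. */
  boolean isDag() {
    Set<V> done = new HashSet<>();
    Set<V> onPath = new HashSet<>();
    for (V v : successors.keySet()) {
      if (hasCycle(v, done, onPath)) {
        return false;
      }
    }
    return true;
  }

  private boolean hasCycle(V v, Set<V> done, Set<V> onPath) {
    if (done.contains(v)) {
      return false;          // already fully explored, no cycle through v
    }
    if (!onPath.add(v)) {
      return true;           // v revisited on the current DFS path: back edge
    }
    for (V next : successors.get(v)) {
      if (hasCycle(next, done, onPath)) {
        return true;
      }
    }
    onPath.remove(v);
    done.add(v);
    return false;
  }
}
```

Keeping such a class top-level (rather than nested in an optimizer rule) lets both the point-lookup rule and OperatorGraph's DagGraph reuse it, which is the promotion the reviewer suggests.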
[jira] [Updated] (HIVE-24331) Add Jenkinsfile for branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24331:
-

Labels: pull-request-available (was: )

> Add Jenkinsfile for branch-3.1
> --
>
> Key: HIVE-24331
> URL: https://issues.apache.org/jira/browse/HIVE-24331
> Project: Hive
> Issue Type: Improvement
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should add a Jenkinsfile for branch-3.1 so that people can file PRs against it.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24331) Add Jenkinsfile for branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24331?focusedWorklogId=506521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506521 ]

ASF GitHub Bot logged work on HIVE-24331:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 03:54
Start Date: 30/Oct/20 03:54
Worklog Time Spent: 10m
Work Description: sunchao commented on pull request #1626:
URL: https://github.com/apache/hive/pull/1626#issuecomment-719156094

It seems the `TestMiniDruidKafkaCliDriver` is timing out and we need https://issues.apache.org/jira/browse/HIVE-19170

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506521)
Remaining Estimate: 0h
Time Spent: 10m

> Add Jenkinsfile for branch-3.1
> --
>
> Key: HIVE-24331
> URL: https://issues.apache.org/jira/browse/HIVE-24331
> Project: Hive
> Issue Type: Improvement
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should add a Jenkinsfile for branch-3.1 so that people can file PRs against it.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-23802) “merge files” job was submitted to default queue when set hive.merge.tezfiles to true
[ https://issues.apache.org/jira/browse/HIVE-23802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaozhan ding updated HIVE-23802:
-

Resolution: Duplicate
Status: Resolved (was: Patch Available)

> “merge files” job was submitted to the default queue when hive.merge.tezfiles is set to true
> --
>
> Key: HIVE-23802
> URL: https://issues.apache.org/jira/browse/HIVE-23802
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 3.1.0
> Reporter: gaozhan ding
> Assignee: gaozhan ding
> Priority: Major
> Labels: pull-request-available
> Attachments: 15940042679272.png, HIVE-23802.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We use Tez as the query engine. When hive.merge.tezfiles is set to true, the merge files task, which follows the original task, will be submitted to the default queue rather than the same queue as the original task.
> I studied this issue for days and found that, every time a container is started, "tez.queue.name" will be unset in the current session. The code is as below:
> {code:java}
> // TezSessionState.startSessionAndContainers()
> // sessionState.getQueueName() comes from cluster wide configured queue names.
> // sessionState.getConf().get("tez.queue.name") is explicitly set by user in a session.
> // TezSessionPoolManager sets tez.queue.name if user has specified one or use the one from
> // cluster wide queue names.
> // There is no way to differentiate how this was set (user vs system).
> // Unset this after opening the session so that reopening of session uses the correct queue
> // names i.e, if client has not died and if the user has explicitly set a queue name
> // then reopened session will use user specified queue name else default cluster queue names.
> conf.unset(TezConfiguration.TEZ_QUEUE_NAME);
> {code}
> So after the original task was submitted to YARN, "tez.queue.name" will be unset.
> While starting the merge file task, it will try to use the same session as the original job, but gets false because tez.queue.name was unset. It seems we cannot unset this property.
> {code:java}
> // TezSessionPoolManager.canWorkWithSameSession()
> if (!session.isDefault()) {
>   String queueName = session.getQueueName();
>   String confQueueName = conf.get(TezConfiguration.TEZ_QUEUE_NAME);
>   LOG.info("Current queue name is " + queueName + " incoming queue name is " + confQueueName);
>   return (queueName == null) ? confQueueName == null : queueName.equals(confQueueName);
> } else {
>   // this session should never be a default session unless something has messed up.
>   throw new HiveException("The pool session " + session + " should have been returned to the pool");
> }
> {code}
> !15940042679272.png!

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
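The reuse check quoted above reduces to a null-tolerant queue-name comparison, which is exactly where the unset tez.queue.name bites: the pooled session still remembers its queue while the incoming configuration carries null. A minimal model of that comparison (simplified names, not the TezSessionPoolManager API):

```java
// Minimal model of the queue-name comparison in canWorkWithSameSession():
// once tez.queue.name has been unset, the incoming value is null while the
// session still carries its original queue, so reuse is refused and the
// merge-files task lands on the default queue.
class SessionQueueCheck {
  static boolean canReuse(String sessionQueue, String incomingQueue) {
    return (sessionQueue == null) ? incomingQueue == null
                                  : sessionQueue.equals(incomingQueue);
  }
}
```

In the bug scenario the session queue is, say, "etl" while the incoming queue is null, so `canReuse` returns false even though both tasks belong to the same query.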
[jira] [Updated] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24333: -- Labels: pull-request-available (was: ) > Cut long methods in Driver to smaller, more manageable pieces > - > > Key: HIVE-24333 > URL: https://issues.apache.org/jira/browse/HIVE-24333 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Some methods in Driver are too long to be easily understandable. They should > be cut into pieces to make them easier to understand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=506499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506499 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 30/Oct/20 02:15
Start Date: 30/Oct/20 02:15
Worklog Time Spent: 10m
Work Description: miklosgergely opened a new pull request #1629:
URL: https://github.com/apache/hive/pull/1629

### What changes were proposed in this pull request?
The Driver class has some very long methods; they now have a more manageable size. Also, some minor checkstyle errors were fixed in some Driver-associated classes.

### Why are the changes needed?
To make the Driver class more understandable.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
All the tests are still running.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 506499)
Remaining Estimate: 0h
Time Spent: 10m

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Miklos Gergely
> Assignee: Miklos Gergely
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should be cut into pieces to make them easier to understand.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reassigned HIVE-24333: - > Cut long methods in Driver to smaller, more manageable pieces > - > > Key: HIVE-24333 > URL: https://issues.apache.org/jira/browse/HIVE-24333 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > > Some methods in Driver are too long to be easily understandable. They should > be cut into pieces to make them easier to understand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-24325: --- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. > {code} > org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no > target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, > 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]] > at > org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1570) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:549) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at >
[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506476 ] ASF GitHub Bot logged work on HIVE-24325: - Author: ASF GitHub Bot Created on: 30/Oct/20 01:01 Start Date: 30/Oct/20 01:01 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1622: URL: https://github.com/apache/hive/pull/1622 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506476) Time Spent: 50m (was: 40m) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. 
> {code} > org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no > target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, > 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]] > at > org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at >
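The failure above comes down to a partial column mapping: field trimming builds a source-to-target index map, and a column that backtracks to a constant never receives an entry, so the subsequent lookup throws. A minimal Java sketch of that failure mode and a defensive lookup; `PartialMapping` is an illustrative stand-in, not the Calcite `Mappings` class:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative stand-in for a partial source->target mapping like the one in
// the stack trace: a source column with no entry makes getTarget() fail.
class PartialMapping {
    private final Map<Integer, Integer> elements = new HashMap<>();

    void put(int source, int target) {
        elements.put(source, target);
    }

    // Mirrors the strict lookup that produced NoElementException above.
    int getTarget(int source) {
        Integer target = elements.get(source);
        if (target == null) {
            throw new IllegalStateException(
                "source #" + source + " has no target in mapping");
        }
        return target;
    }

    // The fix direction the issue describes: detect the missing entry and
    // handle it (e.g. by materializing the constant) instead of failing.
    Optional<Integer> findTarget(int source) {
        return Optional.ofNullable(elements.get(source));
    }
}
```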
[jira] [Updated] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-24332: -- Description: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove functionality that is not commonly used. Remove deprecated methods that were deprecated in 3.x (or maybe even older). Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. was: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove functionality that is not commonly used. Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. Remove deprecated methods that were > deprecated in 3.x (or maybe even older). > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
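The `ByteChannel` analogy in the description above can be sketched as follows: one abstract type implements both the read-side and write-side interfaces, so a concrete SerDe only fills in the two core methods. This is an illustrative design sketch under the issue's stated direction, not Hive's actual API; the simplified interfaces and the `IdentitySerDe` class are hypothetical stand-ins:

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-ins for Hive's Serializer/Deserializer contracts.
interface Serializer {
    byte[] serialize(Object row);
}

interface Deserializer {
    Object deserialize(byte[] blob);
}

// Like java.nio.channels.ByteChannel (which extends both ReadableByteChannel
// and WritableByteChannel): one abstract type covering both directions, where
// shared, consolidated functionality would live.
abstract class AbstractSerDe implements Serializer, Deserializer {
}

// A concrete SerDe then only implements the two core methods.
class IdentitySerDe extends AbstractSerDe {
    @Override
    public byte[] serialize(Object row) {
        return row.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Object deserialize(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8);
    }
}
```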
[jira] [Updated] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24332: -- Labels: pull-request-available (was: ) > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=506443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506443 ] ASF GitHub Bot logged work on HIVE-24332: - Author: ASF GitHub Bot Created on: 29/Oct/20 22:38 Start Date: 29/Oct/20 22:38 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1628: URL: https://github.com/apache/hive/pull/1628 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506443) Remaining Estimate: 0h Time Spent: 10m > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-24332: -- Description: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove functionality that is not commonly used. Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. was: Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Make it like Java's {{ByteChannel}} that provides implementations for both {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. Remove > functionality that is not commonly used. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WriteableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
[ https://issues.apache.org/jira/browse/HIVE-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24332: - > Make AbstractSerDe Superclass of all Classes > > > Key: HIVE-24332 > URL: https://issues.apache.org/jira/browse/HIVE-24332 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes > are designed. > Simplify, and consolidate more functionality into {{AbstractSerDe}}. > Make it like Java's {{ByteChannel}} that provides implementations for both > {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24262) Optimise NullScanTaskDispatcher for cloud storage
[ https://issues.apache.org/jira/browse/HIVE-24262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman updated HIVE-24262: Status: Patch Available (was: Open) > Optimise NullScanTaskDispatcher for cloud storage > - > > Key: HIVE-24262 > URL: https://issues.apache.org/jira/browse/HIVE-24262 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Mustafa Iman >Priority: Major > > {noformat} > select count(DISTINCT ss_sold_date_sk) from store_sales; > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED 1 100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 5.55 s > -- > INFO : Status: DAG finished successfully in 5.44 seconds > INFO : > INFO : Query Execution Summary > INFO : > -- > INFO : OPERATIONDURATION > INFO : > -- > INFO : Compile Query 102.02s > INFO : Prepare Plan0.51s > INFO : Get Query Coordinator (AM) 0.01s > INFO : Submit Plan 0.33s > INFO : Start DAG 0.56s > INFO : Run DAG 5.44s > INFO : > -- > {noformat} > Reason for "102 seconds" compilation time is that, it ends up doing > "isEmptyPath" check for every partition path and takes lot of time in > compilation phase. > If the parent directory of all paths belong to the same path, we could just > do a recursive listing just once (instead of listing each directory one at a > time sequentially) in cloud storage systems. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L158 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L121 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L101 > With a temp hacky fix, it comes down to 2 seconds from 100+ seconds. 
> {noformat} > INFO : Dag name: select count(DISTINCT ss_sold_...store_sales (Stage-1) > INFO : Status: Running (Executing on YARN cluster with App id > application_1602500203747_0003) > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED 1 100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 1.23 s > -- > INFO : Status: DAG finished successfully in 1.20 seconds > INFO : > INFO : Query Execution Summary > INFO : > -- > INFO : OPERATIONDURATION > INFO : > -- > INFO : Compile Query 0.85s > INFO : Prepare Plan0.17s > INFO : Get Query Coordinator (AM) 0.00s > INFO : Submit Plan 0.03s > INFO : Start DAG 0.03s > INFO : Run DAG 1.20s > INFO : > -- > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
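The optimisation described above, one recursive listing of the common parent instead of a sequential `isEmptyPath` round trip per partition path, can be sketched as follows. This is an illustrative sketch, not the Hive patch: `java.nio.file` stands in for the Hadoop `FileSystem` API, and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class EmptyPathChecker {

    // One recursive listing of the common parent: collect every directory
    // that directly contains a regular file. Partition directories are
    // leaves, so this is exactly the set of non-empty partitions.
    static Set<Path> nonEmptyDirs(Path parent) throws IOException {
        try (Stream<Path> files = Files.walk(parent)) {
            return files.filter(Files::isRegularFile)
                        .map(Path::getParent)
                        .collect(Collectors.toSet());
        }
    }

    // Answered from the in-memory result: O(1), no per-partition storage call.
    static boolean isEmptyPath(Set<Path> nonEmpty, Path partitionDir) {
        return !nonEmpty.contains(partitionDir);
    }

    // Demo helper: a parent with one non-empty and one empty partition.
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("parts");
            Path p1 = Files.createDirectories(root.resolve("datekey=1"));
            Path p2 = Files.createDirectories(root.resolve("datekey=2"));
            Files.write(p1.resolve("000000_0"), new byte[] {1});
            Set<Path> nonEmpty = nonEmptyDirs(root);
            return new boolean[] {
                isEmptyPath(nonEmpty, p1),   // false: has a file
                isEmptyPath(nonEmpty, p2)    // true: empty partition
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On cloud object stores the win comes from replacing N sequential list calls with a single recursive listing, which is the same shape as `FileSystem.listFiles(path, true)` in the Hadoop API.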
[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background
[ https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=506427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506427 ] ASF GitHub Bot logged work on HIVE-24270: - Author: ASF GitHub Bot Created on: 29/Oct/20 21:48 Start Date: 29/Oct/20 21:48 Worklog Time Spent: 10m Work Description: mustafaiman opened a new pull request #1627: URL: https://github.com/apache/hive/pull/1627 …e cases ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506427) Time Spent: 2.5h (was: 2h 20m) > Move scratchdir cleanup to background > - > > Key: HIVE-24270 > URL: https://issues.apache.org/jira/browse/HIVE-24270 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > In cloud environment, scratchdir cleaning at the end of the query may take > long time. This causes client to hang up to 1 minute even after the results > were streamed back. During this time client just waits for cleanup to finish. > Cleanup can take place in the background in HiveServer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
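The proposal above, taking the slow scratch-dir delete off the client's critical path, can be sketched with a background executor: results return immediately and the (possibly slow, e.g. cloud storage) cleanup runs asynchronously in the server. The names below (`ScratchDirCleaner`, `cleanupAsync`) are hypothetical, not the actual HiveServer2 change:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ScratchDirCleaner {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    final AtomicInteger cleaned = new AtomicInteger();

    // Returns immediately; the delete no longer blocks the client that is
    // waiting for query results.
    void cleanupAsync(String scratchDir) {
        pool.submit(() -> {
            // A real implementation would delete scratchDir here.
            cleaned.incrementAndGet();
        });
    }

    // On server shutdown, drain pending cleanups.
    void shutdownAndWait() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```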
[jira] [Comment Edited] (HIVE-24294) TezSessionPool sessions can throw AssertionError
[ https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223178#comment-17223178 ] Naresh P R edited comment on HIVE-24294 at 10/29/20, 7:48 PM: -- Thanks for the review & commit [~lpinter] & [~mustafaiman] was (Author: nareshpr): Thanks for the review & commit [~lpinter] > TezSessionPool sessions can throw AssertionError > > > Key: HIVE-24294 > URL: https://issues.apache.org/jira/browse/HIVE-24294 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Whenever default TezSessionPool sessions are reopened for some reason, we are > setting dagResources to null before close & setting it back in openWhenever > default TezSessionPool sessions are reopened for some reason, we are setting > dagResources to null before close & setting it back in open > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503 > If there is an exception in sessionState.close(), we are not restoring the > dagResource but moving the session back to TezSessionPool.eg., exception > trace when sessionState.close() failed > {code:java} > 2020-10-15T09:20:28,749 INFO [HiveServer2-Background-Pool: Thread-25451]: > client.TezClient (:()) - Failed to shutdown Tez Session via proxy > org.apache.tez.dag.api.SessionNotRunning: Application not running, > applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, > finalApplicationStatus=SUCCEEDED, > trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, > diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, > sessionTimeoutInterval=60 ms > Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 > at > org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) > at 
org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) > at org.apache.tez.client.TezClient.stop(TezClient.java:743) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code} > Because of this, all new queries using this corrupted sessions are failing > with below exception > {code:java} > Caused by: java.lang.AssertionError: Ensure called on an unitialized (or > closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: > java.lang.AssertionError: Ensure called on an unitialized (or closed) session > 41774265-b7da-4d58-84a8-1bedfd597aec at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
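The fix direction the issue describes, restoring `dagResources` even when `close()` throws so a failed reopen cannot return a corrupted session to the pool, is a try/finally pattern. A minimal sketch with hypothetical names, not the actual `TezSessionPoolManager` code:

```java
class Session {
    Object dagResources = new Object();
    boolean closeThrows;   // simulates sessionState.close() failing

    void close() {
        if (closeThrows) {
            throw new IllegalStateException("close failed");
        }
    }

    void reopen() {
        Object saved = dagResources;
        dagResources = null;   // cleared before close, as in reopenInternal
        try {
            close();
        } finally {
            // Restored on both success and failure, so the session never
            // goes back to the pool with null dagResources.
            dagResources = saved;
        }
    }
}
```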
[jira] [Resolved] (HIVE-24294) TezSessionPool sessions can throw AssertionError
[ https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R resolved HIVE-24294. --- Fix Version/s: 4.0.0 Resolution: Fixed > TezSessionPool sessions can throw AssertionError > > > Key: HIVE-24294 > URL: https://issues.apache.org/jira/browse/HIVE-24294 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Whenever default TezSessionPool sessions are reopened for some reason, we are > setting dagResources to null before close & setting it back in openWhenever > default TezSessionPool sessions are reopened for some reason, we are setting > dagResources to null before close & setting it back in open > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503 > If there is an exception in sessionState.close(), we are not restoring the > dagResource but moving the session back to TezSessionPool.eg., exception > trace when sessionState.close() failed > {code:java} > 2020-10-15T09:20:28,749 INFO [HiveServer2-Background-Pool: Thread-25451]: > client.TezClient (:()) - Failed to shutdown Tez Session via proxy > org.apache.tez.dag.api.SessionNotRunning: Application not running, > applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, > finalApplicationStatus=SUCCEEDED, > trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, > diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, > sessionTimeoutInterval=60 ms > Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 > at > org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) > at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) > at org.apache.tez.client.TezClient.stop(TezClient.java:743) > at > 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code} > Because of this, all new queries using this corrupted sessions are failing > with below exception > {code:java} > Caused by: java.lang.AssertionError: Ensure called on an unitialized (or > closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: > java.lang.AssertionError: Ensure called on an unitialized (or closed) session > 41774265-b7da-4d58-84a8-1bedfd597aec at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24294) TezSessionPool sessions can throw AssertionError
[ https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223178#comment-17223178 ] Naresh P R commented on HIVE-24294: --- Thanks for the review & commit [~lpinter] > TezSessionPool sessions can throw AssertionError > > > Key: HIVE-24294 > URL: https://issues.apache.org/jira/browse/HIVE-24294 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Whenever default TezSessionPool sessions are reopened for some reason, we are > setting dagResources to null before close & setting it back in openWhenever > default TezSessionPool sessions are reopened for some reason, we are setting > dagResources to null before close & setting it back in open > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503 > If there is an exception in sessionState.close(), we are not restoring the > dagResource but moving the session back to TezSessionPool.eg., exception > trace when sessionState.close() failed > {code:java} > 2020-10-15T09:20:28,749 INFO [HiveServer2-Background-Pool: Thread-25451]: > client.TezClient (:()) - Failed to shutdown Tez Session via proxy > org.apache.tez.dag.api.SessionNotRunning: Application not running, > applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, > finalApplicationStatus=SUCCEEDED, > trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, > diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, > sessionTimeoutInterval=60 ms > Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 > at > org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) > at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) > at org.apache.tez.client.TezClient.stop(TezClient.java:743) > at > 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) > > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) > > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code} > Because of this, all new queries using this corrupted sessions are failing > with below exception > {code:java} > Caused by: java.lang.AssertionError: Ensure called on an unitialized (or > closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: > java.lang.AssertionError: Ensure called on an unitialized (or closed) session > 41774265-b7da-4d58-84a8-1bedfd597aec at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506364 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:41 Start Date: 29/Oct/20 19:41 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1624: URL: https://github.com/apache/hive/pull/1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506364) Time Spent: 4h (was: 3h 50m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506362 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:41 Start Date: 29/Oct/20 19:41 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1624: URL: https://github.com/apache/hive/pull/1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506362) Time Spent: 3h 50m (was: 3h 40m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506361 ] ASF GitHub Bot logged work on HIVE-24325: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:36 Start Date: 29/Oct/20 19:36 Worklog Time Spent: 10m Work Description: kasakrisz commented on pull request #1622: URL: https://github.com/apache/hive/pull/1622#issuecomment-718975783 :+1: new approach This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506361) Time Spent: 40m (was: 0.5h) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. 
> {code} > org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no > target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, > 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]] > at > org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) >
[jira] [Work logged] (HIVE-24325) Cardinality preserving join optimization fails when column is backtracked to a constant
[ https://issues.apache.org/jira/browse/HIVE-24325?focusedWorklogId=506347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506347 ] ASF GitHub Bot logged work on HIVE-24325: - Author: ASF GitHub Bot Created on: 29/Oct/20 19:02 Start Date: 29/Oct/20 19:02 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #1622: URL: https://github.com/apache/hive/pull/1622#issuecomment-718958588 @kasakrisz , I have changed the approach slightly (I think new code is better). Could you take another quick look? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506347) Time Spent: 0.5h (was: 20m) > Cardinality preserving join optimization fails when column is backtracked to > a constant > --- > > Key: HIVE-24325 > URL: https://issues.apache.org/jira/browse/HIVE-24325 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This error happens when one of the columns that is used in the output > backtracks to a constant. We end up without a mapping for the column, which > leads to exception below. 
> {code}
> org.apache.calcite.util.mapping.Mappings$NoElementException: source #8 has no target in mapping [size=9, sourceCount=23, targetCount=9, elements=[0:0, 1:1, 2:2, 3:3, 4:4, 9:5, 11:6, 12:7, 13:8]]
> at org.apache.calcite.util.mapping.Mappings$AbstractMapping.getTarget(Mappings.java:879) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinOptimization.trim(HiveCardinalityPreservingJoinOptimization.java:228) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveCardinalityPreservingJoinRule.trim(HiveCardinalityPreservingJoinRule.java:48) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule.onMatch(HiveFieldTrimmerRule.java:70) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2669) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2635) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2547) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1941) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1809) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
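The failure mode in the trace above can be illustrated with a toy stand-in for Calcite's partial mapping: a subset of source field indexes maps to target indexes, and asking for an unmapped source (here, the column that was folded to a constant) fails. This is a minimal sketch, not Calcite's actual `Mappings` implementation; the class and exception are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a partial source-to-target field mapping. Only some
// source indexes have targets; requesting a missing one fails, mirroring the
// Mappings$NoElementException in the stack trace above.
class PartialMappingDemo {
    private final Map<Integer, Integer> sourceToTarget = new HashMap<>();

    void set(int source, int target) {
        sourceToTarget.put(source, target);
    }

    int getTarget(int source) {
        Integer t = sourceToTarget.get(source);
        if (t == null) {
            throw new IllegalStateException(
                "source #" + source + " has no target in mapping " + sourceToTarget);
        }
        return t;
    }

    public static void main(String[] args) {
        PartialMappingDemo m = new PartialMappingDemo();
        // Same element pairs as in the trace: sources 0-4, 9, 11-13 are kept.
        int[] kept = {0, 1, 2, 3, 4, 9, 11, 12, 13};
        for (int i = 0; i < kept.length; i++) {
            m.set(kept[i], i);
        }
        System.out.println(m.getTarget(9)); // mapped source: returns its target
        try {
            m.getTarget(8); // source #8 backtracked to a constant: no mapping entry
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```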
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=506327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506327 ] ASF GitHub Bot logged work on HIVE-24222: - Author: ASF GitHub Bot Created on: 29/Oct/20 18:11 Start Date: 29/Oct/20 18:11 Worklog Time Spent: 10m Work Description: dongjoon-hyun closed pull request #1615: URL: https://github.com/apache/hive/pull/1615 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506327) Time Spent: 2h 20m (was: 2h 10m) > Upgrade ORC to 1.5.12 > - > > Key: HIVE-24222 > URL: https://issues.apache.org/jira/browse/HIVE-24222 > Project: Hive > Issue Type: Improvement > Components: ORC >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506314 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:48 Start Date: 29/Oct/20 17:48 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718917847 Thank you so much, @sunchao! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506314) Time Spent: 1h 50m (was: 1h 40m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506310 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:34 Start Date: 29/Oct/20 17:34 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718909196 opened #1626 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506310) Time Spent: 1h 40m (was: 1.5h) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24331) Add Jenkinsfile for branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HIVE-24331: --- > Add Jenkinsfile for branch-3.1 > -- > > Key: HIVE-24331 > URL: https://issues.apache.org/jira/browse/HIVE-24331 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > We should add Jenkinsfile for branch-3.1 so that ppl can file PR against it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506307 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:25 Start Date: 29/Oct/20 17:25 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718904044 yeah I can help on that - I think it won't be too difficult after going through the process for branch-2.3. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506307) Time Spent: 1.5h (was: 1h 20m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506301 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:14 Start Date: 29/Oct/20 17:14 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718898121 Do you think you can do that for the Apache Hive community? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506301) Time Spent: 1h 20m (was: 1h 10m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506295=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506295 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:07 Start Date: 29/Oct/20 17:07 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718893556 No. There is no jenkins file in branch-3.1 so there's no way to run CI at the moment. We'd have to do something similar to #1398 to enable that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506295) Time Spent: 1h 10m (was: 1h) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=506292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506292 ] ASF GitHub Bot logged work on HIVE-24316: - Author: ASF GitHub Bot Created on: 29/Oct/20 17:04 Start Date: 29/Oct/20 17:04 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1616: URL: https://github.com/apache/hive/pull/1616#issuecomment-718891495 Hi, @sunchao . There is no way to trigger the real CI until now? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506292) Time Spent: 1h (was: 50m) > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24217. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Attila Magyar! > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=506280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506280 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 29/Oct/20 16:44 Start Date: 29/Oct/20 16:44 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1542: URL: https://github.com/apache/hive/pull/1542 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506280) Time Spent: 4h 50m (was: 4h 40m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?focusedWorklogId=506249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506249 ] ASF GitHub Bot logged work on HIVE-24314: - Author: ASF GitHub Bot Created on: 29/Oct/20 15:45 Start Date: 29/Oct/20 15:45 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1613: URL: https://github.com/apache/hive/pull/1613#discussion_r514363631 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -201,16 +201,17 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB) throws MetaException { LOG.debug("Cleaning based on writeIdList: " + validWriteIdList); } + final boolean[] removedFiles = new boolean[1]; Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506249) Time Spent: 0.5h (was: 20m) > compactor.Cleaner should not set state "mark cleaned" if it didn't remove any > files > --- > > Key: HIVE-24314 > URL: https://issues.apache.org/jira/browse/HIVE-24314 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > If the Cleaner didn't remove any files, don't mark the compaction queue entry > as "succeeded" but instead leave it in "ready for cleaning" state for later > cleaning. If it removed at least one file, then the compaction queue entry as > "succeeded". This is a partial fix, HIVE-24291 is the complete fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
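The `final boolean[] removedFiles = new boolean[1]` line under review above uses a common Java idiom: lambdas and anonymous classes may only capture effectively-final locals, so a single-element array serves as a mutable holder for a flag set inside the callback. A minimal sketch (names are illustrative, not the Cleaner's actual API):

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates the single-element-array holder: the array reference is final
// (so the lambda may capture it), while its one slot remains mutable.
class FlagHolderDemo {
    static boolean sawDelta(List<String> files) {
        final boolean[] removed = new boolean[1]; // holder; reference is final
        files.forEach(f -> {
            if (f.startsWith("delta_")) {
                removed[0] = true; // mutate the contents, not the variable
            }
        });
        return removed[0];
    }

    public static void main(String[] args) {
        System.out.println(sawDelta(Arrays.asList("base_5", "delta_6_7"))); // true
        System.out.println(sawDelta(Arrays.asList("base_5")));              // false
    }
}
```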
[jira] [Work logged] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas
[ https://issues.apache.org/jira/browse/HIVE-24291?focusedWorklogId=506248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506248 ] ASF GitHub Bot logged work on HIVE-24291: - Author: ASF GitHub Bot Created on: 29/Oct/20 15:44 Start Date: 29/Oct/20 15:44 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1592: URL: https://github.com/apache/hive/pull/1592#discussion_r514362309 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -281,9 +280,14 @@ public void markCompacted(CompactionInfo info) throws MetaException { try { dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); stmt = dbConn.createStatement(); -String s = "SELECT \"CQ_ID\", \"CQ_DATABASE\", \"CQ_TABLE\", \"CQ_PARTITION\", " + -"\"CQ_TYPE\", \"CQ_RUN_AS\", \"CQ_HIGHEST_WRITE_ID\" FROM \"COMPACTION_QUEUE\" " + -"WHERE \"CQ_STATE\" = '" + READY_FOR_CLEANING + "'"; +/* + * By filtering on minOpenTxnWaterMark, we will only cleanup after every transaction is committed, that could see + * the uncompacted deltas. This way the cleaner can clean up everything that was made obsolete by this compaction. + */ +long minOpenTxnWaterMark = getMinOpenTxnIdWaterMark(dbConn); Review comment: Cleaner already knows this value, Cleaner#run calls CompactionTxnHandler#findMinOpenTxnIdForCleaner first, then findReadyToClean, so you can just pass it into findReadyToClean. (Btw findMinOpenTxnIdForCleaner doesn't filter out timed out txns like getMinOpenTxnIdWaterMark does, might want to change that? (AcidHouseKeeperService should take care of that, but who knows if it's on... 
on the other hand that's another query and would take longer)) ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -1526,6 +1529,10 @@ private void updateWSCommitIdAndCleanUpMetadata(Statement stmt, long txnid, TxnT if (txnType == TxnType.MATER_VIEW_REBUILD) { queryBatch.add("DELETE FROM \"MATERIALIZATION_REBUILD_LOCKS\" WHERE \"MRL_TXN_ID\" = " + txnid); } +if (txnType == TxnType.COMPACTION) { Review comment: It's not the end of the world to add the CQ_TXN_ID column, but we can avoid that and keep things more straightforward (i.e. keep compaction stuff out of generic TxnHandler and limit it to CompactionTxnHandler which was made specifically for updating compaction-related tables) by updating CQ_NEXT_TXN_ID in CompactionTxnHandler instead, and calling it straight from Worker, maybe right between commitTxn and markCompacted. It would be so much simpler. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506248) Time Spent: 40m (was: 0.5h) > Compaction Cleaner prematurely cleans up deltas > --- > > Key: HIVE-24291 > URL: https://issues.apache.org/jira/browse/HIVE-24291 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Since HIVE-23107 the cleaner can clean up deltas that are still used by > running queries. 
> Example: > * TxnId 1-5 writes to a partition, all commits > * Compactor starts with txnId=6 > * Long running query starts with txnId=7, it sees txnId=6 as open in its > snapshot > * Compaction commits > * Cleaner runs > Previously min_history_level table would have prevented the Cleaner to delete > the deltas1-5 until txnId=7 is open, but now they will be deleted and the > long running query may fail if its tries to access the files. > Solution could be to not run the cleaner until any txn is open that was > opened before the compaction was committed (CQ_NEXT_TXN_ID) -- This message was sent by Atlassian Jira (v8.3.4#803005)
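The fix proposed in the description — do not clean while any transaction opened before the compaction committed (CQ_NEXT_TXN_ID) is still open — reduces to a single comparison against the open-transaction watermark. The sketch below uses illustrative names, not Hive's actual cleaner code:

```java
// Hedged sketch of the cleaner gate: a compaction-queue entry is safe to
// clean only once the lowest open txn id has passed the txn id recorded when
// the compaction committed, so no reader can still need the old deltas.
class CleanerGateDemo {
    /**
     * @param minOpenTxnId lowest currently-open transaction id (watermark)
     * @param cqNextTxnId  next txn id at the time the compaction committed
     */
    static boolean safeToClean(long minOpenTxnId, long cqNextTxnId) {
        return minOpenTxnId >= cqNextTxnId;
    }

    public static void main(String[] args) {
        // Scenario from the JIRA: compactor commits, long-running reader (txn 7)
        // opened before that commit and still sees the pre-compaction deltas.
        System.out.println(safeToClean(7, 8)); // false: must wait for txn 7
        System.out.println(safeToClean(9, 8)); // true: every pre-commit reader is gone
    }
}
```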
[jira] [Assigned] (HIVE-24330) Automate setAcl on cmRoot directories.
[ https://issues.apache.org/jira/browse/HIVE-24330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma reassigned HIVE-24330: -- > Automate setAcl on cmRoot directories. > -- > > Key: HIVE-24330 > URL: https://issues.apache.org/jira/browse/HIVE-24330 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506216 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 13:29 Start Date: 29/Oct/20 13:29 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1624: URL: https://github.com/apache/hive/pull/1624#issuecomment-718752892 ``` [2020-10-29T13:26:53.569Z] [ERROR] Nullcheck of out at line 78 of value previously dereferenced in org.apache.hadoop.hive.common.AcidMetaDataFile.writeToFile(FileSystem, Path, AcidMetaDataFile$DataFormat) [org.apache.hadoop.hive.common.AcidMetaDataFile, org.apache.hadoop.hive.common.AcidMetaDataFile] At AcidMetaDataFile.java:[line 78]Redundant null check at AcidMetaDataFile.java:[line 82] RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE ``` I don't think that's related to this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506216) Time Spent: 3h 40m (was: 3.5h) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
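The SpotBugs finding quoted above (RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE) fires when a value is dereferenced first and null-checked only afterwards: either the check is redundant or the earlier dereference would already have thrown. A minimal illustration of that shape — not the actual `AcidMetaDataFile` code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// The pattern SpotBugs flags: `out` is dereferenced on the first line, so the
// later `out != null` check can never be the thing that prevents an NPE.
class NullCheckOrderDemo {
    static void writeToStream(OutputStream out) throws IOException {
        out.write(42);       // dereference: NPE would happen here if out == null
        if (out != null) {   // redundant null check, flagged by SpotBugs
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeToStream(out);
        System.out.println(out.size()); // one byte was written
    }
}
```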
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506213 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 13:22 Start Date: 29/Oct/20 13:22 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1241: URL: https://github.com/apache/hive/pull/1241#issuecomment-718748712 @abstractdog Re-opened PR at #1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506213) Time Spent: 3.5h (was: 3h 20m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=506212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506212 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 29/Oct/20 13:21 Start Date: 29/Oct/20 13:21 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1624: URL: https://github.com/apache/hive/pull/1624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506212) Time Spent: 3h 20m (was: 3h 10m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506181 ] ASF GitHub Bot logged work on HIVE-18284: - Author: ASF GitHub Bot Created on: 29/Oct/20 11:45 Start Date: 29/Oct/20 11:45 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1400: URL: https://github.com/apache/hive/pull/1400#discussion_r514196451 ## File path: itests/src/test/resources/testconfiguration.properties ## @@ -6,6 +6,7 @@ minimr.query.files=\ # Queries ran by both MiniLlapLocal and MiniTez minitez.query.files.shared=\ + dynpart_sort_optimization_distribute_by.q,\ Review comment: do we need to run this test with minitez - or it may run with minillaplocal? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506181) Time Spent: 2h 20m (was: 2h 10m) > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. 
The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01,
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) :
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by:
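The cause described in the JIRA — dynamic-partition sort optimization assumes rows arrive grouped by partition key, so the writer for the previous key is closed on every key change, while DISTRIBUTE BY alone does not sort within a reducer — can be sketched as follows. Names and the counting logic are illustrative, not FileSinkOperator's actual API:

```java
import java.util.Arrays;
import java.util.List;

// Counts how many times a partition writer would be (re)opened when the
// operator closes the previous writer on every partition-key change. With
// unsorted input a key can reappear after its writer was already closed,
// which is the situation that led to the NPE above.
class GroupedWriterDemo {
    static int writersOpened(List<Integer> partitionKeys) {
        Integer current = null;
        int opened = 0;
        for (int key : partitionKeys) {
            if (current == null || key != current) {
                opened++;      // close previous writer, open one for `key`
                current = key;
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        // Sorted by datekey: one writer per partition, as the optimization assumes.
        System.out.println(writersOpened(Arrays.asList(1, 1, 2))); // 2
        // DISTRIBUTE BY only: datekey 1 reappears after its writer was closed.
        System.out.println(writersOpened(Arrays.asList(1, 2, 1))); // 3
    }
}
```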
[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506180 ] ASF GitHub Bot logged work on HIVE-18284: - Author: ASF GitHub Bot Created on: 29/Oct/20 11:44 Start Date: 29/Oct/20 11:44 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1400: URL: https://github.com/apache/hive/pull/1400#discussion_r514195755 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java ## @@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, ReduceSinkOperator cRS, ReduceSin TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList(), pRS .getConf().getOrder(), pRS.getConf().getNullOrder()); pRS.getConf().setKeySerializeInfo(keyTable); + } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) { +ArrayList keyColNames = Lists.newArrayList(); +for (ExprNodeDesc keyCol : pRS.getConf().getKeyCols()) { + String keyColName = keyCol.getExprString(); + keyColNames.add(keyColName); +} +List fields = PlanUtils.getFieldSchemasFromColumnList(pRS.getConf().getKeyCols(), +keyColNames, 0, ""); +TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(fields, pRS.getConf().getOrder(), +pRS.getConf().getNullOrder()); +ArrayList outputKeyCols = Lists.newArrayList(); +for (int i = 0; i < fields.size(); i++) { + outputKeyCols.add(fields.get(i).getName()); +} +pRS.getConf().setOutputKeyColumnNames(outputKeyCols); +pRS.getConf().setKeySerializeInfo(keyTable); + pRS.getConf().setNumDistributionKeys(cRS.getConf().getNumDistributionKeys()); } Review comment: yes; you are correct This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 506180) Time Spent: 2h 10m (was: 2h) > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01,
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
> failure ) :
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at
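[Editor's note] The failure mode described in HIVE-18284 — an operator that discards the previous file-sink state ("fsp") whenever the key changes, even though DISTRIBUTE BY can deliver rows for the same partition key again later — can be illustrated with a small self-contained sketch. Everything below (the `GroupedSink` class and its methods) is hypothetical illustration code, not Hive's actual FileSinkOperator:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sink that assumes rows arrive grouped by partition
// key, so it closes and discards the previous partition's state whenever the
// key changes -- the same assumption FileSinkOperator makes when dynpart sort
// optimization is on.
public class GroupedSink {
    private String currentKey;
    private StringBuilder currentWriter;              // stand-in for a file writer
    private final Map<String, String> flushed = new HashMap<>();

    public void process(String key, String value) {
        if (!key.equals(currentKey)) {
            closeCurrent();                           // previous "fsp" is removed here
            currentKey = key;
            currentWriter = new StringBuilder();
        }
        currentWriter.append(value).append(';');
    }

    public void closeCurrent() {
        if (currentWriter != null) {
            flushed.merge(currentKey, currentWriter.toString(), String::concat);
        }
        currentWriter = null;
    }

    public Map<String, String> flushed() { return flushed; }

    public static void main(String[] args) {
        // DISTRIBUTE BY routes rows by hash but does not sort, so the same key
        // can reappear after an intervening key: 1, 2, 1 (as in the repro data).
        GroupedSink sink = new GroupedSink();
        sink.process("1", "ROW1");
        sink.process("2", "ROW2");
        sink.process("1", "ROW3");  // key 1 again: its earlier state is already gone
        sink.closeCurrent();
        System.out.println(sink.flushed());
    }
}
```

This sketch survives only because it re-creates fresh state on every key change; an operator that instead keeps a reference to the already-removed state for a returning key, as the linked FileSinkOperator code path does, dereferences null and fails exactly as in the stack trace above.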
[jira] [Assigned] (HIVE-24329) Add HMS notification for compaction commit
[ https://issues.apache.org/jira/browse/HIVE-24329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Varga reassigned HIVE-24329:
----------------------------------

> Add HMS notification for compaction commit
> ------------------------------------------
>
>                 Key: HIVE-24329
>                 URL: https://issues.apache.org/jira/browse/HIVE-24329
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>
> This could be used by file metadata caches, to invalidate the cache content

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24328) Run distcp in parallel for all file entries in repl load.
[ https://issues.apache.org/jira/browse/HIVE-24328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aasha Medhi reassigned HIVE-24328:
----------------------------------

> Run distcp in parallel for all file entries in repl load.
> ---------------------------------------------------------
>
>                 Key: HIVE-24328
>                 URL: https://issues.apache.org/jira/browse/HIVE-24328
>             Project: Hive
>          Issue Type: Task
>            Reporter: Aasha Medhi
>            Assignee: Aasha Medhi
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24318) When GlobalLimit is efficient, query will run twice with "Retry query with a different approach..."
[ https://issues.apache.org/jira/browse/HIVE-24318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

libo updated HIVE-24318:
------------------------
    Description:
hive.limit.optimize.enable=true
hive.limit.row.max.size=1000
hive.limit.optimize.fetch.max=1000
hive.fetch.task.conversion.threshold=256
hive.fetch.task.conversion=more

*sql eg:*
select db_name,concat(tb_name,'test') tbname from (select * from test1.t3 where dt='0909' limit 10)t1;
(only a partitioned table has this problem)

*console information:*
...
Kill Command = /appcom/hadoop/bin/hadoop job -kill job_1600683831691_837491
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
map = 0%, reduce = 0%
map = 100%, reduce = 0%, Cumulative CPU 6.33 sec
map = 100%, reduce = 100%, Cumulative CPU 13.69 sec
MapReduce Total cumulative CPU time: 13 seconds 690 msec
Ended Job = job_1600683831691_837491
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 13.69 sec  HDFS Read: 4389  HDFS Write: 4115 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 690 msec
OK
db_name tbname
...
Retry query with a different approach...
...
Kill Command = /appcom/hadoop/bin/hadoop job -kill job_1600683831691_837520
Hadoop job information for Stage-1: number of mappers: 176; number of reducers: 1
...
As we can see, the MR job runs twice: the first time the global limit is effective, and the second time it is not.

*exception stack:*
org.apache.hadoop.hive.ql.CommandNeedRetryException
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2022)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:317)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:475)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:855)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:794)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

    was:
hive.limit.optimize.enable=true
hive.limit.row.max.size=1000
hive.limit.optimize.fetch.max=1000
hive.fetch.task.conversion.threshold=256
hive.fetch.task.conversion=more

*sql eg:*
select db_name,concat(tb_name,'test') from (select * from test1.t3 where dt='0909' limit 10)t1;
(only partitioned table)

*console information:*
Retry query with a different approach...
*exception stack:*
org.apache.hadoop.hive.ql.CommandNeedRetryException
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2022)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:317)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:475)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:855)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:794)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

> When GlobalLimit is efficient, query will run twice with "Retry query with a
> different approach..."
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-24318
>                 URL: https://issues.apache.org/jira/browse/HIVE-24318
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.1
>         Environment: Hadoop 2.6.0
> Hive-2.0.1
>            Reporter: libo
>            Assignee: libo
>            Priority: Minor
>         Attachments: HIVE-24318.patch
>
> hive.limit.optimize.enable=true
> hive.limit.row.max.size=1000
> hive.limit.optimize.fetch.max=1000
> hive.fetch.task.conversion.threshold=256
> hive.fetch.task.conversion=more
>
> *sql eg:*
> select db_name,concat(tb_name,'test') tbname from (select * from test1.t3
> where dt='0909' limit 10)t1;
> (only partitioned table has this problem)
> *console
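[Editor's note] The console output in HIVE-24318 (first run with 1 mapper, then a second run with 176 mappers after "Retry query with a different approach...") follows a driver-side retry pattern that can be sketched as below. All names here (`RetryDemo`, `Planner`, `runWithRetry`) are invented for illustration; only `CommandNeedRetryException` mirrors the class in the stack trace, and the real logic lives in Hive's Driver and FetchTask:

```java
// Hypothetical sketch of the retry pattern: if fetching results fails with a
// "need retry" signal, the query is re-planned without the global-limit
// shortcut and run again from scratch -- hence the job appearing twice.
public class RetryDemo {
    static class CommandNeedRetryException extends Exception {}

    interface Planner {
        int run(boolean useGlobalLimit) throws CommandNeedRetryException;
    }

    static int runWithRetry(Planner planner) {
        try {
            return planner.run(true);            // first attempt: limit pushdown on
        } catch (CommandNeedRetryException e) {
            // "Retry query with a different approach...": full scan, no shortcut
            try {
                return planner.run(false);
            } catch (CommandNeedRetryException e2) {
                throw new IllegalStateException("retry also failed", e2);
            }
        }
    }

    public static void main(String[] args) {
        // A planner whose optimized first pass cannot satisfy the fetch.
        Planner p = useLimit -> {
            if (useLimit) throw new CommandNeedRetryException();
            return 176;                          // e.g. mapper count of the second run
        };
        System.out.println(runWithRetry(p));
    }
}
```

The cost the reporter is pointing at follows directly from this shape: when the optimized pass fails, all the work of the first job is thrown away and the full query runs again.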
[jira] [Updated] (HIVE-24307) Beeline with property-file and -e parameter is failing
[ https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anishek Agarwal updated HIVE-24307:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks for the patch, [~ayushtkn], and for the review, [~aasha]!

> Beeline with property-file and -e parameter is failing
> ------------------------------------------------------
>
>                 Key: HIVE-24307
>                 URL: https://issues.apache.org/jira/browse/HIVE-24307
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24307-01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with:
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24307) Beeline with property-file and -e parameter is failing
[ https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-24307:
--------------------------------
    Status: Patch Available  (was: Open)

> Beeline with property-file and -e parameter is failing
> ------------------------------------------------------
>
>                 Key: HIVE-24307
>                 URL: https://issues.apache.org/jira/browse/HIVE-24307
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24307-01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with:
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24307) Beeline with property-file and -e parameter is failing
[ https://issues.apache.org/jira/browse/HIVE-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-24307:
--------------------------------
    Attachment: HIVE-24307-01.patch

> Beeline with property-file and -e parameter is failing
> ------------------------------------------------------
>
>                 Key: HIVE-24307
>                 URL: https://issues.apache.org/jira/browse/HIVE-24307
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24307-01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Beeline query with property file specified with -e parameter fails with:
> {noformat}
> Cannot run commands specified using -e. No current connection
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) AtlasServer entity may not be present during first Atlas metadata dump
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Attachment: HIVE-24327.01.patch

> AtlasServer entity may not be present during first Atlas metadata dump
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24327.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?focusedWorklogId=506128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506128 ]

ASF GitHub Bot logged work on HIVE-24314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Oct/20 09:05
            Start Date: 29/Oct/20 09:05
    Worklog Time Spent: 10m
      Work Description: pvargacl commented on a change in pull request #1613:
URL: https://github.com/apache/hive/pull/1613#discussion_r514102648

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -201,16 +201,17 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB) throws MetaException {
         LOG.debug("Cleaning based on writeIdList: " + validWriteIdList);
       }
+      final boolean[] removedFiles = new boolean[1];

Review comment:
       You can use org.apache.hive.common.util.Ref for this

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 506128)
    Time Spent: 20m  (was: 10m)

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any
> files
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24314
>                 URL: https://issues.apache.org/jira/browse/HIVE-24314
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry
> as "succeeded" but instead leave it in "ready for cleaning" state for later
> cleaning. If it removed at least one file, then mark the compaction queue
> entry as "succeeded". This is a partial fix; HIVE-24291 is the complete fix.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
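[Editor's note] The review comment above suggests replacing the one-element boolean array with `org.apache.hive.common.util.Ref`. Both are standard ways to let a lambda or anonymous class mutate state it captures, since captured locals must be effectively final. The sketch below uses a minimal stand-in `Ref` class rather than the real Hive utility, so it stays self-contained:

```java
// Two equivalent ways to smuggle a mutable flag out of a lambda.
public class RefDemo {
    // Minimal stand-in for org.apache.hive.common.util.Ref (not the real class).
    static class Ref<T> {
        public T value;
        Ref(T value) { this.value = value; }
        static <T> Ref<T> from(T value) { return new Ref<>(value); }
    }

    public static void main(String[] args) {
        // Variant 1: one-element array, as in the original patch.
        final boolean[] removedFiles = new boolean[1];
        Runnable cleaner1 = () -> removedFiles[0] = true;  // mutates the captured array
        cleaner1.run();

        // Variant 2: Ref holder, as suggested in the review. Same effect, but
        // the intent ("a mutable reference") is explicit in the type.
        final Ref<Boolean> removed = Ref.from(false);
        Runnable cleaner2 = () -> removed.value = true;
        cleaner2.run();

        System.out.println(removedFiles[0] + " " + removed.value);
    }
}
```

Either variant lets the Cleaner record "did I actually delete anything?" from inside the callback, which is exactly the signal HIVE-24314 needs before marking the compaction entry as cleaned.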
[jira] [Updated] (HIVE-24327) AtlasServer entity may not be present during first Atlas metadata dump
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Summary: AtlasServer entity may not be present during first Atlas metadata dump  (was: During Atlas metadata replication, handle a case when AtlasServer entity is not present )

> AtlasServer entity may not be present during first Atlas metadata dump
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) During Atlas metadata replication, handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24327:
----------------------------------
    Labels: pull-request-available  (was: )

> During Atlas metadata replication, handle a case when AtlasServer entity is
> not present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24327) During Atlas metadata replication, handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?focusedWorklogId=506122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506122 ]

ASF GitHub Bot logged work on HIVE-24327:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Oct/20 08:54
            Start Date: 29/Oct/20 08:54
    Worklog Time Spent: 10m
      Work Description: pkumarsinha opened a new pull request #1623:
URL: https://github.com/apache/hive/pull/1623

   …asServer entity is not present

   ### What changes were proposed in this pull request?

   ### Why are the changes needed?

   ### Does this PR introduce _any_ user-facing change?

   ### How was this patch tested?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 506122)
    Remaining Estimate: 0h
            Time Spent: 10m

> During Atlas metadata replication, handle a case when AtlasServer entity is
> not present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) During Atlas metadata replication, handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Summary: During Atlas metadata replication, handle a case when AtlasServer entity is not present  (was: During Atlas metadata replication handle a case when AtlasServer entity is not present )

> During Atlas metadata replication, handle a case when AtlasServer entity is
> not present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24327) During Atlas metadata replication handle a case when AtlasServer entity is not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha updated HIVE-24327:
--------------------------------
    Summary: During Atlas metadata replication handle a case when AtlasServer entity is not present  (was: During Atlas metadata replication handle a case when AtlasServer entity not present )

> During Atlas metadata replication handle a case when AtlasServer entity is
> not present
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24327) During Atlas metadata replication handle a case when AtlasServer entity not present
[ https://issues.apache.org/jira/browse/HIVE-24327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha reassigned HIVE-24327:
-----------------------------------

> During Atlas metadata replication handle a case when AtlasServer entity not
> present
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24327
>                 URL: https://issues.apache.org/jira/browse/HIVE-24327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24326) HiveServer memory leak
[ https://issues.apache.org/jira/browse/HIVE-24326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zengxl updated HIVE-24326:
--------------------------
    Attachment: QQ图片20201029161110.png

> HiveServer memory leak
> ----------------------
>
>                 Key: HIVE-24326
>                 URL: https://issues.apache.org/jira/browse/HIVE-24326
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.1.0
>            Reporter: zengxl
>            Priority: Major
>         Attachments: QQ图片20201029160447.png, QQ图片20201029161110.png
>
> After a while, the JVM heap of our production HiveServer fills up, leaving
> the HiveServer unresponsive.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner
[ https://issues.apache.org/jira/browse/HIVE-24275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Varga reassigned HIVE-24275:
----------------------------------
    Assignee: Peter Varga

> Configurations to delay the deletion of obsolete files by the Cleaner
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24275
>                 URL: https://issues.apache.org/jira/browse/HIVE-24275
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Kishen Das
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the Cleaner immediately deletes the older
> obsolete files. In certain cases it would be beneficial to retain these for a
> certain period: for example, if you are serving the file metadata from a
> cache and don't want to invalidate the cache during compaction for
> performance reasons.
> For this purpose we should introduce a configuration,
> hive.compactor.delayed.cleanup.enabled, which if enabled will delay the
> cleanup of obsolete files. There should be a separate configuration,
> CLEANER_RETENTION_TIME, to specify the duration for which these older
> obsolete files should be retained.
> It might be beneficial to have one more configuration to decide whether to
> retain files involved in an aborted transaction:
> hive.compactor.aborted.txn.delayed.cleanup.enabled.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
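[Editor's note] A minimal sketch of the retention check HIVE-24275 proposes. The configuration names (hive.compactor.delayed.cleanup.enabled, CLEANER_RETENTION_TIME) come from the ticket text, but the method and its logic below are invented for illustration; this is not Hive's actual Cleaner code:

```java
// Hypothetical retention gate: when delayed cleanup is enabled, obsolete files
// become eligible for deletion only after the retention window has elapsed
// since the compaction finished.
public class RetentionDemo {
    static boolean mayClean(boolean delayedCleanupEnabled, long retentionMs,
                            long compactedAtMs, long nowMs) {
        if (!delayedCleanupEnabled) {
            return true;                       // current behavior: clean immediately
        }
        // keep obsolete files until the retention window has passed
        return nowMs - compactedAtMs >= retentionMs;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        System.out.println(mayClean(true, hour, 0L, 30 * 60_000L));  // within window
        System.out.println(mayClean(true, hour, 0L, 2 * hour));      // window elapsed
        System.out.println(mayClean(false, hour, 0L, 0L));           // feature off
    }
}
```

A check of this shape would let a file-metadata cache keep serving the pre-compaction file listing for the retention window, which is the motivating use case in the ticket description.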