[jira] [Work logged] (HIVE-24240) Implement missing features in UDTFStatsRule
[ https://issues.apache.org/jira/browse/HIVE-24240?focusedWorklogId=554619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554619 ] ASF GitHub Bot logged work on HIVE-24240: - Author: ASF GitHub Bot Created on: 19/Feb/21 07:21 Start Date: 19/Feb/21 07:21 Worklog Time Spent: 10m Work Description: okumin commented on pull request #1984: URL: https://github.com/apache/hive/pull/1984#issuecomment-781886090 @kgyrtkirk Could you please take a look when you have a chance? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554619) Time Spent: 20m (was: 10m) > Implement missing features in UDTFStatsRule > --- > > Key: HIVE-24240 > URL: https://issues.apache.org/jira/browse/HIVE-24240 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add the following steps. > * Handle the case in which the number of rows will be zero > * Compute runtime stats in case of a re-execution -- This message was sent by Atlassian Jira (v8.3.4#803005)
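The zero-row case in the description amounts to stats-rule bookkeeping. A rough, hypothetical sketch of such a guard (not Hive's actual UDTFStatsRule code; the class, method, and expansion-factor parameter are all illustrative assumptions):

```java
// Hypothetical sketch of the zero-row guard: if the child operator reports
// zero rows, the UDTF output is also zero rows; otherwise scale by an
// estimated expansion factor, never dropping below one row.
public class UdtfRowEstimate {
    public static long estimateRows(long inputRows, double expansionFactor) {
        if (inputRows <= 0) {
            return 0; // zero input rows stays zero, instead of a bogus minimum
        }
        return Math.max(1, (long) (inputRows * expansionFactor));
    }

    public static void main(String[] args) {
        System.out.println(estimateRows(0, 2.5));
        System.out.println(estimateRows(10, 2.5));
    }
}
```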
[jira] [Work logged] (HIVE-24704) Ensure that all Operator column expressions refer to a column in the RowSchema
[ https://issues.apache.org/jira/browse/HIVE-24704?focusedWorklogId=554608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554608 ] ASF GitHub Bot logged work on HIVE-24704: - Author: ASF GitHub Bot Created on: 19/Feb/21 06:41 Start Date: 19/Feb/21 06:41 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1929: URL: https://github.com/apache/hive/pull/1929#discussion_r578958271 ## File path: ql/src/java/org/apache/hadoop/hive/ql/hooks/OperatorHealthCheckerHook.java ## @@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.hooks;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.Stack;
+
+import com.google.common.collect.Lists;
+
+import com.google.common.collect.Sets;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.RowSchema;
+import org.apache.hadoop.hive.ql.exec.ScriptOperator;
+import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.lib.DefaultGraphWalker;
+import org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher;
+import org.apache.hadoop.hive.ql.lib.SemanticDispatcher;
+import org.apache.hadoop.hive.ql.lib.SemanticGraphWalker;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
+import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.MapWork;
+import org.apache.hadoop.hive.ql.plan.MapredWork;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.SelectDesc;
+import org.apache.hadoop.hive.ql.plan.TezWork;
+
+/**
+ * Checks some operator quality rules.
+ *
+ * Checks that operator ids are not reused.
+ * Checks some level of expression/schema consistency.
+ * Some sanity checks on SelectOperators.
+ */
+public class OperatorHealthCheckerHook implements ExecuteWithHookContext {
+
+  static class UniqueOpIdChecker implements SemanticNodeProcessor {
+
+    Map<String, Operator<?>> opMap = new HashMap<>();
+
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs)
+        throws SemanticException {
+
+      Operator<?> op = (Operator<?>) nd;
+      checkOperator(op);
+      String opKey = op.getOperatorId();
+      Operator<?> found = opMap.get(opKey);
+      if (found != null) {
+        throw new RuntimeException("operator id reuse found: " + opKey);
+      }
+      opMap.put(opKey, op);
+      return null;
+    }
+  }
+
+  public static void checkOperator(Operator<?> op) {
+    OperatorDesc conf = op.getConf();
+    Map<String, ExprNodeDesc> exprMap = conf.getColumnExprMap();
+    RowSchema schema = op.getSchema();
+
+    chceckSchema(schema);

Review comment: typo -> `checkSchema`
[jira] [Commented] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source
[ https://issues.apache.org/jira/browse/HIVE-24733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286868#comment-17286868 ] Pravin Sinha commented on HIVE-24733: - Thank you for the review, [~aasha]. Committed to master. > Handle replication when db location and managed location is set to custom > location on source > > > Key: HIVE-24733 > URL: https://issues.apache.org/jira/browse/HIVE-24733 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source
[ https://issues.apache.org/jira/browse/HIVE-24733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24733: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Handle replication when db location and managed location is set to custom > location on source > > > Key: HIVE-24733 > URL: https://issues.apache.org/jira/browse/HIVE-24733 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24259) [CachedStore] Optimise get all constraint api
[ https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma updated HIVE-24259: - Description: Description - Problem - 1. Redundant check whether the table is present or not. 2. Currently, in order to get all constraints from the cachedstore, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store. 3. If constraints are null and valid, a redundant call is still made to the raw store. DOD 1. Create a flag which tells whether the constraint snapshot is valid or not. 2. If an incremental addition happens to any of the constraints, mark the snapshot as invalid and let the updater thread update the cache with valid snapshot data. 3. Combine the individual constraint calls into one call for refresh and creation. was: Description - Problem - 1. Redundant check whether the table is present or not. 2. Currently, in order to get all constraints from the cachedstore, 6 different calls are made within the cached store, which leads to 6 different calls to the raw store. DOD 1. Check only once whether the table exists in the cached store. 2. Instead of calling each individual constraint in the cached store, add a method which returns all constraints at once and, if the data is not consistent, falls back to the rawstore. > [CachedStore] Optimise get all constraint api > - > > Key: HIVE-24259 > URL: https://issues.apache.org/jira/browse/HIVE-24259 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Minor > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Description - > Problem - > 1. Redundant check whether the table is present or not. > 2. Currently, in order to get all constraints from the cachedstore, 6 different > calls are made within the cached store, which leads to 6 different calls to the raw > store. > 3. If constraints are null and valid, a redundant call is still made to > the raw store. > > DOD > 1. Create a flag which tells whether the constraint snapshot is valid or not. > 2. If an incremental addition happens to any of the constraints, mark the snapshot as > invalid and let the updater thread update the cache with valid snapshot data. > 3. Combine the individual constraint calls into one call for refresh and creation. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
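The DOD above (one combined call guarded by a validity flag that incremental writes invalidate) can be sketched as follows. This is an illustrative model, not the actual CachedStore code; all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Illustrative model of the two ideas in the description: a single aggregated
// "all constraints" snapshot plus a validity flag. Writers invalidate, a
// background updater repopulates, and stale reads fall back to the raw store.
public class ConstraintSnapshotCache {
    private final AtomicBoolean valid = new AtomicBoolean(false);
    private volatile List<String> allConstraints = new ArrayList<>();

    // One combined read instead of six per-constraint-type calls.
    public List<String> getAllConstraints(Supplier<List<String>> rawStore) {
        if (valid.get()) {
            return allConstraints;
        }
        return rawStore.get(); // snapshot stale: fall back to the raw store
    }

    // Incremental DDL marks the snapshot invalid instead of patching it.
    public void onConstraintAdded() {
        valid.set(false);
    }

    // The updater thread refreshes all constraint types in one pass.
    public void refresh(List<String> freshSnapshot) {
        allConstraints = freshSnapshot;
        valid.set(true);
    }
}
```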
[jira] [Updated] (HIVE-24259) [CachedStore] Optimise get all constraint api
[ https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma updated HIVE-24259: - Summary: [CachedStore] Optimise get all constraint api (was: [CachedStore] Optimise get constraints call by removing redundant table check ) > [CachedStore] Optimise get all constraint api > - > > Key: HIVE-24259 > URL: https://issues.apache.org/jira/browse/HIVE-24259 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Minor > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Description - > Problem - > 1. Redundant check whether the table is present or not. > 2. Currently, in order to get all constraints from the cachedstore, 6 different > calls are made within the cached store, which leads to 6 different calls to the raw > store. > > DOD > 1. Check only once whether the table exists in the cached store. > 2. Instead of calling each individual constraint in the cached store, add a method > which returns all constraints at once and, if the data is not consistent, falls > back to the rawstore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?focusedWorklogId=554580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554580 ] ASF GitHub Bot logged work on HIVE-24792: - Author: ASF GitHub Bot Created on: 19/Feb/21 05:12 Start Date: 19/Feb/21 05:12 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1992: URL: https://github.com/apache/hive/pull/1992 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554580) Time Spent: 0.5h (was: 20m) > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2... -- This message was sent by Atlassian Jira (v8.3.4#803005)
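The leak pattern described in HIVE-24792 and its usual remedy can be sketched generically: a `ScheduledExecutorService` created for a one-shot delayed cleanup keeps a non-daemon worker thread alive until it is shut down. The class and method below are illustrative, not Hive's actual Operation code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Generic illustration: schedule a delayed cleanup, then shut the executor
// down so its worker thread is released once the task has run.
public class DelayedCleanup {
    public static boolean scheduleAndRelease(Runnable cleanup, long delayMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(cleanup, delayMs, TimeUnit.MILLISECONDS);
        // Without this call the executor's thread lingers after the task runs;
        // across many operations that accumulates into a thread leak.
        // shutdown() still lets the already-queued cleanup execute.
        scheduler.shutdown();
        try {
            return scheduler.awaitTermination(delayMs + 1000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated=" + scheduleAndRelease(() -> {}, 10));
    }
}
```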
[jira] [Work logged] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?focusedWorklogId=554579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554579 ] ASF GitHub Bot logged work on HIVE-24792: - Author: ASF GitHub Bot Created on: 19/Feb/21 05:09 Start Date: 19/Feb/21 05:09 Worklog Time Spent: 10m Work Description: dengzhhu653 closed pull request #1992: URL: https://github.com/apache/hive/pull/1992 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554579) Time Spent: 20m (was: 10m) > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source
[ https://issues.apache.org/jira/browse/HIVE-24733?focusedWorklogId=554575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554575 ] ASF GitHub Bot logged work on HIVE-24733: - Author: ASF GitHub Bot Created on: 19/Feb/21 04:57 Start Date: 19/Feb/21 04:57 Worklog Time Spent: 10m Work Description: pkumarsinha merged pull request #1942: URL: https://github.com/apache/hive/pull/1942 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554575) Time Spent: 1.5h (was: 1h 20m) > Handle replication when db location and managed location is set to custom > location on source > > > Key: HIVE-24733 > URL: https://issues.apache.org/jira/browse/HIVE-24733 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-13825) Using JOIN in 2 tables that have the same path locations, but different column names fails with an error exception
[ https://issues.apache.org/jira/browse/HIVE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286829#comment-17286829 ] Dhirendra Pandit commented on HIVE-13825: - I am also having the same issue with the below query. MERGE INTO default.s_satellite AS O using default.s_satellite AS N ON o.load_end_date=n.load_end_date AND n.l_link_key = o.l_link_key and n.load_date > o.load_date WHEN matched then update set load_end_date= n.load_date; I am trying to update the same table by comparing the values of the existing entries. > Using JOIN in 2 tables that have the same path locations, but different column > names fails with an error exception > --- > > Key: HIVE-13825 > URL: https://issues.apache.org/jira/browse/HIVE-13825 > Project: Hive > Issue Type: Improvement >Reporter: Sergio Peña >Assignee: Vihang Karajgaonkar >Priority: Major > > The following scenario of 2 tables with the same locations cannot be used on a > JOIN query: > {noformat} > hive> create table t1 (a string, b string) location > '/user/hive/warehouse/test1'; > OK > hive> create table t2 (c string, d string) location > '/user/hive/warehouse/test1'; > OK > hive> select t1.a from t1 join t2 on t1.a = t2.c; > ... > 2016-05-23 16:39:57 Starting to launch local task to process map join; > maximum memory = 477102080 > Execution failed with exit status: 2 > Obtaining error information > Task failed!
> Task ID: > Stage-4 > Logs: > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask > {noformat} > The logs contain this error exception: > {noformat} > 2016-05-23T16:39:58,163 ERROR [main]: mr.MapredLocalTask (:()) - Hive Runtime > Error: Map local work failed > java.lang.RuntimeException: cannot find field a from [0:c, 1:d] > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:485) > at > org.apache.hadoop.hive.serde2.BaseStructObjectInspector.getStructFieldRef(BaseStructObjectInspector.java:133) > at > org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) > at > org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:973) > at > org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:999) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:75) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:355) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365) > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:499) > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:403) > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:383) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:751) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng reassigned HIVE-24792: -- Assignee: Zhihua Deng > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?focusedWorklogId=554558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554558 ] ASF GitHub Bot logged work on HIVE-24346: - Author: ASF GitHub Bot Created on: 19/Feb/21 02:07 Start Date: 19/Feb/21 02:07 Worklog Time Spent: 10m Work Description: mustafaiman commented on a change in pull request #1733: URL: https://github.com/apache/hive/pull/1733#discussion_r578873214 ## File path: hplsql/src/main/java/org/apache/hive/hplsql/packages/HmsPackageRegistry.java ## @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.hplsql.packages;
+
+import java.util.Optional;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.AddPackageRequest;
+import org.apache.hadoop.hive.metastore.api.DropPackageRequest;
+import org.apache.hadoop.hive.metastore.api.GetPackageRequest;
+import org.apache.hadoop.hive.metastore.api.Package;
+import org.apache.hive.hplsql.HplSqlSessionState;
+import org.apache.thrift.TException;
+
+public class HmsPackageRegistry implements PackageRegistry {
+  private final IMetaStoreClient msc;
+  private final HplSqlSessionState hplSqlSession;
+
+  public HmsPackageRegistry(IMetaStoreClient msc, HplSqlSessionState hplSqlSession) {
+    this.msc = msc;
+    this.hplSqlSession = hplSqlSession;
+  }
+
+  @Override
+  public Optional<String> getPackage(String name) {
+    try {
+      Package pkg = msc.findPackage(request(name));
+      return pkg == null
+          ? Optional.empty()
+          : Optional.of(pkg.getHeader() + ";\n" + pkg.getBody());
+    } catch (TException e) {
+      throw new RuntimeException(e.getCause());
+    }
+  }
+
+  @Override
+  public void createPackageHeader(String name, String header, boolean replace) {
+    try {
+      Package existing = msc.findPackage(request(name));
+      if (existing != null && !replace)
+        throw new RuntimeException("Package " + name + " already exists");
+      msc.addPackage(makePackage(name, header, ""));
+    } catch (TException e) {
+      throw new RuntimeException(e.getCause());
+    }
+  }
+
+  @Override
+  public void createPackageBody(String name, String body, boolean replace) {
+    try {
+      Package existing = msc.findPackage(request(name));
+      if (existing != null && StringUtils.isNotEmpty(existing.getBody()) && !replace)
+        throw new RuntimeException("Package body " + name + " already exists");
+      if (existing == null) {
+        msc.addPackage(makePackage(name, "", body));

Review comment: Is a package body without header valid? I believe this should throw an error. https://docs.oracle.com/cd/E11882_01/appdev.112/e25519/packages.htm#LNPLS00901 says "A package always has a specification"

## File path: hplsql/src/main/java/org/apache/hive/hplsql/Exec.java ## @@ -328,16 +334,16 @@ public String callStackPop() {
   */
  public Var findVariable(String name) {
    Var var;
-    String name1 = name;
+    String name1 = name.toUpperCase();
    String name1a = null;
    String name2 = null;
    Scope cur = exec.currentScope;
    Package pack;
    Package packCallContext = exec.getPackageCallContext();
    ArrayList<String> qualified = exec.meta.splitIdentifier(name);
    if (qualified != null) {
-      name1 = qualified.get(0);
-      name2 = qualified.get(1);
+      name1 = qualified.get(0).toUpperCase();

Review comment: can use `qualified = exec.meta.splitIdentifier(name1)` above so you won't need toUpperCase here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554558) Time Spent: 5h 50m (was: 5h 40m) > Store HPL/SQL packages into
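The reviewer's suggestion in the second comment, normalizing the identifier once before splitting instead of upper-casing each part, can be sketched generically. The `splitIdentifier` below is a simplified stand-in for `exec.meta.splitIdentifier`, and the whole class is an illustration rather than actual HPL/SQL code:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: upper-case a qualified identifier once, then split, so the split
// parts do not each need their own toUpperCase() call.
public class IdentifierNormalizer {
    // Simplified splitter: null for unqualified names, the dot-separated
    // parts otherwise (stand-in for exec.meta.splitIdentifier).
    public static List<String> splitIdentifier(String name) {
        if (!name.contains(".")) {
            return null;
        }
        return Arrays.asList(name.split("\\."));
    }

    public static List<String> resolveParts(String name) {
        String normalized = name.toUpperCase(); // normalize exactly once
        List<String> qualified = splitIdentifier(normalized);
        return qualified != null ? qualified : List.of(normalized);
    }
}
```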
[jira] [Commented] (HIVE-24348) Beeline: Isolating dependencies and execution with java
[ https://issues.apache.org/jira/browse/HIVE-24348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286786#comment-17286786 ] Abhay commented on HIVE-24348: -- The fix has been merged into master here: [https://github.com/apache/hive/pull/1906]. Thanks [~ngangam] for the work and the review. > Beeline: Isolating dependencies and execution with java > --- > > Key: HIVE-24348 > URL: https://issues.apache.org/jira/browse/HIVE-24348 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 3.1.0 >Reporter: Naveen Gangam >Assignee: Abhay >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Currently, beeline code, binaries and executables are somewhat tightly > coupled with the hive product. To be able to execute beeline from a node with > just a JRE installed and some jars in the classpath is impossible. > * beeline.sh/hive scripts rely on HADOOP_HOME to be set and are designed to > use the "hadoop" executable to run beeline. > * Ideally, just the hive-beeline.jar and hive-jdbc-standalone jars should be > enough, but sadly they aren't. The latter jar adds more problems than it solves > because all the classfiles are shaded and some dependencies cannot be resolved. > * Beeline has many other dependencies like hive-exec, hive-common, > hadoop-common, supercsv, jline, commons-cli, commons-io, commons-logging etc. > While it may not be possible to eliminate some of these, we should at least > have a self-contained jar that contains all these to be able to make it work. > * the underlying script used to run beeline should use JAVA as an alternate > means to execute if HADOOP_HOME is not set -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24768) Use jackson-bom everywhere for version replacement
[ https://issues.apache.org/jira/browse/HIVE-24768?focusedWorklogId=554495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554495 ] ASF GitHub Bot logged work on HIVE-24768: - Author: ASF GitHub Bot Created on: 18/Feb/21 22:12 Start Date: 18/Feb/21 22:12 Worklog Time Spent: 10m Work Description: risdenk commented on a change in pull request #1969: URL: https://github.com/apache/hive/pull/1969#discussion_r578780875 ## File path: standalone-metastore/pom.xml ## @@ -123,6 +123,13 @@
         <artifactId>orc-core</artifactId>
         <version>${orc.version}</version>
       </dependency>
+      <dependency>
+        <groupId>com.fasterxml.jackson</groupId>
+        <artifactId>jackson-bom</artifactId>
+        <version>${jackson.version}</version>
+        <type>pom</type>
+        <scope>import</scope>
+      </dependency>

Review comment: Whoops, this is in `dependencyManagement`. This means the jackson-databind dependency below isn't needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554495) Time Spent: 0.5h (was: 20m) > Use jackson-bom everywhere for version replacement > -- > > Key: HIVE-24768 > URL: https://issues.apache.org/jira/browse/HIVE-24768 > Project: Hive > Issue Type: Improvement >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > It's more of an optimization, but it makes it easier to replace the versions > wherever necessary for the Jackson dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24768) Use jackson-bom everywhere for version replacement
[ https://issues.apache.org/jira/browse/HIVE-24768?focusedWorklogId=554496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554496 ] ASF GitHub Bot logged work on HIVE-24768: - Author: ASF GitHub Bot Created on: 18/Feb/21 22:12 Start Date: 18/Feb/21 22:12 Worklog Time Spent: 10m Work Description: risdenk commented on a change in pull request #1969: URL: https://github.com/apache/hive/pull/1969#discussion_r578781109 ## File path: standalone-metastore/pom.xml ## @@ -123,6 +123,13 @@
         <artifactId>orc-core</artifactId>
         <version>${orc.version}</version>
       </dependency>
+      <dependency>
+        <groupId>com.fasterxml.jackson</groupId>
+        <artifactId>jackson-bom</artifactId>
+        <version>${jackson.version}</version>
+        <type>pom</type>
+        <scope>import</scope>
+      </dependency>
       <dependency>
         <groupId>com.fasterxml.jackson.core</groupId>
         <artifactId>jackson-databind</artifactId>

Review comment: Since this is already in the dependencyManagement section, this jackson-databind version is no longer needed after the jackson-bom is added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554496) Time Spent: 40m (was: 0.5h) > Use jackson-bom everywhere for version replacement > -- > > Key: HIVE-24768 > URL: https://issues.apache.org/jira/browse/HIVE-24768 > Project: Hive > Issue Type: Improvement >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > It's more of an optimization, but it makes it easier to replace the versions > wherever necessary for the Jackson dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24348) Beeline: Isolating dependencies and execution with java
[ https://issues.apache.org/jira/browse/HIVE-24348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-24348. -- Fix Version/s: 4.0.0 Resolution: Fixed > Beeline: Isolating dependencies and execution with java > --- > > Key: HIVE-24348 > URL: https://issues.apache.org/jira/browse/HIVE-24348 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 3.1.0 >Reporter: Naveen Gangam >Assignee: Abhay >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Currently, beeline code, binaries and executables are somewhat tightly > coupled with the hive product. To be able to execute beeline from a node with > just a JRE installed and some jars in the classpath is impossible. > * beeline.sh/hive scripts rely on HADOOP_HOME to be set and are designed to > use the "hadoop" executable to run beeline. > * Ideally, just the hive-beeline.jar and hive-jdbc-standalone jars should be > enough, but sadly they aren't. The latter jar adds more problems than it solves > because all the classfiles are shaded and some dependencies cannot be resolved. > * Beeline has many other dependencies like hive-exec, hive-common, > hadoop-common, supercsv, jline, commons-cli, commons-io, commons-logging etc. > While it may not be possible to eliminate some of these, we should at least > have a self-contained jar that contains all these to be able to make it work. > * the underlying script used to run beeline should use JAVA as an alternate > means to execute if HADOOP_HOME is not set -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24348) Beeline: Isolating dependencies and execution with java
[ https://issues.apache.org/jira/browse/HIVE-24348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286684#comment-17286684 ] Naveen Gangam commented on HIVE-24348: -- Fix has been committed to master. Thanks for the additional work [~achennagiri] in getting this in. [~belugabehr] Abhay has tested the dfs functionality with the standalone Beeline distribution. It has been found to be working fine. If there are other dependencies that are missing, I think it would work if we added a hive-env.sh script and set some CLASSPATH variables in there. It might be a fair expectation for users to do so. Now that these dfs commands have been supported for a while, it will be hard to remove them without breaking backward-compatibility (not sure what the original intent was in including this functionality). > Beeline: Isolating dependencies and execution with java > --- > > Key: HIVE-24348 > URL: https://issues.apache.org/jira/browse/HIVE-24348 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 3.1.0 >Reporter: Naveen Gangam >Assignee: Abhay >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently, beeline code, binaries and executables are somewhat tightly > coupled with the hive product. To be able to execute beeline from a node with > just a JRE installed and some jars in the classpath is impossible. > * beeline.sh/hive scripts rely on HADOOP_HOME to be set and are designed to > use the "hadoop" executable to run beeline. > * Ideally, just the hive-beeline.jar and hive-jdbc-standalone jars should be > enough, but sadly they aren't. The latter jar adds more problems than it solves > because all the classfiles are shaded and some dependencies cannot be resolved. > * Beeline has many other dependencies like hive-exec, hive-common, > hadoop-common, supercsv, jline, commons-cli, commons-io, commons-logging etc. > While it may not be possible to eliminate some of these, we should at least > have a self-contained jar that contains all these to be able to make it work. > * the underlying script used to run beeline should use JAVA as an alternate > means to execute if HADOOP_HOME is not set -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286670#comment-17286670 ] Naveen Gangam commented on HIVE-24786: -- [~prasanth_j] Looks like this change has been merged already. Let me know if you still need me to look into this. Thanks > JDBC HttpClient should retry for idempotent and unsent http methods > --- > > Key: HIVE-24786 > URL: https://issues.apache.org/jira/browse/HIVE-24786 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When hiveserver2 is behind multiple proxies, there is a possibility of > "broken pipe", "connect timeout" and "read timeout" exceptions if one of the > intermediate proxies or load balancers decides to reset the underlying tcp > socket after an idle timeout. When the connection is broken and a query is > submitted after the idle timeout, from the beeline (or client) perspective > the connection is open, but http methods (POST/GET) fail with socket-related > exceptions. Since these methods were not sent to the server, they are safe > for client-side retries. > > Also, HIVE-12371 seems to apply the socket timeout only to the binary > transport. The same can be passed on to the http client as well to avoid > retry hangs caused by infinite timeouts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
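The merged patch wires this behavior into the JDBC driver's Apache HttpClient; the rule itself (retry only when the request was provably never sent, or when the method is idempotent) can be sketched in a self-contained form. The class and method names below are illustrative, not Hive's actual API:

```java
// Sketch of the retry rule described in HIVE-24786: a failed request may be
// retried when it was never sent to the server, or when its method is
// idempotent. Names here are hypothetical, not Hive's JDBC driver classes.
class JdbcRetrySketch {

    /**
     * @param method      HTTP method of the failed request (e.g. "GET", "POST")
     * @param requestSent whether any bytes of the request reached the server
     * @param attempt     1-based attempt count
     * @param maxRetries  maximum number of client-side retries allowed
     */
    static boolean shouldRetry(String method, boolean requestSent,
                               int attempt, int maxRetries) {
        if (attempt > maxRetries) {
            return false;   // retry budget exhausted
        }
        if (!requestSent) {
            return true;    // unsent requests are always safe to retry
        }
        // Once the server may have processed the request, only idempotent
        // methods are safe to resend.
        return "GET".equals(method) || "HEAD".equals(method) || "OPTIONS".equals(method);
    }
}
```

With this rule, a POST that dies on a stale load-balancer connection before any bytes are written is transparently retried, while a POST that may already have reached HiveServer2 is not.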
[jira] [Assigned] (HIVE-24796) [HS2] Enhance DriverTxnHandler.isValidTxnListState logic to include tableId comparison
[ https://issues.apache.org/jira/browse/HIVE-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-24796: - Assignee: Kishen Das > [HS2] Enhance DriverTxnHandler.isValidTxnListState logic to include tableId > comparison > -- > > Key: HIVE-24796 > URL: https://issues.apache.org/jira/browse/HIVE-24796 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > In HS2, after query compilation phase, we acquire lock in DriverTxnHandler. > isValidTxnListState and later ensure there are no conflicting transactions by > using driverContext.getTxnManager().getLatestTxnIdInConflict(); . This > doesn't work well, if there are external entities that drop and recreate the > table with the same name. So, we should also make sure the tableId itself is > not changed, after lock has been acquired. This Jira is to enhance the > DriverTxnHandler.isValidTxnListState logic to include tableId comparison. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23882: -- Description: In probe we cannot currently support key expressions (on the big table side), as ORC CVs probe the smalltable HT directly (there is no expr evaluation at that level). TezCompiler should take this into account when picking MJs to push probe details. In this ticket we extend the probedecode compiler logic to: * Validate that the MJ is a single-key join, where the bigTable keyCol is only an ExprNodeColumnDesc * For the selected MJ, use backtracking logic to find the original TS key it maps to (on the same vertex) - this actually revealed several missed or wrong optimizations (updated q outs) * Finally, extend the optimization logic to check if there is a type cast between src and destination (as the types have to match in the probe case), and only use it in LLAP mode for now was: In probe we cannot currently support Key expressions (on the big table Side) as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at that level). TezCompiler should take this into account when picking MJs to push probe details > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > In probe we cannot currently support key expressions (on the big table side), > as ORC CVs probe the smalltable HT directly (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details. In this ticket we extend the probedecode compiler logic to: > * Validate that the MJ is a single-key join, where the bigTable keyCol is only > an ExprNodeColumnDesc > * For the selected MJ, use backtracking logic to find the original TS key it > maps to (on the same vertex) - this actually revealed several missed or wrong > optimizations (updated q outs) > * Finally, extend the optimization logic to check if there is a type cast > between src and destination (as the types have to match in the probe case), > and only use it in LLAP mode for now -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24795) [HS2] Cache tableId in SessionState
[ https://issues.apache.org/jira/browse/HIVE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-24795: - Assignee: Kishen Das > [HS2] Cache tableId in SessionState > > > Key: HIVE-24795 > URL: https://issues.apache.org/jira/browse/HIVE-24795 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > Please go through https://issues.apache.org/jira/browse/HIVE-24794 to > understand why this is required. It's basically to handle a corner case in > which a table gets dropped and recreated with the same name after we gather > information about all the tables and acquire the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24794) [HMS] Populate tableId in the response of get_valid_write_ids API
[ https://issues.apache.org/jira/browse/HIVE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-24794: -- Summary: [HMS] Populate tableId in the response of get_valid_write_ids API (was: Populate tableId in the response of get_valid_write_ids API) > [HMS] Populate tableId in the response of get_valid_write_ids API > - > > Key: HIVE-24794 > URL: https://issues.apache.org/jira/browse/HIVE-24794 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Priority: Major > > In HS2, after the query compilation phase, we acquire a lock in > DriverTxnHandler.isValidTxnListState and later ensure there are no > conflicting transactions by using > driverContext.getTxnManager().getLatestTxnIdInConflict(). This doesn't work > well if there are external entities that drop and recreate the table with the > same name. So we should also make sure the tableId itself has not changed > after the lock has been acquired. This Jira is to enhance getValidWriteIdList > to return the tableId as well. The idea is to cache the tableId in > SessionState and later compare it with what getValidWriteIdList returns. If > the table was dropped and recreated, the tableId will not match and we have > to recompile the query. Caching the tableId in SessionState will be done as > part of a separate Jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
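The tableId comparison described here boils down to a snapshot check after the lock is acquired: compare the ids captured at compile time against the ids reported by get_valid_write_ids, and recompile on any mismatch. A minimal sketch, with hypothetical names standing in for SessionState and the HMS response:

```java
import java.util.Map;

// Illustrative sketch (not Hive's actual API) of the recompilation check:
// tableIds recorded during compilation are compared with the tableIds returned
// by get_valid_write_ids after the lock is acquired; a mismatch means some
// table was dropped and recreated under the same name.
class TableIdRecompileCheck {

    /** @param compileTimeIds table name -> tableId captured during compilation
     *  @param postLockIds    table name -> tableId reported after the lock */
    static boolean needsRecompile(Map<String, Long> compileTimeIds,
                                  Map<String, Long> postLockIds) {
        for (Map.Entry<String, Long> e : compileTimeIds.entrySet()) {
            Long current = postLockIds.get(e.getKey());
            if (current == null || !current.equals(e.getValue())) {
                return true;  // table vanished or was recreated with a new id
            }
        }
        return false;         // snapshot still valid; execute the compiled plan
    }
}
```

In HS2 the positive case would trigger a recompilation of the query rather than an error, since the lock is already held at that point.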
[jira] [Updated] (HIVE-24795) [HS2] Cache tableId in SessionState
[ https://issues.apache.org/jira/browse/HIVE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-24795: -- Summary: [HS2] Cache tableId in SessionState (was: Cache tableId in SessionState ) > [HS2] Cache tableId in SessionState > > > Key: HIVE-24795 > URL: https://issues.apache.org/jira/browse/HIVE-24795 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Priority: Major > > Please go through https://issues.apache.org/jira/browse/HIVE-24794 to > understand why this is required. It's basically to handle a corner case in > which a table gets dropped and recreated with the same name after we gather > information about all the tables and acquire the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24794) [HMS] Populate tableId in the response of get_valid_write_ids API
[ https://issues.apache.org/jira/browse/HIVE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-24794: - Assignee: Kishen Das > [HMS] Populate tableId in the response of get_valid_write_ids API > - > > Key: HIVE-24794 > URL: https://issues.apache.org/jira/browse/HIVE-24794 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > In HS2, after query compilation phase, we acquire lock in DriverTxnHandler. > isValidTxnListState and later ensure there are no conflicting transactions by > using driverContext.getTxnManager().getLatestTxnIdInConflict(); . This > doesn't work well, if there are external entities that drop and recreate the > table with the same name. So, we should also make sure the tableId itself is > not changed, after lock has been acquired. This Jira is to enhance > getValidWriteIdList to return tableId as well. Idea is to cache the tableId > in SessionState and later compare it with what getValidWriteIdList returns. > If the table was dropped and recreated, the tableId will not match and we > have to recompile the query. Caching the tableId in SessionState will be done > as part of a separate Jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24742) Support router path or view fs path in Hive table location
[ https://issues.apache.org/jira/browse/HIVE-24742?focusedWorklogId=554393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554393 ] ASF GitHub Bot logged work on HIVE-24742: - Author: ASF GitHub Bot Created on: 18/Feb/21 18:37 Start Date: 18/Feb/21 18:37 Worklog Time Spent: 10m Work Description: aihuaxu commented on pull request #1973: URL: https://github.com/apache/hive/pull/1973#issuecomment-781553358 @yongzhi Actually resolvePath() expects the path to exist, while, e.g., for a rename() operation the target path doesn't exist yet. I will check that out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554393) Time Spent: 20m (was: 10m) > Support router path or view fs path in Hive table location > -- > > Key: HIVE-24742 > URL: https://issues.apache.org/jira/browse/HIVE-24742 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.2 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24742.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In > [FileUtils.java|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L747], > the equalsFileSystem function checks the base URL to determine whether source > and destination are on the same cluster and decides whether to copy or move > the data. That will not work for viewfs or a router-based file system, since > viewfs://ns-default/a and viewfs://ns-default/b may be on different physical > clusters. > FileSystem in HDFS provides a resolvePath() function to resolve to the > physical path. We can support viewfs and router through that function. -- This message was sent by Atlassian Jira (v8.3.4#803005)
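The equalsFileSystem pitfall above can be shown with plain URIs. The mount map below is a hypothetical stand-in for HDFS's FileSystem.resolvePath() (the real resolution happens inside the viewfs/router layer); it only illustrates why comparing scheme and authority is not enough:

```java
import java.net.URI;
import java.util.Map;
import java.util.function.Function;

// Two viewfs paths share scheme and authority, so a base-URL comparison says
// "same cluster" - but their resolved physical locations may differ. The
// mount table here is a made-up example, not an actual ViewFs configuration.
class ViewFsEquality {

    // Same check equalsFileSystem effectively performs on the base URL.
    static boolean sameFileSystem(URI a, URI b) {
        return String.valueOf(a.getScheme()).equalsIgnoreCase(String.valueOf(b.getScheme()))
                && String.valueOf(a.getAuthority()).equals(String.valueOf(b.getAuthority()));
    }

    public static void main(String[] args) {
        // Hypothetical mount table: /a and /b live on different physical clusters.
        Map<String, URI> mounts = Map.of(
                "viewfs://ns-default/a", URI.create("hdfs://cluster1/a"),
                "viewfs://ns-default/b", URI.create("hdfs://cluster2/b"));
        Function<URI, URI> resolve = u -> mounts.get(u.toString());

        URI a = URI.create("viewfs://ns-default/a");
        URI b = URI.create("viewfs://ns-default/b");
        System.out.println(sameFileSystem(a, b));                               // true - misleading
        System.out.println(sameFileSystem(resolve.apply(a), resolve.apply(b))); // false - different clusters
    }
}
```

Comparing after resolution gives the right answer, which is exactly why the patch routes the check through resolvePath() where possible.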
[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554376 ] ASF GitHub Bot logged work on HIVE-23882: - Author: ASF GitHub Bot Created on: 18/Feb/21 18:12 Start Date: 18/Feb/21 18:12 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1286: URL: https://github.com/apache/hive/pull/1286#discussion_r578641665 ## File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out ## @@ -57,6 +57,7 @@ STAGE PLANS: TableScan alias: src filterExpr: key is not null (type: boolean) + probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582 Review comment: In general thats the case, this particular query though is a self join where tsKeyCardinality is 500 while mjKeyCardinality is 791 thus the ratio is above 1. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554376) Time Spent: 3h 10m (was: 3h) > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24793) Compiler probe MJ selection candidate fallback
[ https://issues.apache.org/jira/browse/HIVE-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-24793: -- Summary: Compiler probe MJ selection candidate fallback (was: Compiler probe MJ selection algorithm fallback) > Compiler probe MJ selection candidate fallback > -- > > Key: HIVE-24793 > URL: https://issues.apache.org/jira/browse/HIVE-24793 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > As per the discussion on #1286, the current probe MJ selection algorithm: > * selects the best MJ candidate (currently based on the distinct row ratio) > * does some further processing - which may bail out > Bailing out for the best candidate doesn't necessarily mean that we cannot > use a less charming candidate. The extra compilation can be wrapped into a > for loop; instead of selecting the best candidate, the first part could be > implemented as priority logic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
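The priority-ordered fallback this ticket proposes might look roughly like the following; Candidate and the keyRatio-based ordering are illustrative stand-ins, not Hive's compiler types:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of the HIVE-24793 idea: rather than picking the single best MJ
// candidate and giving up when the follow-up checks bail out, walk the
// candidates in priority order and keep the first one that survives.
class ProbeCandidateFallback {

    static final class Candidate {
        final String name;
        final double keyRatio; // lower ratio = more selective = higher priority
        Candidate(String name, double keyRatio) { this.name = name; this.keyRatio = keyRatio; }
    }

    /** Returns the highest-priority candidate passing the extra compile-time checks. */
    static Optional<Candidate> select(List<Candidate> candidates,
                                      Predicate<Candidate> extraChecks) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(c -> c.keyRatio)) // best first
                .filter(extraChecks)                                 // "bail out" drops one candidate only
                .findFirst();
    }
}
```

The key behavioral change is that a bail-out (e.g. no origin TS column found) rejects one candidate instead of disabling the optimization for the whole vertex.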
[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554370 ] ASF GitHub Bot logged work on HIVE-23882: - Author: ASF GitHub Bot Created on: 18/Feb/21 18:05 Start Date: 18/Feb/21 18:05 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1286: URL: https://github.com/apache/hive/pull/1286#discussion_r578636427 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java ## @@ -120,7 +120,7 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, String outputColumnName = cSELOutputColumnNames.get(i); ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i); ExprNodeDesc newPSELExprNodeDesc = -ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true); +ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false); Review comment: sure, done ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java ## @@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) { */ public static ArrayList backtrack(List sources, Operator current, Operator terminal) throws SemanticException { -return backtrack(sources, current, terminal, false); +return backtrack(sources, current, terminal, false, false); } public static ArrayList backtrack(List sources, - Operator current, Operator terminal, boolean foldExpr) throws SemanticException { -ArrayList result = new ArrayList(); + Operator current, Operator terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException { Review comment: sure, renamed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554370) Time Spent: 3h (was: 2h 50m) > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554369 ] ASF GitHub Bot logged work on HIVE-23882: - Author: ASF GitHub Bot Created on: 18/Feb/21 18:04 Start Date: 18/Feb/21 18:04 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1286: URL: https://github.com/apache/hive/pull/1286#discussion_r578636105 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx) List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable); ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0); - String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn()); - if (realTSColName != null) { + ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp); + if (originTSColExpr == null) { +LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}", Review comment: Yeah, always had this in the back of my head. Opened HIVE-24793 to track This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554369) Time Spent: 2h 50m (was: 2h 40m) > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24793) Compiler probe MJ selection algorithm fallback
[ https://issues.apache.org/jira/browse/HIVE-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24793: - > Compiler probe MJ selection algorithm fallback > -- > > Key: HIVE-24793 > URL: https://issues.apache.org/jira/browse/HIVE-24793 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > As per the discussion on #1286, the current probe MJ selection algorithm: > * selects the best MJ candidate (currently based on the distinct row ratio) > * does some further processing - which may bail out > Bailing out for the best candidate doesn't necessarily mean that we cannot > use a less charming candidate. The extra compilation can be wrapped into a > for loop; instead of selecting the best candidate, the first part could be > implemented as priority logic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554348 ] ASF GitHub Bot logged work on HIVE-23882: - Author: ASF GitHub Bot Created on: 18/Feb/21 17:19 Start Date: 18/Feb/21 17:19 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1286: URL: https://github.com/apache/hive/pull/1286#discussion_r578602640 ## File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out ## @@ -57,6 +57,7 @@ STAGE PLANS: TableScan alias: src filterExpr: key is not null (type: boolean) + probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582 Review comment: why is `keyRatio` above 1? shouldn't it mean the expected selectivity of the operation? ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java ## @@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) { */ public static ArrayList backtrack(List sources, Operator current, Operator terminal) throws SemanticException { -return backtrack(sources, current, terminal, false); +return backtrack(sources, current, terminal, false, false); } public static ArrayList backtrack(List sources, - Operator current, Operator terminal, boolean foldExpr) throws SemanticException { -ArrayList result = new ArrayList(); + Operator current, Operator terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException { Review comment: I think `skipRSParent` is a bit misleading ; you don't want to skip the RS - you want to stay in the same vertex ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx) List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable); ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0); - String realTSColName = 
OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn()); - if (realTSColName != null) { + ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp); + if (originTSColExpr == null) { +LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}", Review comment: current algorithm seems to be: * select best mj candidate * do some further processing - which may bail out bailing out for the best candidate doesn't necessarily mean that we will still bail out for a less charming candidate - I think it might be worth trying to restructure the extra compilation into a for loop - or, instead of selecting the best candidate, the first part could be implemented as priority logic. Just an idea for a followup ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java ## @@ -120,7 +120,7 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, String outputColumnName = cSELOutputColumnNames.get(i); ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i); ExprNodeDesc newPSELExprNodeDesc = -ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true); +ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false); Review comment: instead of modifying every callsite - can we have a method with the original signature? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554348) Time Spent: 2h 40m (was: 2.5h) > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details --
[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=554330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554330 ] ASF GitHub Bot logged work on HIVE-24259: - Author: ASF GitHub Bot Created on: 18/Feb/21 16:42 Start Date: 18/Feb/21 16:42 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1610: URL: https://github.com/apache/hive/pull/1610#discussion_r578572870 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -2187,8 +2160,10 @@ public void addPrimaryKeysToCache(String catName, String dbName, String tblName, cacheLock.readLock().lock(); String tblKey = CacheUtils.buildTableKey(catName, dbName, tblName); TableWrapper tblWrapper = tableCache.getIfPresent(tblKey); - if (tblWrapper != null) { -tblWrapper.cachePrimaryKeys(keys, false); + if ((tblWrapper != null) && tblWrapper.isConstraintsValid()) { +// Because lake of snapshot freshness validation. +// For now disabling cached constraint snapshot addition in parts by invalidating constraint snapshot. +tblWrapper.inValidateConstraints(); Review comment: Resolved ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -2674,28 +2554,19 @@ long getPartsFound() { catName = StringUtils.normalizeIdentifier(catName); dbName = StringUtils.normalizeIdentifier(dbName); tblName = StringUtils.normalizeIdentifier(tblName); -if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) { +if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction()) || !sharedCache Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554330) Time Spent: 5h (was: 4h 50m) > [CachedStore] Optimise get constraints call by removing redundant table check > -- > > Key: HIVE-24259 > URL: https://issues.apache.org/jira/browse/HIVE-24259 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Minor > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Description - > Problem - > 1. Redundant check whether the table is present or not > 2. Currently, in order to get all constraints from the cachedstore, 6 > different calls are made within the cached store, which leads to 6 different > calls to the raw store > > DOD > 1. Check only once whether the table exists in the cached store. > 2. Instead of calling individual constraints in the cached store, add a > method which returns all constraints at once; if the data is not consistent, > fall back to the rawstore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
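The "all constraints in one call, with rawstore fallback" idea described above can be sketched as follows; the types are simplified stand-ins for the metastore's SQLAllTableConstraints and the cached/raw store classes:

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Sketch of the HIVE-24259 optimisation: one lookup returns every constraint
// for a table at once, falling back to the raw store when the cached snapshot
// is missing or has been invalidated.
class AllConstraintsFetch {

    static final class AllTableConstraints {
        final Map<String, String> byKind; // e.g. "primaryKey" -> constraint definition
        AllTableConstraints(Map<String, String> byKind) { this.byKind = byKind; }
    }

    /** One call instead of six: serve from cache when valid, else hit the raw store. */
    static AllTableConstraints getAllConstraints(Optional<AllTableConstraints> cachedSnapshot,
                                                 Supplier<AllTableConstraints> rawStore) {
        return cachedSnapshot.orElseGet(rawStore);
    }
}
```

The point of the aggregate shape is that the "is the table cached and is the snapshot valid" check happens once, instead of once per constraint kind.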
[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=554326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554326 ] ASF GitHub Bot logged work on HIVE-24259: - Author: ASF GitHub Bot Created on: 18/Feb/21 16:35 Start Date: 18/Feb/21 16:35 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1610: URL: https://github.com/apache/hive/pull/1610#discussion_r578567748 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java ## @@ -29,11 +30,14 @@ private final Table table; private final boolean isReplicated; + private final SQLAllTableConstraints constraints; - public CreateTableEvent(Table table, boolean status, IHMSHandler handler, boolean isReplicated) { + public CreateTableEvent(Table table, boolean status, IHMSHandler handler, boolean isReplicated, + SQLAllTableConstraints constraints) { Review comment: Removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554326) Time Spent: 4h 50m (was: 4h 40m) > [CachedStore] Optimise get constraints call by removing redundant table check > -- > > Key: HIVE-24259 > URL: https://issues.apache.org/jira/browse/HIVE-24259 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Minor > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Description - > Problem - > 1. Redundant check if table is present or not > 2. Currently in order to get all constraint form the cachedstore. 6 different > call is made with in the cached store. Which led to 6 different call to raw > store > > DOD > 1. Check only once if table exit in cached store. > 2. 
Instead of calling individual constraints in the cached store, add a method > which returns all constraints at once; if the data is not consistent, fall > back to the rawstore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554321 ] ASF GitHub Bot logged work on HIVE-23882: - Author: ASF GitHub Bot Created on: 18/Feb/21 16:31 Start Date: 18/Feb/21 16:31 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #1286: URL: https://github.com/apache/hive/pull/1286#issuecomment-781469495 @pgaref I think it would be usefull to wait #1929 and see if this patch passes with the added check as well This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554321) Time Spent: 2.5h (was: 2h 20m) > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24786: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) > JDBC HttpClient should retry for idempotent and unsent http methods > --- > > Key: HIVE-24786 > URL: https://issues.apache.org/jira/browse/HIVE-24786 > Project: Hive > Issue Type: Bug > Affects Versions: 4.0.0 > Reporter: Prasanth Jayachandran > Assignee: Prasanth Jayachandran > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When hiveserver2 is behind multiple proxies, there is a possibility of "broken > pipe", "connect timeout" and "read timeout" exceptions if one of the > intermediate proxies or load balancers decides to reset the underlying TCP > socket after an idle timeout. When the connection is broken and the query > is submitted after the idle timeout, from the beeline (or client) perspective the > connection is open, but HTTP methods (POST/GET) fail with socket-related > exceptions. Since these methods were never sent to the server, they are safe for > client-side retries. > > Also, HIVE-12371 seems to apply the socket timeout only to the binary transport. > The same can be passed on to the HTTP client as well to avoid retry hangs caused by > infinite timeouts.
[jira] [Work logged] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?focusedWorklogId=554277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554277 ] ASF GitHub Bot logged work on HIVE-24786: - Author: ASF GitHub Bot Created on: 18/Feb/21 15:00 Start Date: 18/Feb/21 15:00 Worklog Time Spent: 10m Work Description: prasanthj merged pull request #1983: URL: https://github.com/apache/hive/pull/1983 Issue Time Tracking --- Worklog Id: (was: 554277) Time Spent: 40m (was: 0.5h) > JDBC HttpClient should retry for idempotent and unsent http methods > --- > > Key: HIVE-24786 > URL: https://issues.apache.org/jira/browse/HIVE-24786 > Project: Hive > Issue Type: Bug > Affects Versions: 4.0.0 > Reporter: Prasanth Jayachandran > Assignee: Prasanth Jayachandran > Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When hiveserver2 is behind multiple proxies, there is a possibility of "broken > pipe", "connect timeout" and "read timeout" exceptions if one of the > intermediate proxies or load balancers decides to reset the underlying TCP > socket after an idle timeout. When the connection is broken and the query > is submitted after the idle timeout, from the beeline (or client) perspective the > connection is open, but HTTP methods (POST/GET) fail with socket-related > exceptions. Since these methods were never sent to the server, they are safe for > client-side retries. > > Also, HIVE-12371 seems to apply the socket timeout only to the binary transport. > The same can be passed on to the HTTP client as well to avoid retry hangs caused by > infinite timeouts.
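The retry rule described in HIVE-24786 — replay a failed request only when the method is idempotent or the request was never written to the socket, so a replay cannot duplicate server-side work — can be sketched as a small decision helper. This is an illustrative sketch, not code from the actual patch; the class and parameter names are hypothetical:

```java
/**
 * Hypothetical sketch of the retry policy from HIVE-24786, not the actual
 * Hive JDBC implementation: a failed HTTP call is safe to retry when the
 * request never reached the server, or when the method is idempotent.
 */
public class RetryPolicySketch {

    /** Decide whether a failed HTTP call is safe to retry. */
    public static boolean shouldRetry(boolean idempotentMethod,
                                      boolean requestFullySent,
                                      int executionCount,
                                      int maxRetries) {
        if (executionCount > maxRetries) {
            return false;            // retry budget exhausted
        }
        if (!requestFullySent) {
            return true;             // never reached the server: always safe to resend
        }
        return idempotentMethod;     // GET-style methods are safe to replay
    }

    public static void main(String[] args) {
        // A POST that was already sent must not be replayed; an unsent one may be.
        System.out.println(shouldRetry(false, true, 1, 3));   // false
        System.out.println(shouldRetry(false, false, 1, 3));  // true
        System.out.println(shouldRetry(true, true, 1, 3));    // true
    }
}
```

In Apache HttpClient 4.x this policy is what a custom `HttpRequestRetryHandler` (or `DefaultHttpRequestRetryHandler` with `requestSentRetryEnabled`) plugs into the client.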
[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
[ https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=554265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554265 ] ASF GitHub Bot logged work on HIVE-24778: - Author: ASF GitHub Bot Created on: 18/Feb/21 14:24 Start Date: 18/Feb/21 14:24 Worklog Time Spent: 10m Work Description: zabetak commented on a change in pull request #1982: URL: https://github.com/apache/hive/pull/1982#discussion_r578457991 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java ## @@ -45,7 +45,7 @@ public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) { this.parentResolver = parentResolver; SessionState ss = SessionState.get(); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { +if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) { Review comment: Small observation: the first example does not reflect the situation in current master. In HIVE-23100, the checks were moved from `ExprNodeGenericFuncDesc` to `TypeCheckProcFactory`. It is true that at the moment `UDFToBoolean`, `UDFToByte`, `UDFToDouble`, etc., are not checked by `TypeCheckProcFactory`; I am not sure if this needs to change or not. In that sense, if we want to enforce type checks, we need to do it in another way, as is done here with the resolver. The timestamp resolver only checks the case of timestamps/dates, but there are other lossy conversions (e.g., bigint to double) for which we are doing nothing. I assume that if somebody does `CAST(bigint AS double)`, the query will run without raising an error even if strict checks are enabled. Going the other way around, currently if somebody does `CAST(date AS double)`, the query will run or fail depending on the value of the strict checks property. Issue Time Tracking --- Worklog Id: (was: 554265) Time Spent: 1h (was: 50m) > Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety > properties > > > Key: HIVE-24778 > URL: https://issues.apache.org/jira/browse/HIVE-24778 > Project: Hive > Issue Type: Sub-task > Affects Versions: 4.0.0 > Reporter: Stamatis Zampetakis > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The majority of strict type checks can be controlled by the > {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another > property, namely {{hive.strict.timestamp.conversion}}, to control the > implicit comparisons between numerics and timestamps. > The name and description of {{hive.strict.checks.type.safety}} imply that the > property covers all strict checks, so having others for specific cases appears > confusing and can easily lead to unexpected behavior. > The goal of this issue is to unify those properties to facilitate > configuration and improve code reuse.
[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
[ https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=554182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554182 ] ASF GitHub Bot logged work on HIVE-24778: - Author: ASF GitHub Bot Created on: 18/Feb/21 11:18 Start Date: 18/Feb/21 11:18 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1982: URL: https://github.com/apache/hive/pull/1982#discussion_r578334938 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java ## @@ -45,7 +45,7 @@ public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) { this.parentResolver = parentResolver; SessionState ss = SessionState.get(); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { +if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) { Review comment: Hey Stamati, it seems that if it is not a direct conversion that can be checked with LOSSY_CONVERSIONS, it has to be part of the UDF init. For example: - Bigint cannot be cast to a double: https://github.com/apache/hive/blob/adf426a87e60e0b99dec7e8aee7d1a5e12462161/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeGenericFuncDesc.java#L248 - Date only allows timestamps and Strings https://github.com/apache/hive/blob/853e1d1f9d17ef61637f094b7bd2cccf44931124/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToDate.java#L68 Issue Time Tracking --- Worklog Id: (was: 554182) Time Spent: 50m (was: 40m) > Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety > properties > > > Key: HIVE-24778 > URL: https://issues.apache.org/jira/browse/HIVE-24778 > Project: Hive > Issue Type: Sub-task > Affects Versions: 4.0.0 > Reporter: Stamatis Zampetakis > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The majority of strict type checks can be controlled by the > {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another > property, namely {{hive.strict.timestamp.conversion}}, to control the > implicit comparisons between numerics and timestamps. > The name and description of {{hive.strict.checks.type.safety}} imply that the > property covers all strict checks, so having others for specific cases appears > confusing and can easily lead to unexpected behavior. > The goal of this issue is to unify those properties to facilitate > configuration and improve code reuse.
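The bigint-to-double conversion discussed in the two review comments above is genuinely lossy, which is why a strict type-safety check might want to flag it. A minimal, self-contained demonstration in plain Java (not Hive code) of where the precision loss appears:

```java
/**
 * Demonstrates why bigint (64-bit long) to double is a lossy conversion:
 * a double has only 53 significand bits, so longs above 2^53 cannot all
 * be represented and a cast can silently change the value.
 */
public class LossyCastDemo {

    /** Returns true iff v survives a long -> double -> long round trip. */
    static boolean roundTripsExactly(long v) {
        return (long) (double) v == v;
    }

    public static void main(String[] args) {
        long exact = 1L << 53;   // 2^53: still exactly representable as a double
        long lossy = exact + 1;  // needs 54 significand bits; double has 53
        System.out.println(roundTripsExactly(exact));  // true
        System.out.println(roundTripsExactly(lossy));  // false: value silently rounded
    }
}
```

The same effect is why `CAST(bigint AS double)` in SQL can quietly lose information even though both types are 64 bits wide.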
[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check
[ https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=554174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554174 ] ASF GitHub Bot logged work on HIVE-24259: - Author: ASF GitHub Bot Created on: 18/Feb/21 10:52 Start Date: 18/Feb/21 10:52 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1610: URL: https://github.com/apache/hive/pull/1610#discussion_r578260034 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -2674,28 +2554,19 @@ long getPartsFound() { catName = StringUtils.normalizeIdentifier(catName); dbName = StringUtils.normalizeIdentifier(dbName); tblName = StringUtils.normalizeIdentifier(tblName); -if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) { +if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction()) || !sharedCache Review comment: This check is done in several places. We should add a utility method for this. ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java ## @@ -29,11 +30,14 @@ private final Table table; private final boolean isReplicated; + private final SQLAllTableConstraints constraints; - public CreateTableEvent(Table table, boolean status, IHMSHandler handler, boolean isReplicated) { + public CreateTableEvent(Table table, boolean status, IHMSHandler handler, boolean isReplicated, + SQLAllTableConstraints constraints) { Review comment: We have separate events for adding individual constraints. I think we need not keep this as part of the create table event. It adds additional overhead when transmitting events for incremental replication, and also takes additional storage in HMS. ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -2187,8 +2160,10 @@ public void addPrimaryKeysToCache(String catName, String dbName, String tblName, cacheLock.readLock().lock(); String tblKey = CacheUtils.buildTableKey(catName, dbName, tblName); TableWrapper tblWrapper = tableCache.getIfPresent(tblKey); - if (tblWrapper != null) { -tblWrapper.cachePrimaryKeys(keys, false); + if ((tblWrapper != null) && tblWrapper.isConstraintsValid()) { +// Because of the lack of snapshot freshness validation, +// for now we disable partial additions to the cached constraint snapshot by invalidating it. +tblWrapper.inValidateConstraints(); Review comment: invalidate is a single word in this context. Change the method name to invalidateConstraintsCache. Issue Time Tracking --- Worklog Id: (was: 554174) Time Spent: 4h 40m (was: 4.5h) > [CachedStore] Optimise get constraints call by removing redundant table check > -- > > Key: HIVE-24259 > URL: https://issues.apache.org/jira/browse/HIVE-24259 > Project: Hive > Issue Type: Sub-task > Reporter: Ashish Sharma > Assignee: Ashish Sharma > Priority: Minor > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Description - > Problem - > 1. A redundant check of whether the table is present or not. > 2. Currently, in order to get all constraints from the cachedstore, 6 different > calls are made within the cached store, which leads to 6 different calls to the raw > store. > > DOD > 1. Check only once whether the table exists in the cached store. > 2. Instead of fetching each individual constraint from the cached store, add a method > which returns all constraints at once and, if the data is not consistent, falls > back to the rawstore.
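The single-call-with-fallback pattern that HIVE-24259 describes — check the cache once, fetch all six constraint types in one call, and fall back to the raw store when the cached snapshot is not consistent — can be sketched as below. All type and method names here are hypothetical stand-ins (e.g. `AllConstraints` for `SQLAllTableConstraints`), not the actual CachedStore API:

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the HIVE-24259 pattern: one combined constraints
 * fetch against the cache, with a raw-store fallback when the cached
 * snapshot is missing or invalid. Not the real CachedStore code.
 */
public class ConstraintFetchSketch {

    /** Stand-in for SQLAllTableConstraints (PK, FK, unique, not-null, default, check). */
    static final class AllConstraints {
        final String source;
        AllConstraints(String source) { this.source = source; }
    }

    /** Cache returns empty when the table is absent or its snapshot is invalid. */
    interface Cache { Optional<AllConstraints> getAllConstraints(String tblKey); }

    interface RawStore { AllConstraints getAllConstraints(String tblKey); }

    /** One existence check, one combined fetch, one fallback — instead of six round trips. */
    static AllConstraints fetch(Cache cache, RawStore rawStore, String tblKey) {
        return cache.getAllConstraints(tblKey)
                    .orElseGet(() -> rawStore.getAllConstraints(tblKey));
    }

    public static void main(String[] args) {
        Cache miss = k -> Optional.empty();
        Cache hit = k -> Optional.of(new AllConstraints("cache"));
        RawStore raw = k -> new AllConstraints("rawstore");
        System.out.println(fetch(miss, raw, "cat.db.tbl").source); // rawstore
        System.out.println(fetch(hit, raw, "cat.db.tbl").source);  // cache
    }
}
```

The point of the refactor is that the table-existence check and the consistency decision happen once, rather than once per constraint type.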
[jira] [Updated] (HIVE-24787) Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488
[ https://issues.apache.org/jira/browse/HIVE-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24787: --- External issue URL: (was: https://revivalvape.com/) > Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488 > --- > > Key: HIVE-24787 > URL: https://issues.apache.org/jira/browse/HIVE-24787 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Revival Vape > Assignee: Revival Vape > Priority: Major > Fix For: 2.4.0, 3.1.2 > > > Hive is pulling in log4j 2.12.1, specifically at: > * ./usr/lib/hive/lib/log4j-core-2.12.1.jar > CVE-2020-9488 affects this version and the fix is to upgrade to 2.13.2+. So, > upgrade this dependency.
[jira] [Commented] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286402#comment-17286402 ] Peter Varga commented on HIVE-24715: Merged to master, thanks for the patch [~amagyar] > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Attila Magyar > Assignee: Attila Magyar > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Bucket Id range increase.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h >
[jira] [Updated] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24715: - Resolution: Fixed Status: Resolved (was: Patch Available) > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Attila Magyar > Assignee: Attila Magyar > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Bucket Id range increase.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?focusedWorklogId=554169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554169 ] ASF GitHub Bot logged work on HIVE-24715: - Author: ASF GitHub Bot Created on: 18/Feb/21 10:18 Start Date: 18/Feb/21 10:18 Worklog Time Spent: 10m Work Description: pvargacl merged pull request #1968: URL: https://github.com/apache/hive/pull/1968 Issue Time Tracking --- Worklog Id: (was: 554169) Time Spent: 1h 20m (was: 1h 10m) > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Attila Magyar > Assignee: Attila Magyar > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Bucket Id range increase.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h >
[jira] [Assigned] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng reassigned HIVE-24792: -- Assignee: (was: Zhihua Deng) > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Zhihua Deng > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2...
[jira] [Assigned] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng reassigned HIVE-24792: -- Assignee: Zhihua Deng > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Zhihua Deng > Assignee: Zhihua Deng > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2...
[jira] [Work logged] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?focusedWorklogId=554123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554123 ] ASF GitHub Bot logged work on HIVE-24792: - Author: ASF GitHub Bot Created on: 18/Feb/21 08:16 Start Date: 18/Feb/21 08:16 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1992: URL: https://github.com/apache/hive/pull/1992 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 554123) Remaining Estimate: 0h Time Spent: 10m > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Zhihua Deng > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2...
[jira] [Updated] (HIVE-24792) Potential thread leak in Operation
[ https://issues.apache.org/jira/browse/HIVE-24792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24792: -- Labels: pull-request-available (was: ) > Potential thread leak in Operation > -- > > Key: HIVE-24792 > URL: https://issues.apache.org/jira/browse/HIVE-24792 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Zhihua Deng > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The _scheduledExecutorService_ in _Operation_ does not shut down after > scheduling the delayed operationlog cleanup, which may result in a thread leak in > hiveserver2...
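One way to avoid the leak HIVE-24792 describes — a sketch under assumed names, not the actual Hive fix — is to call `shutdown()` on the executor immediately after scheduling the one-off cleanup task. By default, `ScheduledThreadPoolExecutor` still runs already-queued delayed tasks after `shutdown()`, so the cleanup still happens and the worker thread is then retired instead of lingering for the life of the server:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of scheduling a one-off delayed cleanup without leaking the
 * scheduler thread. The method and thread names are illustrative, not
 * taken from the Hive Operation class.
 */
public class DelayedCleanupSketch {

    public static ScheduledExecutorService scheduleCleanup(Runnable cleanup, long delayMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "operation-log-cleanup");
            t.setDaemon(true); // never block JVM exit
            return t;
        });
        ses.schedule(cleanup, delayMs, TimeUnit.MILLISECONDS);
        // Without this call, the single worker thread stays alive indefinitely.
        // shutdown() lets the already-scheduled task run, then retires the thread.
        ses.shutdown();
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService ses =
            scheduleCleanup(() -> System.out.println("cleaned"), 10);
        ses.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(ses.isTerminated()); // true once the cleanup ran
    }
}
```

An alternative design is a single long-lived scheduler shared by all operations, so no per-operation executor exists to leak in the first place.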