[jira] [Updated] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions for External Tables
     [ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Narayanan Venkateswaran updated HIVE-25178:
-------------------------------------------
    Summary: Reduce number of getPartition calls during loadDynamicPartitions for External Tables  (was: Reduce number of getPartition calls during loadDynamicPartitions)

> Reduce number of getPartition calls during loadDynamicPartitions for External Tables
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-25178
>                 URL: https://issues.apache.org/jira/browse/HIVE-25178
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Narayanan Venkateswaran
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When dynamic partitions are loaded, Hive::loadDynamicPartition loads all partitions from HMS, putting heavy load on it. This becomes worse when a table has a large number of partitions.
> Instead, HMS only needs to be queried for the existence of the partitions that are actually being loaded.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
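The idea in the description — asking HMS only about the partitions involved in the load, in bounded requests — can be sketched as a batching helper. This is an illustrative sketch, not Hive's implementation: `PartitionBatchSketch` and `toBatches` are hypothetical names, and each batch would back one bounded metastore call (e.g. a `getPartitionsByNames`-style request) instead of listing every partition in the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionBatchSketch {

    // Hypothetical helper: split the partition names produced by a dynamic-partition
    // load into fixed-size batches, so each metastore existence check covers only
    // the partitions being loaded and no single request grows unbounded.
    static List<List<String>> toBatches(List<String> partNames, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < partNames.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                partNames.subList(i, Math.min(i + batchSize, partNames.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hive partition names use the key=value form.
        List<String> names = Arrays.asList("ds=2021-06-01", "ds=2021-06-02", "ds=2021-06-03");
        System.out.println(toBatches(names, 2));
        // → [[ds=2021-06-01, ds=2021-06-02], [ds=2021-06-03]]
    }
}
```

The batch size bounds the cost of any single HMS call, so the total load scales with the number of partitions in the write, not with the number of partitions in the table.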
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
     [ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604953 ]

ASF GitHub Bot logged work on HIVE-25154:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 05:24
            Start Date: 02/Jun/21 05:24
    Worklog Time Spent: 10m
      Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643661160

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
       }
       String cmd = null;
       try {
+        TableName tb = req.tableName;
+        String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(), tb.getDb(), conf);
+        if (dbsBeingFailedOver.contains(dbName)
+            || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       You can simplify this. We don't need this check all the time.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604953)
    Time Spent: 2.5h  (was: 2h 20m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-25154
>                 URL: https://issues.apache.org/jira/browse/HIVE-25154
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-25183) Parsing error for Correlated Inner Joins
     [ https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25183:
----------------------------------
    Labels: pull-request-available  (was: )

> Parsing error for Correlated Inner Joins
> ----------------------------------------
>
>                 Key: HIVE-25183
>                 URL: https://issues.apache.org/jira/browse/HIVE-25183
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Parser
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-25090.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25183) Parsing error for Correlated Inner Joins
     [ https://issues.apache.org/jira/browse/HIVE-25183?focusedWorklogId=604950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604950 ]

ASF GitHub Bot logged work on HIVE-25183:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 05:03
            Start Date: 02/Jun/21 05:03
    Worklog Time Spent: 10m
      Work Description: jcamachor commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r643653073

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
## @@ -104,12 +105,20 @@ private boolean hasTableAlias(JoinTypeCheckCtx ctx, String tabName, ASTNode expr
         tblAliasCnt++;
       }

+      if (tblAliasCnt == 0 && ctx.getOuterRR() != null) {

Review comment:
       Why do we do this check? Maybe add a comment to the code.

## File path: ql/src/test/queries/clientpositive/subquery_corr_join.q
## @@ -0,0 +1,69 @@
+create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+);
+
+insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false);
+
+create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+);
+
+insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false);
+
+select *
+from alltypesagg t1
+where t1.id not in
+(select tt1.id
+ from alltypestiny tt1 inner JOIN alltypesagg tt2

Review comment:
       Can we add a q file with a negative test for outer joins? That will be useful to make sure that the query will fail for the time being, as expected.

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java
## @@ -90,6 +90,36 @@ public boolean matches(RelOptRuleCall call) {
   }

+  /**
+   * Rule that tries to push join conditions into its inputs.
+   */
+  public static class HiveJoinConditionPushRule extends HiveFilterJoinRule {

Review comment:
       Isn't this the same as `HiveFilterJoinTransposeRule`? It should not be necessary.

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
## @@ -34,6 +34,7 @@
 import javax.annotation.Nonnull;

+import org.apache.calcite.adapter.enumerable.EnumerableConvention;

Review comment:
       This does not seem needed?

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
## @@ -194,6 +207,19 @@ private ColumnInfo getColInfo(JoinTypeCheckCtx ctx, String tabName, String colAl
       }
     }

+    if (cInfoToRet == null && ctx.getOuterRR() != null) {
+      for (RowResolver rr : ImmutableList.of(ctx.getOuterRR())) {

Review comment:
       Is the `ImmutableList.of` wrapping needed?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604950)
    Remaining Estimate: 0h
    Time Spent: 10m

> Parsing error for Correlated Inner Joins
> ----------------------------------------
>
>                 Key: HIVE-25183
>                 URL: https://issues.apache.org/jira/browse/HIVE-25183
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Parser
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-25090.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
     [ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604944 ]

ASF GitHub Bot logged work on HIVE-25154:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:42
            Start Date: 02/Jun/21 04:42
    Worklog Time Spent: 10m
      Work Description: hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643647370

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
       }
       String cmd = null;
       try {
+        TableName tb = req.tableName;
+        String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(), tb.getDb(), conf);
+        if (dbsBeingFailedOver.contains(dbName)
+            || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       If the current dbName is not present in the dbsBeingFailedOver set, then it'll add this db to it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604944)
    Time Spent: 2h 20m  (was: 2h 10m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-25154
>                 URL: https://issues.apache.org/jira/browse/HIVE-25154
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
     [ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604938 ]

ASF GitHub Bot logged work on HIVE-25154:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:20
            Start Date: 02/Jun/21 04:20
    Worklog Time Spent: 10m
      Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643399198

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
       }
       String cmd = null;
       try {
+        TableName tb = req.tableName;
+        String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(), tb.getDb(), conf);
+        if (dbsBeingFailedOver.contains(dbName)
+            || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       How will this condition be true?

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
## @@ -222,17 +237,31 @@ private void setupMsckPathInvalidation() {
     private Configuration conf;
     private String qualifiedTableName;
     private CountDownLatch countDownLatch;
+    private Set<String> dbsBeingFailedOver;
+    private IMetaStoreClient msc;

-    MsckThread(MsckInfo msckInfo, Configuration conf, String qualifiedTableName, CountDownLatch countDownLatch) {
+    MsckThread(MsckInfo msckInfo, Configuration conf, String qualifiedTableName,
+        CountDownLatch countDownLatch, Set<String> dbsBeingFailedOver, IMetaStoreClient msc) {
       this.msckInfo = msckInfo;
       this.conf = conf;
       this.qualifiedTableName = qualifiedTableName;
       this.countDownLatch = countDownLatch;
+      this.dbsBeingFailedOver = dbsBeingFailedOver;
+      this.msc = msc;
     }

     @Override
     public void run() {
       try {
+        String dbName = MetaStoreUtils.prependCatalogToDbName(msckInfo.getCatalogName(), msckInfo.getDbName(), conf);
+        if (dbsBeingFailedOver.contains(dbName) ||
+            MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(msckInfo.getCatalogName(), msckInfo.getDbName()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       This isn't thread-safe.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604938)
    Time Spent: 2h 10m  (was: 2h)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-25154
>                 URL: https://issues.apache.org/jira/browse/HIVE-25154
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
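The thread-safety concern flagged above comes from the contains()-then-add sequence on a shared set. A minimal sketch of one way to avoid it (the class and method names here are illustrative, not the patch's code): `ConcurrentHashMap.newKeySet()` yields a thread-safe `Set`, and its `add()` atomically reports whether the element was new, so no separate `contains()` check is needed before inserting.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FailoverCacheSketch {

    // Thread-safe set of databases known to be failing over; safe to share
    // across the worker threads spawned by a background task.
    private static final Set<String> dbsBeingFailedOver = ConcurrentHashMap.newKeySet();

    // add() is atomic and returns false when the name was already present,
    // replacing the contains()-then-add() sequence that is racy on a plain HashSet.
    static boolean markFailedOver(String dbName) {
        return dbsBeingFailedOver.add(dbName);
    }

    public static void main(String[] args) {
        System.out.println(markFailedOver("hive.repl_db")); // true: newly recorded
        System.out.println(markFailedOver("hive.repl_db")); // false: already known
    }
}
```

With this shape, two threads racing on the same database name cannot both observe "absent" and duplicate the follow-up work keyed on the insertion.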
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604937 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:17
            Start Date: 02/Jun/21 04:17
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643639955

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java
## @@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.utils.JavaUtils;
+import org.apache.thrift.TException;
+
+import static java.util.Objects.requireNonNull;
+
+public final class ExceptionHandler {
+  private final Exception e;
+
+  private ExceptionHandler(Exception e) {
+    this.e = e;
+  }
+
+  public static ExceptionHandler handleException(Exception e) {
+    requireNonNull(e, "Exception e is null");
+    return new ExceptionHandler(e);
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clz.
+   */
+  public <T extends Exception> ExceptionHandler throwIfInstance(Class<T> t) throws T {
+    if (t.isInstance(e)) {
+      throw t.cast(e);
+    }
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt or the class clze, in order.
+   */
+  public <T extends Exception, E extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e) throws T, E {
+    throwIfInstance(t);
+    throwIfInstance(e);
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt, clze or clzc, in order.
+   */
+  public <T extends Exception, E extends Exception, C extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e, Class<C> c) throws T, E, C {

Review comment:
       Yes, we can do it in this more simplified way, thank you very much for the comments!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604937)
    Time Spent: 2.5h  (was: 2h 20m)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604936 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:14
            Start Date: 02/Jun/21 04:14
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643639099

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java
## @@ -0,0 +1,149 @@
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.utils.JavaUtils;
+import org.apache.thrift.TException;
+
+import static java.util.Objects.requireNonNull;
+
+public final class ExceptionHandler {
+  private final Exception e;
+
+  private ExceptionHandler(Exception e) {
+    this.e = e;
+  }
+
+  public static ExceptionHandler handleException(Exception e) {
+    requireNonNull(e, "Exception e is null");
+    return new ExceptionHandler(e);
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clz.
+   */
+  public <T extends Exception> ExceptionHandler throwIfInstance(Class<T> t) throws T {
+    if (t.isInstance(e)) {
+      throw t.cast(e);
+    }
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt or the class clze, in order.
+   */
+  public <T extends Exception, E extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e) throws T, E {
+    throwIfInstance(t);
+    throwIfInstance(e);
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt, clze or clzc, in order.
+   */
+  public <T extends Exception, E extends Exception, C extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e, Class<C> c) throws T, E, C {
+    throwIfInstance(t);
+    throwIfInstance(e);
+    throwIfInstance(c);
+    return this;
+  }
+
+  /**
+   * Converts the input e, if it is an instance of the class from, to an instance of the class to, and throws it.
+   */
+  public <F extends Exception, D extends Exception> ExceptionHandler
+      convertIfInstance(Class<F> from, Class<D> to) throws D {
+    D targetException = null;
+    if (from.isInstance(e)) {
+      try {
+        targetException = JavaUtils.newInstance(to, new Class[]{String.class}, new Object[]{e.getMessage()});
+      } catch (Exception ex) {
+        // this should not happen
+        throw new RuntimeException(ex);
+      }
+    }
+    if (targetException != null) {
+      throw targetException;
+    }
+    return this;
+  }
+
+  /**
+   * Converts the input e to a MetaException with the given message if it is an instance of any of the classes.
+   */
+  public ExceptionHandler convertToMetaExIfInstance(String message, Class<?>... classes)
+      throws MetaException {
+    if (classes != null && classes.length > 0) {
+      for (Class<?> clz : classes) {
+        if (clz.isInstance(e)) {
+          // throw the exception if it matches
+          throw new MetaException(message);
+        }
+      }
+    }
+    return this;
+  }
+
+  public static TException rethrowException(Exception e) throws TException {

Review comment:
       Done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604936)
    Time Spent: 2h 20m  (was: 2h 10m)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 20m
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604934 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:14
            Start Date: 02/Jun/21 04:14
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643638989

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java
## @@ -0,0 +1,149 @@
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.utils.JavaUtils;
+import org.apache.thrift.TException;
+
+import static java.util.Objects.requireNonNull;
+
+public final class ExceptionHandler {

Review comment:
       Done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604934)
    Time Spent: 2h  (was: 1h 50m)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604935 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:14
            Start Date: 02/Jun/21 04:14
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643639041

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
## @@ -914,6 +914,7 @@ private boolean isViewTable(String catName, String dbName, String tblName) throw
     long queryTime = doTrace ? System.nanoTime() : 0;
     MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, queryTime);
     if (sqlResult.isEmpty()) {
+      query.closeAll();

Review comment:
       Fine

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604935)
    Time Spent: 2h 10m  (was: 2h)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
     [ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355461#comment-17355461 ]

zhangbutao commented on HIVE-25186:
-----------------------------------
This exception looks like HIVE-23756, which failed to delete a table intermittently.

> Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25186
>                 URL: https://issues.apache.org/jira/browse/HIVE-25186
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0
>            Reporter: zhangbutao
>            Priority: Major
>         Attachments: drop_database_exception.txt
>
> We use the Hive master branch (HiveMetaStoreClient API) to create and drop databases.
> When we drop a database with the following sample code, exceptions occasionally occur.
> {code:java}
> HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf);
> hiveMetaClient.dropDatabase("testdb", true, true, true);
> {code}
> {code:java}
> java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."tbls", CONSTRAINT "FKdg0lkp80iro5fs41hyvi9ox43" FOREIGN KEY ("DB_ID") REFERENCES "dbs" ("DB_ID"))
>     at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2058)
>     at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1471)
>     at org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
>     at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>     at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366)
>     at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:667)
>     at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:635)
>     at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:721)
>     at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:95)
>     at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:528)
>     at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:222)
>     at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:286)
>     at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107)
>     at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:598)
>     at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>     at com.sun.proxy.$Proxy27.commitTransaction(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database_core(HMSHandler.java:1898)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database(HMSHandler.java:1954)
>     at sun.reflect.GeneratedMethodAccessor219.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>     at com.sun.proxy.$Proxy28.drop_database(Unknown Source)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17577)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17556)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>     at
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
     [ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangbutao updated HIVE-25186:
------------------------------
    Attachment: drop_database_exception.txt

> Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25186
>                 URL: https://issues.apache.org/jira/browse/HIVE-25186
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0
>            Reporter: zhangbutao
>            Priority: Major
>         Attachments: drop_database_exception.txt
>
> We use the Hive master branch (HiveMetaStoreClient API) to create and drop databases.
> When we drop a database with the following sample code, exceptions occasionally occur.
> {code:java}
> HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf);
> hiveMetaClient.dropDatabase("testdb", true, true, true);
> {code}
> {code:java}
> java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."tbls", CONSTRAINT "FKdg0lkp80iro5fs41hyvi9ox43" FOREIGN KEY ("DB_ID") REFERENCES "dbs" ("DB_ID"))
>     at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2058)
>     at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1471)
>     at org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
>     at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>     at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366)
>     at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:667)
>     at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:635)
>     at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:721)
>     at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:95)
>     at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:528)
>     at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:222)
>     at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:286)
>     at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107)
>     at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:598)
>     at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>     at com.sun.proxy.$Proxy27.commitTransaction(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database_core(HMSHandler.java:1898)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database(HMSHandler.java:1954)
>     at sun.reflect.GeneratedMethodAccessor219.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>     at com.sun.proxy.$Proxy28.drop_database(Unknown Source)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17577)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17556)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
>
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
[ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-25186: -- Description: We use Hive master branch (HiveMetastoreClient Api) to create and drop database. When we drop database with following sample code, some exceptions will occur occasionally. {code:java} HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); hiveMetaClient.dropDatabase("testdb", true, true, true); {code} {code:java} java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."tbls", CONSTRAINT "FKdg0lkp80iro5fs41hyvi9ox43" FOREIGN KEY ("DB_ID") REFERENCES "dbs" ("DB_ID")) at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2058) at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1471) at org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125) at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366) at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:667) at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:635) at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:721) at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:95) at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:528) at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:222) at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:286) at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107) at 
org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:598) at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) at com.sun.proxy.$Proxy27.commitTransaction(Unknown Source) at org.apache.hadoop.hive.metastore.HMSHandler.drop_database_core(HMSHandler.java:1898) at org.apache.hadoop.hive.metastore.HMSHandler.drop_database(HMSHandler.java:1954) at sun.reflect.GeneratedMethodAccessor219.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy28.drop_database(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17577) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17556) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was: We use Hive master branch (HiveMetastoreClient Api) to create and drop database. When we drop database with following sample code, some exceptions will occur occasionally. {code:java} HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); hiveMetaClient.dropDatabase("testdb", true, true, true); {code} > Drop database fails with excetion 'Cannot delete or update a parent row: a > foreign key constraint fails' >
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
[ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-25186: -- Description: We use Hive master branch (HiveMetastoreClient Api) to create and drop database. When we drop database with following sample code, some exceptions will occur occasionally. {code:java} HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); hiveMetaClient.dropDatabase("testdb", true, true, true); {code} was: We use Hive master branch (HiveMetastoreClient) to create and drop database. > Drop database fails with excetion 'Cannot delete or update a parent row: a > foreign key constraint fails' > > > Key: HIVE-25186 > URL: https://issues.apache.org/jira/browse/HIVE-25186 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: zhangbutao >Priority: Major > > We use Hive master branch (HiveMetastoreClient Api) to create and drop > database. > When we drop database with following sample code, some exceptions will occur > occasionally. > {code:java} > HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); > hiveMetaClient.dropDatabase("testdb", true, true, true); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
[ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-25186: -- Description: We use Hive master branch (HiveMetastoreClient) to create and drop database. > Drop database fails with excetion 'Cannot delete or update a parent row: a > foreign key constraint fails' > > > Key: HIVE-25186 > URL: https://issues.apache.org/jira/browse/HIVE-25186 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: zhangbutao >Priority: Major > > We use Hive master branch (HiveMetastoreClient) to create and drop database. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604872 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:25 Start Date: 02/Jun/21 01:25 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r643587387 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -94,6 +98,8 @@ private BlockingQueue workQueue; private Thread[] workers; + private Set dbsBeingFailedOver; Review comment: This is the only way we can access this set within each iteration of StatsUpdater and also within each execution of the actual analysis work after dequeuing from the worker queue. This set is cleared when a new iteration of this thread kicks in. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604872) Time Spent: 2h (was: 1h 50m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failed over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
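The check-then-cache behaviour described in the review comment — consult the in-memory set first, and only fall back to the metastore lookup (caching a positive result) when the db is not yet in the set — can be sketched as follows. This is an illustrative model, not the actual StatsUpdaterThread code; the class and method names are invented for the sketch, and the metastore lookup is stubbed with a predicate.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Sketch: skip work for a db that is being failed over, caching positive
// lookups so the (expensive) metastore check runs at most once per db
// per iteration of the updater thread.
class FailoverSkipList {
    private final Set<String> dbsBeingFailedOver = new HashSet<>();

    boolean shouldSkip(String dbName, Predicate<String> metastoreCheck) {
        if (dbsBeingFailedOver.contains(dbName)) {
            return true;                     // cached from an earlier lookup
        }
        if (metastoreCheck.test(dbName)) {
            dbsBeingFailedOver.add(dbName);  // cache for the rest of this iteration
            return true;
        }
        return false;
    }

    // Called when a new iteration of the thread kicks in.
    void clear() {
        dbsBeingFailedOver.clear();
    }
}
```

With this shape there is a single `contains` check per call, which avoids the double `contains` that the reviewer flagged as unnecessary.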
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604869 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:19 Start Date: 02/Jun/21 01:19 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r643585446 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Hello @miklosgergely. The current code waits for 100ms, whereas I have changed it to wait up to 10s. I'm not sure what the value is of setting a timeout; it will always just loop again. Since I do not see any value here, but I wouldn't want to remove this `wait` altogether as part of this ticket, I simply increased it. Logging once every 100ms would be far too verbose. So you see, this change is logging-related. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604869) Time Spent: 40m (was: 0.5h) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604868 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:19 Start Date: 02/Jun/21 01:19 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r643585446 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Hello @miklosgergely. The current code waits for 100ms, whereas I have changed it to wait up to 10s. I'm not sure what the value is of setting a timeout; it will always just loop again. Since I do not see any value here, but I wouldn't want to remove this as part of this ticket, I simply increased it. Logging once every 100ms would be far too verbose. So you see, this change is logging-related. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604868) Time Spent: 0.5h (was: 20m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
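The pattern being debated in the diff above — blocking on a condition with a bounded timeout and logging once per wakeup — can be sketched independently of Hive. The class below is a minimal illustration, not the real TezSessionPool; only the lock/condition/poll structure mirrors the quoted diff, and `System.out` stands in for the SLF4J logger.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of a blocking session pool that logs while it waits.
class LoggingPool<T> {
    private final ReentrantLock poolLock = new ReentrantLock();
    private final Condition notEmpty = poolLock.newCondition();
    private final Queue<T> pool = new ArrayDeque<>();

    T getSession() throws InterruptedException {
        poolLock.lock();
        try {
            T result;
            while ((result = pool.poll()) == null) {
                // A 10 s timeout keeps the log readable; a 100 ms timeout
                // would emit ten messages per second while the pool is empty.
                System.out.println("Awaiting session to become available in pool");
                notEmpty.await(10, TimeUnit.SECONDS);
            }
            return result;
        } finally {
            poolLock.unlock();
        }
    }

    void returnSession(T session) {
        poolLock.lock();
        try {
            pool.add(session);
            notEmpty.signal();  // wake one waiter; the loop re-checks poll()
        } finally {
            poolLock.unlock();
        }
    }
}
```

Note the `while` loop around `await`: the timeout never ends the wait by itself, it only bounds how long a waiter sleeps between log lines and re-checks, which is why the reviewer argues the exact duration only matters for log volume.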
[jira] [Work logged] (HIVE-25168) Add mutable validWriteIdList
[ https://issues.apache.org/jira/browse/HIVE-25168?focusedWorklogId=604858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604858 ] ASF GitHub Bot logged work on HIVE-25168: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:06 Start Date: 02/Jun/21 01:06 Worklog Time Spent: 10m Work Description: hsnusonic closed pull request #2324: URL: https://github.com/apache/hive/pull/2324 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604858) Time Spent: 40m (was: 0.5h) > Add mutable validWriteIdList > > > Key: HIVE-25168 > URL: https://issues.apache.org/jira/browse/HIVE-25168 > Project: Hive > Issue Type: New Feature > Components: storage-api >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Although the current implementation for validWriteIdList is not strictly > immutable, it is in some sense to provide a read-only view snapshot. This > change is to add another class to provide functionalities for manipulating > the writeIdList. We could use this to keep writeIdList up-to-date in an > external cache layer for event-based metadata refreshing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
[ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604834 ] ASF GitHub Bot logged work on HIVE-25055: - Author: ASF GitHub Bot Created on: 02/Jun/21 00:13 Start Date: 02/Jun/21 00:13 Worklog Time Spent: 10m Work Description: vihangk1 commented on a change in pull request #2218: URL: https://github.com/apache/hive/pull/2218#discussion_r643488870 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ## @@ -914,6 +914,7 @@ private boolean isViewTable(String catName, String dbName, String tblName) throw long queryTime = doTrace ? System.nanoTime() : 0; MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, queryTime); if (sqlResult.isEmpty()) { + query.closeAll(); Review comment: It looks like this is fixing a unrelated bug. Can we move this out of this PR and create a different JIRA for this? ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.utils.JavaUtils; +import org.apache.thrift.TException; + +import static java.util.Objects.requireNonNull; + +public final class ExceptionHandler { Review comment: Can you add a class level comment? ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.utils.JavaUtils; +import org.apache.thrift.TException; + +import static java.util.Objects.requireNonNull; + +public final class ExceptionHandler { + private final Exception e; + + private ExceptionHandler(Exception e) { +this.e = e; + } + + public static ExceptionHandler handleException(Exception e) { +requireNonNull(e, "Exception e is null"); +return new ExceptionHandler(e); + } + + /** + * Throws if the input e is the instance of the class clz + */ + public ExceptionHandler + throwIfInstance(Class t) throws T { +if (t.isInstance(e)) { + throw t.cast(e); +} +return this; + } + + /** + * Throws if the input e is the instance of the class clzt or class clze in order + */ + public ExceptionHandler + throwIfInstance(Class t, Class e) throws T, E { +throwIfInstance(t); +throwIfInstance(e); +return this; + } + + /** + * Throws if the input e is the instance of the class clzt or clze or clzc in order + */ + public ExceptionHandler + throwIfInstance(Class t, Class e, Class c) throws T, E, C { +throwIfInstance(t); +throwIfInstance(e); +throwIfInstance(c); +return this; + } + + /** + * Converts the input e if it is the instance of class from to the
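The fluent `throwIfInstance(...)` style reviewed above can be exercised with a small self-contained sketch. This is a simplified reimplementation for illustration only — the class name is changed and the real Hive `ExceptionHandler` (quoted in truncated form above) has more overloads and conversion methods.

```java
import java.io.IOException;

// Simplified sketch of the fluent rethrow pattern from the review above.
final class ExceptionHandlerSketch {
    private final Exception e;

    private ExceptionHandlerSketch(Exception e) {
        this.e = e;
    }

    static ExceptionHandlerSketch handleException(Exception e) {
        return new ExceptionHandlerSketch(e);
    }

    // Rethrows the wrapped exception unchanged if it is an instance of clz;
    // otherwise returns this so further checks can be chained.
    <T extends Exception> ExceptionHandlerSketch throwIfInstance(Class<T> clz) throws T {
        if (clz.isInstance(e)) {
            throw clz.cast(e);
        }
        return this;
    }

    // Terminal operation: wrap anything that was not rethrown above.
    RuntimeException defaultRuntimeException() {
        return new RuntimeException(e);
    }
}
```

A caller would chain the checks it cares about and fall through to a default, e.g. `handleException(ex).throwIfInstance(IOException.class); throw handler.defaultRuntimeException();` — the chain reads top-to-bottom in the order the exception types are tested.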
[jira] [Work logged] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:
[ https://issues.apache.org/jira/browse/HIVE-23756?focusedWorklogId=604821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604821 ] ASF GitHub Bot logged work on HIVE-23756: - Author: ASF GitHub Bot Created on: 01/Jun/21 23:17 Start Date: 01/Jun/21 23:17 Worklog Time Spent: 10m Work Description: scarlin-cloudera opened a new pull request #2340: URL: https://github.com/apache/hive/pull/2340 In a previous checkin, some constraints were added to the package.jdo file, but there are more constraints that need to be added to fix the problem. ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604821) Time Spent: 40m (was: 0.5h) > drop table command fails with MySQLIntegrityConstraintViolationException: > - > > Key: HIVE-23756 > URL: https://issues.apache.org/jira/browse/HIVE-23756 > Project: Hive > Issue Type: Bug >Reporter: Ganesha Shreedhara >Assignee: Ganesha Shreedhara >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23756.1.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Drop table command fails intermittently with the following exception. 
> {code:java} > Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent > row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT > "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at > com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at > com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) > Appat > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372) > at > org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179) > at > org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901) > ... 36 more > Caused by: > com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: > Cannot delete or update a parent row: a foreign key constraint fails > ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") > REFERENCES "CDS" ("CD_ID")) > at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:377) > at com.mysql.jdbc.Util.getInstance(Util.java:360) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code} > Although HIVE-19994 resolves this issue, the FK constraint name of COLUMNS_V2 > table specified in package.jdo file is not same as the FK constraint name > used while creating COLUMNS_V2 table ([Ref|#L60]]). 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
[ https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-25140: Attachment: HIVE-25140.03.patch > Hive Distributed Tracing -- Part 1: Disabled > > > Key: HIVE-25140 > URL: https://issues.apache.org/jira/browse/HIVE-25140 > Project: Hive > Issue Type: Sub-task >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > Attachments: HIVE-25140.01.patch, HIVE-25140.02.patch, > HIVE-25140.03.patch > > > Infrastructure except exporters to Jaeger or OpenTelementry (OTL) due to > Thrift and protobuf version conflicts. A logging only exporter is used. > There are Spans for BeeLine and Hive. Server 2. The code was developed on > branch-3.1 and porting Spans to the Hive MetaStore on master is taking more > time due to major metastore code refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604802 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 01/Jun/21 22:04 Start Date: 01/Jun/21 22:04 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r643513763 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: You are changing the waiting time to 10s from 100ms. If this is intentional, then it shouldn't be put into a commit whose commit message says that it's about improving logging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604802) Time Spent: 20m (was: 10m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=604765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604765 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 01/Jun/21 20:21 Start Date: 01/Jun/21 20:21 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #2204: URL: https://github.com/apache/hive/pull/2204#issuecomment-852419901 Looks alright. Can you do me a quick favor and switch statement it? ```java switch(engineInSessionConf) { case "tez": case "mr": default: } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604765) Time Spent: 50m (was: 40m) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0, 4.0.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > HiveServer configuration parameter execution default MR. When set > hive.execution.engine = tez, the client cannot print the TEZ log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=604764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604764 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 01/Jun/21 20:21 Start Date: 01/Jun/21 20:21 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2204: URL: https://github.com/apache/hive/pull/2204#issuecomment-852419901 Looks alright. Can you do me a quick favor and case statement it? ```java switch(engineInSessionConf) { case "tez": case "mr": default: } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604764) Time Spent: 40m (was: 0.5h) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0, 4.0.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch > > Time Spent: 40m > Remaining Estimate: 0h > > HiveServer configuration parameter execution default MR. When set > hive.execution.engine = tez, the client cannot print the TEZ log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
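The reviewer's switch skeleton, filled out as it might look in practice. The engine names come from the ticket (`tez`, `mr`); the handler bodies here are purely illustrative placeholders, not the actual logic of PR #2204.

```java
// Illustrative completion of the reviewer's switch skeleton; the actual
// handling of the session-level engine setting in the PR may differ.
class EngineDispatch {
    static String describe(String engineInSessionConf) {
        switch (engineInSessionConf) {
            case "tez":
                return "attach Tez progress-log printer";
            case "mr":
                return "use MapReduce job logging";
            default:
                return "unknown engine: " + engineInSessionConf;
        }
    }
}
```

Compared with an if/else-if chain on string equality, the switch makes the finite set of supported engines explicit and forces the author to decide what the `default` branch does when the session conf holds an unexpected value.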
[jira] [Work logged] (HIVE-25168) Add mutable validWriteIdList
[ https://issues.apache.org/jira/browse/HIVE-25168?focusedWorklogId=604732=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604732 ] ASF GitHub Bot logged work on HIVE-25168: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:37 Start Date: 01/Jun/21 19:37 Worklog Time Spent: 10m Work Description: kishendas commented on a change in pull request #2324: URL: https://github.com/apache/hive/pull/2324#discussion_r643423954 ## File path: storage-api/src/java/org/apache/hadoop/hive/common/MutableValidReaderWriteIdList.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.common; + +import com.google.common.base.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.BitSet; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Set; + +/** + * This class is a mutable version of {@link ValidReaderWriteIdList} for use by an external cache layer. + * To use this class, we need to always mark the writeId as open before to mark it as aborted/committed. + * This class is not thread safe. 
+ */ +public class MutableValidReaderWriteIdList extends ValidReaderWriteIdList implements MutableValidWriteIdList { + private static final Logger LOG = LoggerFactory.getLogger(MutableValidReaderWriteIdList.class.getName()); + + public MutableValidReaderWriteIdList(ValidReaderWriteIdList writeIdList) { +super(writeIdList.writeToString()); +exceptions = new ArrayList<>(exceptions); + } + + @Override + public void addOpenWriteId(long writeId) { +if (writeId <= highWatermark) { + LOG.debug("Won't add any open write id because {} is less than or equal to high watermark: {}", Review comment: Please keep the log messages shorter, as they can occupy lot of storage space. More like ("not adding openWriteId: {} {}", writeId, highWatermark) ## File path: storage-api/src/java/org/apache/hadoop/hive/common/MutableValidReaderWriteIdList.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.common; + +import com.google.common.base.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.BitSet; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Set; + +/** + * This class is a mutable version of {@link ValidReaderWriteIdList} for use by an external cache layer. + * To use this class, we need to always mark the writeId as open before to mark it as aborted/committed. + * This class is not thread safe. Review comment: What is the implication of this class not being thread safe ? ## File path: storage-api/src/java/org/apache/hadoop/hive/common/MutableValidReaderWriteIdList.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
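The thread-safety question raised in the review has a concrete answer that a toy model makes visible. The class below is a simplified stand-in (not the Hive class; the method names and range-opening behavior here are assumptions for illustration) for a mutable write-id list whose contract, like the one in the javadoc above, requires a write id to be marked open before it can be marked committed:

```java
import java.util.BitSet;

/**
 * Toy stand-in (not Hive's MutableValidReaderWriteIdList) for a mutable
 * write-id list: ids must be marked open before they can be marked
 * committed, mirroring the contract described in the javadoc above.
 */
class ToyMutableWriteIdList {
    private long highWatermark = 0;           // highest write id ever seen
    private final BitSet open = new BitSet(); // ids opened but not yet resolved
    private final BitSet committed = new BitSet();

    /** Marks every id from the old watermark up to {@code writeId} as open. */
    void addOpenWriteId(long writeId) {
        if (writeId <= highWatermark) {
            return; // already tracked; nothing to do
        }
        for (long id = highWatermark + 1; id <= writeId; id++) {
            open.set((int) id);
        }
        highWatermark = writeId;
    }

    /** Moves an already-open id to committed; rejects ids never opened. */
    void addCommittedWriteId(long writeId) {
        if (!open.get((int) writeId)) {
            throw new IllegalStateException("write id " + writeId + " was never opened");
        }
        open.clear((int) writeId);
        committed.set((int) writeId);
    }

    boolean isCommitted(long writeId) {
        return committed.get((int) writeId);
    }
}
```

Because each mutation is a read-modify-write over shared state (watermark plus two bit sets), two concurrent callers can interleave and leave the structure inconsistent; that is the practical implication of "not thread safe" here: the external cache layer must serialize mutations itself.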
[jira] [Work logged] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?focusedWorklogId=604717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604717 ] ASF GitHub Bot logged work on HIVE-24987: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:18 Start Date: 01/Jun/21 19:18 Worklog Time Spent: 10m Work Description: vihangk1 commented on a change in pull request #2336: URL: https://github.com/apache/hive/pull/2336#discussion_r643416465 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DefaultIncompatibleTableChangeHandler.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.InvalidOperationException; +import org.apache.hadoop.hive.metastore.api.SerDeInfo; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Default incompatible table change handler. This is invoked by the {@link + * HiveAlterHandler} when a table is altered to check if the column type changes if any + * are allowed or not. + */ +public class DefaultIncompatibleTableChangeHandler implements +IMetaStoreIncompatibleChangeHandler { + + private static final Logger LOG = LoggerFactory + .getLogger(DefaultIncompatibleTableChangeHandler.class); + private static final DefaultIncompatibleTableChangeHandler INSTANCE = + new DefaultIncompatibleTableChangeHandler(); + + private DefaultIncompatibleTableChangeHandler() { + } + + public static DefaultIncompatibleTableChangeHandler get() { +return INSTANCE; + } + + /** + * Checks if the column type changes in the oldTable and newTable are allowed or not. In + * addition to checking if the incompatible changes are allowed or not, this also checks + * if the table serde library belongs to a list of table serdes which support making any + * column type changes. + * + * @param conf The configuration which if incompatible col type changes are allowed + * or not. + * @param oldTable The instance of the table being altered. + * @param newTable The new instance of the table which represents the altered state of + * the table. 
+ * @throws InvalidOperationException + */ + @Override + public void allowChange(Configuration conf, Table oldTable, Table newTable) + throws InvalidOperationException { +if (!MetastoreConf.getBoolVar(conf, +MetastoreConf.ConfVars.DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES)) { + // incompatible column changes are allowed for all + return; +} +if (oldTable.getTableType().equals(TableType.VIRTUAL_VIEW.toString())) { + // Views derive the column type from the base table definition. So the view + // definition can be altered to change the column types. The column type + // compatibility checks should be done only for non-views. + return; +} +checkColTypeChangeCompatible(conf, oldTable, newTable); + } + + private void checkColTypeChangeCompatible(Configuration conf, Table oldTable, Review comment: Yes, agreed. This was left here because of git conflicts. Thanks for spotting that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604717) Time Spent: 0.5h (was: 20m) > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 >
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604714 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:15 Start Date: 01/Jun/21 19:15 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2339: URL: https://github.com/apache/hive/pull/2339 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604714) Remaining Estimate: 0h Time Spent: 10m > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25185: -- Labels: pull-request-available (was: ) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25185: - > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narayanan Venkateswaran reassigned HIVE-25178: -- Assignee: Narayanan Venkateswaran > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Assignee: Narayanan Venkateswaran >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25178 started by Narayanan Venkateswaran. -- > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Assignee: Narayanan Venkateswaran >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?focusedWorklogId=604713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604713 ] ASF GitHub Bot logged work on HIVE-24987: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:14 Start Date: 01/Jun/21 19:14 Worklog Time Spent: 10m Work Description: yongzhi commented on a change in pull request #2336: URL: https://github.com/apache/hive/pull/2336#discussion_r643412805 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DefaultIncompatibleTableChangeHandler.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.InvalidOperationException; +import org.apache.hadoop.hive.metastore.api.SerDeInfo; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Default incompatible table change handler. This is invoked by the {@link + * HiveAlterHandler} when a table is altered to check if the column type changes if any + * are allowed or not. + */ +public class DefaultIncompatibleTableChangeHandler implements +IMetaStoreIncompatibleChangeHandler { + + private static final Logger LOG = LoggerFactory + .getLogger(DefaultIncompatibleTableChangeHandler.class); + private static final DefaultIncompatibleTableChangeHandler INSTANCE = + new DefaultIncompatibleTableChangeHandler(); + + private DefaultIncompatibleTableChangeHandler() { + } + + public static DefaultIncompatibleTableChangeHandler get() { +return INSTANCE; + } + + /** + * Checks if the column type changes in the oldTable and newTable are allowed or not. In + * addition to checking if the incompatible changes are allowed or not, this also checks + * if the table serde library belongs to a list of table serdes which support making any + * column type changes. + * + * @param conf The configuration which if incompatible col type changes are allowed + * or not. + * @param oldTable The instance of the table being altered. + * @param newTable The new instance of the table which represents the altered state of + * the table. 
+ * @throws InvalidOperationException + */ + @Override + public void allowChange(Configuration conf, Table oldTable, Table newTable) + throws InvalidOperationException { +if (!MetastoreConf.getBoolVar(conf, +MetastoreConf.ConfVars.DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES)) { + // incompatible column changes are allowed for all + return; +} +if (oldTable.getTableType().equals(TableType.VIRTUAL_VIEW.toString())) { + // Views derive the column type from the base table definition. So the view + // definition can be altered to change the column types. The column type + // compatibility checks should be done only for non-views. + return; +} +checkColTypeChangeCompatible(conf, oldTable, newTable); + } + + private void checkColTypeChangeCompatible(Configuration conf, Table oldTable, Review comment: Should we remove the checkColTypeChangeCompatible in HiveAlterHandler.java ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604713) Time Spent: 20m (was: 10m) > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 >
[jira] [Work logged] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?focusedWorklogId=604709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604709 ] ASF GitHub Bot logged work on HIVE-25178: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:03 Start Date: 01/Jun/21 19:03 Worklog Time Spent: 10m Work Description: vnhive opened a new pull request #2338: URL: https://github.com/apache/hive/pull/2338 …rtitions ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604709) Remaining Estimate: 0h Time Spent: 10m > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25178: -- Labels: performance pull-request-available (was: performance) > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
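The optimization the issue describes — asking HMS only about the partitions actually being loaded, instead of listing every partition of the table — reduces to a batched existence lookup. The sketch below uses a hypothetical `MetastoreClient` interface as a stand-in for the real HMS client API, so the names are assumptions, not Hive's:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, minimal stand-in for an HMS client (not Hive's IMetaStoreClient). */
interface MetastoreClient {
    /** Returns only the partitions of {@code table} whose names are in {@code names}. */
    List<String> getPartitionsByNames(String table, List<String> names);
}

class DynamicPartitionLoader {
    /**
     * Instead of fetching all partitions of the table (cost proportional to the
     * total partition count), query only the partition names produced by this
     * load (cost proportional to the partitions being written).
     */
    static Set<String> findExisting(MetastoreClient hms, String table, List<String> toLoad) {
        return new HashSet<>(hms.getPartitionsByNames(table, toLoad));
    }
}
```

The caller can then create only the partitions absent from the returned set, which is what keeps the load on HMS proportional to the dynamic-partition batch rather than the table size.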
[jira] [Updated] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-25184: -- Description: Was recently troubleshooting an issue and noticed an NPE in the logs. I tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets called whether the Driver call succeeds or fails. However, if there is a failure, the Driver is instructed to "clean up" by some internal try-catch, and so the afterExecute code fails with an NPE when it tries to read state out of the Driver class. [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] Move this afterExecute code into the try-catch block so it is only executed on success (and there is valid state within the Driver). I looked at the code a bit and it seems like the only listener that handles this afterExecute code assumes the state is always valid, so there is currently no way to pass it 'null' on a failure or 'state' on a success. > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > Was recently troubleshooting an issue and noticed an NPE in the logs. I > tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets > called whether the Driver call succeeds or fails. However, if there is a failure, > the Driver is instructed to "clean up" by some internal try-catch, and so > the afterExecute code fails with an NPE when it tries to read state out > of the Driver class.
> > [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] > > Move this afterExecute code into the try-catch block so it is only executed on > success (and there is valid state within the Driver). I looked at the code a > bit and it seems like the only listener that handles this afterExecute code > assumes the state is always valid, so there is currently no way to pass it > 'null' on a failure or 'state' on a success. -- This message was sent by Atlassian Jira (v8.3.4#803005)
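The fix proposed above — run the listener only when execution succeeded, so it never sees cleaned-up state — follows a general pattern that can be sketched independently of Hive. The interface and class names below are illustrative, not the actual ReExecDriver types:

```java
import java.util.concurrent.Callable;

/** Illustrative listener; not Hive's actual interface. */
interface ExecListener {
    void afterExecute(String planState); // assumes non-null state
}

class Runner {
    /**
     * Before the fix: afterExecute ran even after a failure whose internal
     * try-catch had already cleaned up driver state, so listeners hit an NPE.
     * After: it runs inside the try, only when valid state exists.
     */
    static boolean run(ExecListener listener, Callable<String> body) {
        try {
            String state = body.call();   // may throw; state is valid only if it returns
            listener.afterExecute(state); // reached only on success
            return true;
        } catch (Exception e) {
            // failure path: internal cleanup already happened; skip the listener
            return false;
        }
    }
}
```

This matches the observation in the description that the existing listener assumes valid state: moving the call inside the try preserves that assumption instead of forcing a null-handling contract onto listeners.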
[jira] [Updated] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25184: -- Labels: pull-request-available (was: ) > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Was recently troubleshooting an issue and noticed a NPE in the logs. I > tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets > called if the Driver call succeed or fails. However, if there is a failure, > the Driver is instructed to "clean up" by some internal try-catch and so > there the afterExecute code fails with a NPE when it tried to read state out > of the Driver class. > > [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] > > Move this afterExecute code into the try-catch block so it's only executed on > success (and there is valid state within the Driver). I looked at the code a > bit and it seems like the only listener that handles this afterExecute code > assumes the state is always valid, so there is currently no way to pass it > 'null' on a failure or 'state' on a success. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?focusedWorklogId=604708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604708 ] ASF GitHub Bot logged work on HIVE-25184: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:00 Start Date: 01/Jun/21 19:00 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2337: URL: https://github.com/apache/hive/pull/2337 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604708) Remaining Estimate: 0h Time Spent: 10m > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Was recently troubleshooting an issue and noticed a NPE in the logs. I > tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets > called if the Driver call succeed or fails. However, if there is a failure, > the Driver is instructed to "clean up" by some internal try-catch and so > there the afterExecute code fails with a NPE when it tried to read state out > of the Driver class. > > [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] > > Move this afterExecute code into the try-catch block so it's only executed on > success (and there is valid state within the Driver). 
I looked at the code a > bit and it seems like the only listener that handles this afterExecute code > assumes the state is always valid, so there is currently no way to pass it > 'null' on a failure or 'state' on a success. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25184: - > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604704 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 01/Jun/21 18:49 Start Date: 01/Jun/21 18:49 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r643398280 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -94,6 +98,8 @@ private BlockingQueue workQueue; private Thread[] workers; + private Set dbsBeingFailedOver; Review comment: Why do we need it at instance level? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604704) Time Spent: 1h 50m (was: 1h 40m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355286#comment-17355286 ] Vihang Karajgaonkar commented on HIVE-24987: Published a PR with the proposed change. > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 > URL: https://issues.apache.org/jira/browse/HIVE-24987 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is > set to true it disallows any schema changes which are deemed as backwards > incompatible e.g dropping a column of a table. While this may be a correct > thing to do for Parquet or Orc tables, it is too restrictive for storage > formats like Kudu. > Currently, for Kudu tables, Impala supports dropping a column. But if we set > this config to true metastore disallows changing the schema of the metastore > table. I am assuming this would be problematic for Iceberg tables too which > supports such schema changes. > The proposal is to have a new configuration which provided a exclusion list > of the table fileformat where this check will be skipped. Currently, we will > only include Kudu tables to skip this check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24987: -- Labels: pull-request-available (was: ) > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 > URL: https://issues.apache.org/jira/browse/HIVE-24987 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is > set to true it disallows any schema changes which are deemed as backwards > incompatible e.g dropping a column of a table. While this may be a correct > thing to do for Parquet or Orc tables, it is too restrictive for storage > formats like Kudu. > Currently, for Kudu tables, Impala supports dropping a column. But if we set > this config to true metastore disallows changing the schema of the metastore > table. I am assuming this would be problematic for Iceberg tables too which > supports such schema changes. > The proposal is to have a new configuration which provided a exclusion list > of the table fileformat where this check will be skipped. Currently, we will > only include Kudu tables to skip this check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?focusedWorklogId=604694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604694 ] ASF GitHub Bot logged work on HIVE-24987: - Author: ASF GitHub Bot Created on: 01/Jun/21 18:36 Start Date: 01/Jun/21 18:36 Worklog Time Spent: 10m Work Description: vihangk1 opened a new pull request #2336: URL: https://github.com/apache/hive/pull/2336 ### What changes were proposed in this pull request? The hive.metastore.disallow.incompatible.col.type.changes config currently checks whether an alter table operation is making an incompatible schema change to the table. By default it is set to true, which errors out the alter table call when such a change is detected. However, this check is too restrictive for certain file formats like Kudu. In the case of Kudu, dropping a column is allowed, which can make the schema incompatible according to the current implementation of this check. This causes a bad user experience for Kudu users, and there is no real work-around other than disabling this check altogether. Disabling the check is not an option, since for file formats like Parquet it should be true to avoid data corruption/incorrect results. This change introduces a new config which users can use to provide an exception list based on the table serde library name. If a table belongs to such a serde, the check is skipped. Currently, only Kudu tables are added to this config by default. ### Why are the changes needed? See above. ### Does this PR introduce _any_ user-facing change? This introduces a new configuration option for metastore. ### How was this patch tested? A new unit test was added to exercise the specific use-case. Existing tests make sure that previous behavior is not changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604694) Remaining Estimate: 0h Time Spent: 10m > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 > URL: https://issues.apache.org/jira/browse/HIVE-24987 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is > set to true it disallows any schema changes which are deemed as backwards > incompatible e.g dropping a column of a table. While this may be a correct > thing to do for Parquet or Orc tables, it is too restrictive for storage > formats like Kudu. > Currently, for Kudu tables, Impala supports dropping a column. But if we set > this config to true metastore disallows changing the schema of the metastore > table. I am assuming this would be problematic for Iceberg tables too which > supports such schema changes. > The proposal is to have a new configuration which provided a exclusion list > of the table fileformat where this check will be skipped. Currently, we will > only include Kudu tables to skip this check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
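The mechanism the PR describes — skip the incompatibility check when the table's serde library is on an exception list — reduces to a membership test before the strict column-type comparison. The sketch below is a simplification: the real default list and config key live in MetastoreConf, and the serde class name shown is only assumed here for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IncompatibleChangeCheck {
    // Illustrative default: skip the check for Kudu tables only. The real
    // exception list is read from metastore configuration, not hard-coded.
    static final Set<String> EXEMPT_SERDES = new HashSet<>(
        Arrays.asList("org.apache.hadoop.hive.kudu.KuduSerDe"));

    /** True if the strict column-type check should run for this serde library. */
    static boolean shouldCheck(boolean disallowIncompatible, String serdeLib) {
        if (!disallowIncompatible) {
            return false;                            // check disabled globally
        }
        return !EXEMPT_SERDES.contains(serdeLib);    // exempt formats skip it
    }
}
```

Keyed on the serde library rather than a table property, the exemption applies uniformly to every table of that storage format without users having to opt in per table.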
[jira] [Updated] (HIVE-25183) Parsing error for Correlated Inner Joins
[ https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das updated HIVE-25183: --- Description: The issue is similar to HIVE-25090 (was: The issue is similar to [link HIVE-25090|https://issues.apache.org/jira/browse/HIVE-25090]) > Parsing error for Correlated Inner Joins > > > Key: HIVE-25183 > URL: https://issues.apache.org/jira/browse/HIVE-25183 > Project: Hive > Issue Type: Sub-task > Components: Parser >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > > The issue is similar to HIVE-25090 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25183) Parsing error for Correlated Inner Joins
[ https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das reassigned HIVE-25183: -- > Parsing error for Correlated Inner Joins > > > Key: HIVE-25183 > URL: https://issues.apache.org/jira/browse/HIVE-25183 > Project: Hive > Issue Type: Sub-task > Components: Parser >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > > The issue is similar to [link > HIVE-25090|https://issues.apache.org/jira/browse/HIVE-25090] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?focusedWorklogId=604635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604635 ] ASF GitHub Bot logged work on HIVE-25150: - Author: ASF GitHub Bot Created on: 01/Jun/21 17:15 Start Date: 01/Jun/21 17:15 Worklog Time Spent: 10m Work Description: tarak271 commented on a change in pull request #2308: URL: https://github.com/apache/hive/pull/2308#discussion_r643326657 ## File path: storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java ## @@ -273,7 +269,8 @@ public static boolean fastSetFromBytes(byte[] bytes, int offset, int length, boo int index = offset; if (trimBlanks) { - while (bytes[index] == BYTE_BLANK) { + //Character.isWhitespace handles both space and tab character Review comment: @maheshk114 Added a new function to validate more characters supported by Mysql, postgres like HORIZONTAL_TABULATION, VERTICAL_TABULATION, FORM_FEED & SPACE_SEPARATOR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604635) Time Spent: 50m (was: 40m) > Tab characters are not removed before decimal conversion similar to space > character which is fixed as part of HIVE-24378 > > > Key: HIVE-25150 > URL: https://issues.apache.org/jira/browse/HIVE-25150 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Taraka Rama Rao Lethavadla >Assignee: Taraka Rama Rao Lethavadla >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Test case: > column values with space and tab character > {noformat} > bash-4.2$ cat data/files/test_dec_space.csv > 1,0 > 2, 1 > 3,2{noformat} > {noformat} > create external table test_dec_space (id int, value decimal) ROW FORMAT > DELIMITED > FIELDS TERMINATED BY ',' location '/tmp/test_dec_space'; > {noformat} > output of select * from test_dec_space would be > {noformat} > 1 0 > 2 1 > 3 NULL{noformat} > The behaviour in MySQL when there are tab & space characters in decimal values > {noformat} > bash-4.2$ cat /tmp/insert.csv > "1","aa",11.88 > "2","bb", 99.88 > "4","dd", 209.88{noformat} > > {noformat} > MariaDB [test]> load data local infile '/tmp/insert.csv' into table t2 fields > terminated by ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; > Query OK, 3 rows affected, 3 warnings (0.00 sec) > Records: 3 Deleted: 0 Skipped: 0 Warnings: 3 > MariaDB [test]> select * from t2; > +--+--+---+ > | id | name | score | > +--+--+---+ > | 1| aa |12 | > | 2| bb | 100 | > | 4| dd | 210 | > +--+--+---+ > 3 rows in set (0.00 sec) > {noformat} > So in Hive, too, we can make this work by skipping the tab character -- This message was sent by Atlassian Jira (v8.3.4#803005)
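The idea behind the fix can be sketched in isolation: trim the blank characters the databases above tolerate (space, horizontal tab, vertical tab, form feed) before converting to decimal. The class and method names here are illustrative, not FastHiveDecimalImpl's actual code:

```java
// Sketch: trim leading/trailing blanks before decimal parsing, so a value
// like "\t2" converts to 2 instead of failing (NULL in Hive's case).
import java.math.BigDecimal;

public class DecimalTrim {
  /** Blank characters tolerated around numeric literals by MySQL/Postgres. */
  static boolean isTrimmableBlank(char c) {
    return c == ' ' || c == '\t' || c == '\u000B' /* vertical tab */ || c == '\f';
  }

  static BigDecimal parseDecimal(String raw) {
    int start = 0;
    int end = raw.length();
    while (start < end && isTrimmableBlank(raw.charAt(start))) {
      start++;
    }
    while (end > start && isTrimmableBlank(raw.charAt(end - 1))) {
      end--;
    }
    return new BigDecimal(raw.substring(start, end));
  }

  public static void main(String[] args) {
    System.out.println(parseDecimal("\t2"));    // 2
    System.out.println(parseDecimal(" 99.88")); // 99.88
  }
}
```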
[jira] [Commented] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
[ https://issues.apache.org/jira/browse/HIVE-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355217#comment-17355217 ] Jesus Camacho Rodriguez commented on HIVE-25129: It's been a while, but I assume if we do timezone shifting, e.g., we use the old write path, this may still occur. On the other hand, I think this would be fixed once we write timestamp as it is represented internally, i.e., in UTC. > Wrong results when timestamps stored in Avro/Parquet fall into the DST shift > > > Key: HIVE-25129 > URL: https://issues.apache.org/jira/browse/HIVE-25129 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: parquet_timestamp_dst.q > > > Timestamp values falling into the daylight savings time of the system > timezone cannot be retrieved as is when those are stored in Parquet/Avro > tables. The respective SELECT query shifts those timestamps by +1 reflecting > the DST shift. > +Example+ > {code:sql} > --! qt:timezone:US/Pacific > create table employee (eid int, birthdate timestamp) stored as parquet; > insert into employee values (0, '2019-03-10 02:00:00'); > insert into employee values (1, '2020-03-08 02:00:00'); > insert into employee values (2, '2021-03-14 02:00:00'); > select eid, birthdate from employee order by eid;{code} > +Actual results+ > |0|2019-03-10 03:00:00| > |1|2020-03-08 03:00:00| > |2|2021-03-14 03:00:00| > +Expected results+ > |0|2019-03-10 02:00:00| > |1|2020-03-08 02:00:00| > |2|2021-03-14 02:00:00| > Storing and retrieving values in columns using the [timestamp data > type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types] > (equivalent with LocalDateTime java API) should not alter at any way the > value that the user is seeing. The results are correct for {{TEXTFILE}} and > {{ORC}} tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
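The shift can be reproduced outside Hive with plain java.time: resolving a gap-time LocalDateTime in the session zone moves it forward by the DST transition, which is what a timezone-shifting write path effectively does to the stored value:

```java
// Demonstrates the DST-gap behaviour behind the bug: 2021-03-14 02:00 does
// not exist as a local time in US/Pacific, so resolving it in that zone
// shifts it forward by one hour.
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
  public static void main(String[] args) {
    LocalDateTime stored = LocalDateTime.of(2021, 3, 14, 2, 0); // falls in the DST gap
    ZonedDateTime resolved = stored.atZone(ZoneId.of("US/Pacific"));
    // java.time resolves gap times by adding the transition duration:
    System.out.println(resolved.toLocalDateTime()); // 2021-03-14T03:00
  }
}
```

This matches the actual/expected result tables above: the value read back is 03:00:00 instead of the 02:00:00 the user inserted, while formats that round-trip the local value verbatim (TEXTFILE, ORC) are unaffected.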
[jira] [Work logged] (HIVE-24875) Unify InetAddress.getLocalHost()
[ https://issues.apache.org/jira/browse/HIVE-24875?focusedWorklogId=604570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604570 ] ASF GitHub Bot logged work on HIVE-24875: - Author: ASF GitHub Bot Created on: 01/Jun/21 15:48 Start Date: 01/Jun/21 15:48 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2314: URL: https://github.com/apache/hive/pull/2314#discussion_r643230520 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/InetUtils.java ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.metastore.utils; + +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.util.Objects; +import java.util.Optional; + +/** + * Utility functions around the Java InetAddress class. 
+ */ +public class InetUtils { + + /** + * @return name of current host + */ + public static String hostname() { +return hostname(Optional.empty()); + } + + /** + * @return name of current host + */ + public static String hostname(Optional<String> defaultValue) { Review comment: @kgyrtkirk Thanks so much for looking at this and thanks for pointing out some silly mistakes on my part. I must not have had my head on straight that day. I think we make a functional change as you have proposed in a stand-alone ticket. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604570) Time Spent: 40m (was: 0.5h) > Unify InetAddress.getLocalHost() > > > Key: HIVE-24875 > URL: https://issues.apache.org/jira/browse/HIVE-24875 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Priority: Minor > Labels: newbie, noob, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Lots of calls in the Hive code to {{InetAddress.getLocalHost()}}. This > should be standardized onto hive-common {{ServerUtils.hostname()}}, which > includes removing (deprecating) a similar method in {{HiveStringUtils}}. > Open to anyone to improve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
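In the spirit of the snippet under review, a unified hostname helper with an optional fallback might look like the sketch below; the class name and exact signature are assumptions, not the final patch:

```java
// Sketch of a single hostname helper to replace scattered
// InetAddress.getLocalHost() calls; names are illustrative.
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

public class HostnameUtil {
  /** Returns the local host name, or the fallback when resolution fails. */
  public static String hostname(Optional<String> fallback) {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      return fallback.orElseThrow(
          () -> new IllegalStateException("Unable to resolve local host", e));
    }
  }

  public static void main(String[] args) {
    System.out.println(hostname(Optional.of("localhost")));
  }
}
```

Centralizing the call also gives one place to decide error handling, which is the functional question raised in the review.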
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604561 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 15:23 Start Date: 01/Jun/21 15:23 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643208790 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap } } - private static PartitionSpec spec(Schema schema, Properties properties, + private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties, org.apache.hadoop.hive.metastore.api.Table hmsTable) { -if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) { +if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname)) Review comment: I'll create a new `IcebergSessionUtil` for providing some util methods, we can sync up with @lcspinter on the implementation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604561) Time Spent: 2h 50m (was: 2h 40m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... 
PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25173) Fix build failure of hive-pre-upgrade due to missing dependency on pentaho-aggdesigner-algorithm
[ https://issues.apache.org/jira/browse/HIVE-25173?focusedWorklogId=604555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604555 ] ASF GitHub Bot logged work on HIVE-25173: - Author: ASF GitHub Bot Created on: 01/Jun/21 15:12 Start Date: 01/Jun/21 15:12 Worklog Time Spent: 10m Work Description: iwasakims commented on pull request #2326: URL: https://github.com/apache/hive/pull/2326#issuecomment-852206365 @kgyrtkirk I think precommit test failures are not related to the patch. Could you review this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604555) Time Spent: 0.5h (was: 20m) > Fix build failure of hive-pre-upgrade due to missing dependency on > pentaho-aggdesigner-algorithm > > > Key: HIVE-25173 > URL: https://issues.apache.org/jira/browse/HIVE-25173 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.1.2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {noformat} > [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not resolve > dependencies for project org.apache.hive:hive-pre-upgrade:jar:4.0.0-SNAPSHOT: > Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in > https://repo.maven.apache.org/maven2 was cached in the local repository, > resolution will not be reattempted until the update interval of central has > elapsed or updates are forced > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25161) Implement CTAS for partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod resolved HIVE-25161. --- Resolution: Fixed > Implement CTAS for partitioned Iceberg tables > - > > Key: HIVE-25161 > URL: https://issues.apache.org/jira/browse/HIVE-25161 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=604524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604524 ] ASF GitHub Bot logged work on HIVE-25161: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:15 Start Date: 01/Jun/21 14:15 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2316: URL: https://github.com/apache/hive/pull/2316 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604524) Time Spent: 7.5h (was: 7h 20m) > Implement CTAS for partitioned Iceberg tables > - > > Key: HIVE-25161 > URL: https://issues.apache.org/jira/browse/HIVE-25161 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604522 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:12 Start Date: 01/Jun/21 14:12 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643139820 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable( } } +if (partitionTransformSpecExists) { + try { +HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Can we make this check before, at line 13409? That way we could skip parsing the partition spec unnecessarily first and wouldn't need the boolean -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604522) Time Spent: 2h 40m (was: 2.5h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604520 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:11 Start Date: 01/Jun/21 14:11 Worklog Time Spent: 10m Work Description: marton-bod commented on pull request #2333: URL: https://github.com/apache/hive/pull/2333#issuecomment-852158270 Looks great @lcspinter! Just a few questions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604520) Time Spent: 2.5h (was: 2h 20m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604519 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:10 Start Date: 01/Jun/21 14:10 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643139820 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable( } } +if (partitionTransformSpecExists) { + try { +HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Can we make this check before, at line 13409? That way we could skip parsing the partition spec unnecessarily first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604519) Time Spent: 2h 20m (was: 2h 10m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604517 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:07 Start Date: 01/Jun/21 14:07 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643137193 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map<Integer, TransformTypes> TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) { +List<PartitionTransformSpec> partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: Review comment: nit: indentation missing after switch clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604517) Time Spent: 2h 10m (was: 2h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604514 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:06 Start Date: 01/Jun/21 14:06 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643135880 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map<Integer, TransformTypes> TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) { +List<PartitionTransformSpec> partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: +case HiveParser.TOK_YEAR: +case HiveParser.TOK_MONTH: +case HiveParser.TOK_DAY: +case HiveParser.TOK_HOUR: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + break; +case HiveParser.TOK_TRUNCATE: +case HiveParser.TOK_BUCKET: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText()); + break; +default: + spec.name = grandChild.getText(); +} + } + partSpecList.add(spec); +} + +return partSpecList; + } + + public enum TransformTypes { +IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET + } + + public static class PartitionTransformSpec { 
+public String name; +public TransformTypes transformType; +public int transformParam; Review comment: Should we make this Integer or Optional<Integer>? This would have a 0 value by default for those spec types too that don't support any params, which can be misleading. Although granted, 0 doesn't make much sense for either bucketing or truncate, but maybe semantically it would be a bit better to make this nullable/empty. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604514) Time Spent: 1h 50m (was: 1h 40m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available >
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604516 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:06 Start Date: 01/Jun/21 14:06 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643135880 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map<Integer, TransformTypes> TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) { +List<PartitionTransformSpec> partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: +case HiveParser.TOK_YEAR: +case HiveParser.TOK_MONTH: +case HiveParser.TOK_DAY: +case HiveParser.TOK_HOUR: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + break; +case HiveParser.TOK_TRUNCATE: +case HiveParser.TOK_BUCKET: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText()); + break; +default: + spec.name = grandChild.getText(); +} + } + partSpecList.add(spec); +} + +return partSpecList; + } + + public enum TransformTypes { +IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET + } + + public static class PartitionTransformSpec { 
+    public String name;
+    public TransformTypes transformType;
+    public int transformParam;

Review comment: Should we make this Integer or Optional<Integer>? This would have a 0 value by default for those spec types too that don't support any params, which can be misleading. Although granted, 0 doesn't make much sense for either bucketing or truncate, but maybe semantically it would be a bit better to make this nullable/empty. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604516) Time Spent: 2h (was: 1h 50m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels:
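The nullable-parameter variant the reviewer suggests could look something like the sketch below. This is illustrative only, not the actual patch: `Spec` and `TransformType` are simplified stand-ins for `PartitionTransformSpec` and its enum, with the `int transformParam` replaced by an `Optional<Integer>` so that parameterless transforms (IDENTITY, YEAR, ...) report "empty" instead of a misleading default of 0.

```java
import java.util.Optional;

public class PartitionTransformSpecSketch {
    enum TransformType { IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET }

    // Simplified stand-in for PartitionTransformSpec with a nullable param.
    static class Spec {
        String name;
        TransformType transformType;
        // Optional instead of int: absent means "this transform takes no parameter".
        Optional<Integer> transformParam = Optional.empty();
    }

    public static void main(String[] args) {
        Spec identity = new Spec();
        identity.transformType = TransformType.IDENTITY;
        // No param was set, so callers see Optional.empty() rather than 0.
        System.out.println(identity.transformParam.isPresent()); // false

        Spec bucket = new Spec();
        bucket.transformType = TransformType.BUCKET;
        bucket.transformParam = Optional.of(5); // e.g. bucket[5]
        System.out.println(bucket.transformParam.orElseThrow()); // 5
    }
}
```

The upside over a plain `Integer` field is that consumers are forced to handle the no-parameter case explicitly instead of risking a `NullPointerException` on unboxing.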
[jira] [Work logged] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?focusedWorklogId=604499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604499 ] ASF GitHub Bot logged work on HIVE-24920: - Author: ASF GitHub Bot Created on: 01/Jun/21 13:50 Start Date: 01/Jun/21 13:50 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #2191: URL: https://github.com/apache/hive/pull/2191 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604499) Time Spent: 1h 20m (was: 1h 10m) > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24920. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Naveen for reviewing the changes! > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604452 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 11:44 Start Date: 01/Jun/21 11:44 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r643019982

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException {
   }
 }

+ private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap,
+     Connection dbConn) throws SQLException {
+   PreparedStatement preparedStatement = null;
+   int numRows = 0;
+   int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+   // Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that.
+   // TODO : Need to add catalog name to the index
+   String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND "
+       + "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? "
+       + "AND \"PART_ID\" = ?";
+
+   try {
+     preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null);

Review comment: removed the use

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException {
   }
 }

+ private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap,
+     Connection dbConn) throws SQLException {
+   PreparedStatement preparedStatement = null;
+   int numRows = 0;
+   int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+   // Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that.
+   // TODO : Need to add catalog name to the index
+   String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND "
+       + "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? "
+       + "AND \"PART_ID\" = ?";
+
+   try {
+     preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null);
+     for (Map.Entry entry : statsPartInfoMap.entrySet()) {
+       ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+       PartitionInfo partitionInfo = (PartitionInfo) entry.getKey();
+       for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+         preparedStatement.setString(1, colStats.getStatsDesc().getDbName());
+         preparedStatement.setString(2, colStats.getStatsDesc().getTableName());
+         preparedStatement.setString(3, statisticsObj.getColName());
+         preparedStatement.setString(4, colStats.getStatsDesc().getPartName());
+         preparedStatement.setLong(5, partitionInfo.partitionId);
+         numRows++;
+         preparedStatement.addBatch();
+         if (numRows == maxNumRows) {
+           preparedStatement.executeBatch();
+           numRows = 0;
+           LOG.debug("Executed delete " + delete + " for numRows " + numRows);
+         }
+       }
+     }
+
+     if (numRows != 0) {
+       preparedStatement.executeBatch();
+       LOG.debug("Executed delete " + delete + " for numRows " + numRows);
+     }
+   } finally {
+     closeStmt(preparedStatement);
+   }
+ }
+
+ private void insertIntoPartColStatTable(Map partitionInfoMap,
+     long maxCsId,
+     Connection dbConn) throws SQLException, MetaException, NoSuchObjectException {
+   PreparedStatement preparedStatement = null;
+   int numRows = 0;
+   int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+   String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", \"CAT_NAME\", \"DB_NAME\","
+       + "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", \"COLUMN_TYPE\", \"PART_ID\","
+       + " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", \"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\","
+       + " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", \"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ,"
+       + " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", \"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values "
+       + "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
+
+   try {
+     preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, insert, null);
+     for (Map.Entry entry :
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604437 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 11:15 Start Date: 01/Jun/21 11:15 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643008410

## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
## @@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
   }
 }

- private static PartitionSpec spec(Schema schema, Properties properties,
+ private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
     org.apache.hadoop.hive.metastore.api.Table hmsTable) {
-   if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+   if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment: +1 I'm working with the QueryState too on a different jira, and we would definitely benefit from some util methods to simplify these operations.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604437) Time Spent: 1h 40m (was: 1.5h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604432 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:55 Start Date: 01/Jun/21 10:55 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642996447 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -3010,3 +3010,5 @@ const string TABLE_BUCKETING_VERSION = "bucketing_version", const string DRUID_CONFIG_PREFIX = "druid.", const string JDBC_CONFIG_PREFIX = "hive.sql.", const string TABLE_IS_CTAS = "created_with_ctas", +const string PARTITION_TRANSFER_SPEC = "partition_transfer_spec", Review comment: how come this is partition_transfer_spec, unlike elsewhere? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604432) Time Spent: 1.5h (was: 1h 20m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604424 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:07 Start Date: 01/Jun/21 10:07 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642966076 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") +.hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2) +.identity("identity_field").build(); +String tableName = "part_test"; + +TableIdentifier identifier = TableIdentifier.of("default", tableName); +shell.executeStatement("CREATE EXTERNAL TABLE " + identifier + +" PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " + +"truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" + +" STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " + +testTables.locationForCreateTableSQL(identifier) + +"TBLPROPERTIES ('" + InputFormatConfig.TABLE_SCHEMA + "'='" + +SchemaParser.toJson(schema) + "', " + 
+"'" + InputFormatConfig.CATALOG_NAME + "'='" + Catalogs.ICEBERG_DEFAULT_CATALOG_NAME + "')"); Review comment: Do we need this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604424) Time Spent: 1h 20m (was: 1h 10m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604422 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:06 Start Date: 01/Jun/21 10:06 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642965021 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") +.hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2) +.identity("identity_field").build(); +String tableName = "part_test"; Review comment: Do we need this, or it is enough to keep the `identifier` only? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604422) Time Spent: 1h 10m (was: 1h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604420 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:05 Start Date: 01/Jun/21 10:05 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642964505 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") Review comment: nit: newline before, and I think it is more readable if we break after every parameter, like: ``` PartitionSpec.builderFor(schema) .year("year_field") .month("month_field") .day("day_field") .hour("hour_field") .truncate("truncate_field", 2) .bucket("bucket_field", 2) .identity("identity_field") .build(); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604420) Time Spent: 1h (was: 50m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604419 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:04 Start Date: 01/Jun/21 10:04 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642963610 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") +.hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2) +.identity("identity_field").build(); +String tableName = "part_test"; + +TableIdentifier identifier = TableIdentifier.of("default", tableName); +shell.executeStatement("CREATE EXTERNAL TABLE " + identifier + +" PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " + +"truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" + +" STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " + Review comment: We can use `STORED BY ICEBERG` -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604419) Time Spent: 50m (was: 40m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604417 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 09:58 Start Date: 01/Jun/21 09:58 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642960021 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap } } - private static PartitionSpec spec(Schema schema, Properties properties, + private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties, org.apache.hadoop.hive.metastore.api.Table hmsTable) { -if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) { +if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname)) Review comment: Maybe a util class to get Icebeerg objects from the `SessionState`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604417) Time Spent: 40m (was: 0.5h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... 
PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604414 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 09:54 Start Date: 01/Jun/21 09:54 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642953396 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List getPartitionTransformSpec(ASTNode node) { +List partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: +case HiveParser.TOK_YEAR: +case HiveParser.TOK_MONTH: +case HiveParser.TOK_DAY: +case HiveParser.TOK_HOUR: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + break; +case HiveParser.TOK_TRUNCATE: +case HiveParser.TOK_BUCKET: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText()); + break; +default: + spec.name = grandChild.getText(); +} + } + partSpecList.add(spec); +} + +return partSpecList; + } + + public enum TransformTypes { Review comment: I think it would be better to use the singular format: TransformType -- This is an 
automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604414) Time Spent: 0.5h (was: 20m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604361 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 01/Jun/21 08:03 Start Date: 01/Jun/21 08:03 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r642873937

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -214,23 +219,28 @@ private void stopWorkers() {
     throws MetaException, NoSuchTxnException, NoSuchObjectException {
   if (isAnalyzeTableInProgress(fullTableName)) return null;
   String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+  String dbName = MetaStoreUtils.prependCatalogToDbName(cat, db, conf);
+  if (!isDbTargetOfReplication.containsKey(dbName) || !isDbBeingFailedOver.containsKey(dbName)) {
+    Database database = rs.getDatabase(cat, db);
+    isDbTargetOfReplication.put(dbName, ReplUtils.isTargetOfReplication(database));
+    isDbBeingFailedOver.put(dbName, MetaStoreUtils.isDbBeingFailedOver(database));

Review comment: Why do we need two separate maps? We don't need the reason for the skip, just tracking what to skip is fine, no?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604361) Time Spent: 1h 40m (was: 1.5h) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over.
> -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
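The per-database caching discussed in the review thread above could be sketched roughly as follows. All names here are hypothetical illustrations, not the actual StatsUpdaterThread code: a single map from database name to a skip decision replaces the two reason-specific maps, and it is cleared between iterations so stale failover state is not served indefinitely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: cache only the skip decision per database instead of
// keeping separate maps for "target of replication" and "being failed over".
class DbSkipCache {
    private final Map<String, Boolean> skipByDb = new ConcurrentHashMap<>();

    // `lookup` stands in for the metastore round trip (getDatabase plus the
    // replication/failover checks); it runs only on a cache miss.
    boolean shouldSkip(String dbName, Predicate<String> lookup) {
        return skipByDb.computeIfAbsent(dbName, lookup::test);
    }

    // Clearing between worker iterations keeps the cache from serving a
    // stale answer after a failover completes.
    void reset() {
        skipByDb.clear();
    }
}
```

Collapsing the two maps into one decision also answers the reviewer's point: the caller only needs to know whether to skip, not why.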
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604359 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:55 Start Date: 01/Jun/21 07:55 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r642867971 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException { } } + private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap, + Connection dbConn) throws SQLException { +PreparedStatement preparedStatement = null; +int numRows = 0; +int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE); + +// Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that. +// TODO : Need to add catalog name to the index +String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND " ++ "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? 
" ++ "AND \"PART_ID\" = ?"; + +try { + preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null); + for (Map.Entry entry : statsPartInfoMap.entrySet()) { +ColumnStatistics colStats = (ColumnStatistics) entry.getValue(); +PartitionInfo partitionInfo = (PartitionInfo) entry.getKey(); +for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) { + preparedStatement.setString(1, colStats.getStatsDesc().getDbName()); + preparedStatement.setString(2, colStats.getStatsDesc().getTableName()); + preparedStatement.setString(3, statisticsObj.getColName()); + preparedStatement.setString(4, colStats.getStatsDesc().getPartName()); + preparedStatement.setLong(5, partitionInfo.partitionId); + numRows++; + preparedStatement.addBatch(); + if (numRows == maxNumRows) { +preparedStatement.executeBatch(); +numRows = 0; +LOG.debug("Executed delete " + delete + " for numRows " + numRows); + } +} + } + + if (numRows != 0) { +preparedStatement.executeBatch(); +LOG.debug("Executed delete " + delete + " for numRows " + numRows); + } +} finally { + closeStmt(preparedStatement); +} + } + + private void insertIntoPartColStatTable(Map partitionInfoMap, + long maxCsId, + Connection dbConn) throws SQLException, MetaException, NoSuchObjectException { +PreparedStatement preparedStatement = null; +int numRows = 0; +int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE); +String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", \"CAT_NAME\", \"DB_NAME\"," ++ "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", \"COLUMN_TYPE\", \"PART_ID\"," ++ " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", \"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\"," ++ " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", \"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ," ++ " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", \"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values " ++ "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"; + +try 
{ + preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, insert, null); + for (Map.Entry entry : partitionInfoMap.entrySet()) { +ColumnStatistics colStats = (ColumnStatistics) entry.getValue(); +PartitionInfo partitionInfo = (PartitionInfo)entry.getKey(); +ColumnStatisticsDesc statsDesc = colStats.getStatsDesc(); +long partId = partitionInfo.partitionId; + +for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) { + MPartitionColumnStatistics mPartitionColumnStatistics = StatObjectConverter. + convertToMPartitionColumnStatistics(null, statsDesc, statisticsObj, colStats.getEngine()); + + preparedStatement.setLong(1, maxCsId); + preparedStatement.setString(2, mPartitionColumnStatistics.getCatName()); + preparedStatement.setString(3, mPartitionColumnStatistics.getDbName()); + preparedStatement.setString(4, mPartitionColumnStatistics.getTableName()); + preparedStatement.setString(5, mPartitionColumnStatistics.getPartitionName()); + preparedStatement.setString(6, mPartitionColumnStatistics.getColName()); +
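The flush-every-maxNumRows pattern in the quoted DELETE/INSERT snippets can be reduced to a small generic sketch. This is a hypothetical helper class, not part of TxnHandler: rows are buffered, a full batch is flushed when the configured limit is reached, and the remainder is flushed once at the end, mirroring the final `if (numRows != 0)` block above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of flush-every-N batching. `flush` stands in for the
// JDBC executeBatch() call on a PreparedStatement.
class BatchWriter<T> {
    private final int maxNumRows;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int maxNumRows, Consumer<List<T>> flush) {
        this.maxNumRows = maxNumRows;
        this.flush = flush;
    }

    void add(T row) {
        buffer.add(row);
        if (buffer.size() == maxNumRows) {   // full batch: flush and reset
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    void close() {                           // tail flush, mirroring the
        if (!buffer.isEmpty()) {             // `if (numRows != 0)` block
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Bounding each batch (in Hive, via DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE) keeps any single database round trip from growing with the number of partitions.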
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604358 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:54 Start Date: 01/Jun/21 07:54 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r642867437 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -84,6 +86,9 @@ private ConcurrentHashMap partsInProgress = new ConcurrentHashMap<>(); private AtomicInteger itemsInProgress = new AtomicInteger(0); + Map isDbTargetOfReplication = new HashMap<>(); + Map isDbBeingFailedOver = new HashMap<>(); Review comment: Why do you need it at instance level? When is the map getting cleaned? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604358) Time Spent: 1.5h (was: 1h 20m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604357 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:51 Start Date: 01/Jun/21 07:51 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r642865407 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException { } } + private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap, + Connection dbConn) throws SQLException { +PreparedStatement preparedStatement = null; +int numRows = 0; +int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE); + +// Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that. +// TODO : Need to add catalog name to the index +String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND " ++ "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? 
" ++ "AND \"PART_ID\" = ?"; + +try { + preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null); + for (Map.Entry entry : statsPartInfoMap.entrySet()) { +ColumnStatistics colStats = (ColumnStatistics) entry.getValue(); +PartitionInfo partitionInfo = (PartitionInfo) entry.getKey(); +for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) { + preparedStatement.setString(1, colStats.getStatsDesc().getDbName()); + preparedStatement.setString(2, colStats.getStatsDesc().getTableName()); + preparedStatement.setString(3, statisticsObj.getColName()); + preparedStatement.setString(4, colStats.getStatsDesc().getPartName()); + preparedStatement.setLong(5, partitionInfo.partitionId); + numRows++; + preparedStatement.addBatch(); + if (numRows == maxNumRows) { +preparedStatement.executeBatch(); +numRows = 0; +LOG.debug("Executed delete " + delete + " for numRows " + numRows); + } +} + } + + if (numRows != 0) { Review comment: it's more for readability -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604357) Time Spent: 8h 20m (was: 8h 10m) > Reduce overhead of partition column stats updation. > --- > > Key: HIVE-24663 > URL: https://issues.apache.org/jira/browse/HIVE-24663 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: performance, pull-request-available > Time Spent: 8h 20m > Remaining Estimate: 0h > > When large number of partitions (>20K) are processed, ColStatsProcessor runs > into DB issues. > {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours together > and in some cases postgres stops processing. 
> It would be good to introduce small batches for stats gathering in > ColStatsProcessor instead of bulk update. > Ref: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199 -- This message was sent by Atlassian Jira (v8.3.4#803005)
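The suggestion above, small batches in ColStatsProcessor instead of one bulk `setPartitionColumnStatistics` call, amounts to splitting the per-partition stats into fixed-size chunks and issuing one request per chunk. A minimal sketch follows; the helper name is hypothetical and not part of the Hive code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split `items` (e.g. per-partition ColumnStatistics
// objects) into chunks of at most `batchSize`, so that each metastore
// request, and hence each backend database transaction, stays small.
class StatsBatcher {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Each chunk would then be sent as its own request, bounding the work the backend (e.g. Postgres) must commit at once.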
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604355 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:49 Start Date: 01/Jun/21 07:49 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r642863679 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -8994,10 +9028,15 @@ public boolean set_aggr_stats_for(SetPartitionsStatsRequest request) throws TExc colNames, newStatsMap, request); } else { // No merge. Table t = getTable(catName, dbName, tableName); -for (Map.Entry entry : newStatsMap.entrySet()) { - // We don't short-circuit on errors here anymore. That can leave acid stats invalid. - ret = updatePartitonColStatsInternal(t, entry.getValue(), - request.getValidWriteIdList(), request.getWriteId()) && ret; +// We don't short-circuit on errors here anymore. That can leave acid stats invalid. +if (newStatsMap.size() > 1) { + LOG.info("ETL_PERF started updatePartitionColStatsInBatch"); + ret = updatePartitionColStatsInBatch(t, newStatsMap, + request.getValidWriteIdList(), request.getWriteId()); + LOG.info("ETL_PERF done updatePartitionColStatsInBatch"); +} else { + ret = updatePartitonColStatsInternal(t, newStatsMap.values().iterator().next(), Review comment: in future we would need to support both implementations, why can't we generalize? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604355) Time Spent: 8h 10m (was: 8h) > Reduce overhead of partition column stats updation. 
> --- > > Key: HIVE-24663 > URL: https://issues.apache.org/jira/browse/HIVE-24663 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: performance, pull-request-available > Time Spent: 8h 10m > Remaining Estimate: 0h > > When large number of partitions (>20K) are processed, ColStatsProcessor runs > into DB issues. > {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours together > and in some cases postgres stops processing. > It would be good to introduce small batches for stats gathering in > ColStatsProcessor instead of bulk update. > Ref: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604354 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:47 Start Date: 01/Jun/21 07:47 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #2266: URL: https://github.com/apache/hive/pull/2266#issuecomment-851905628 > > NOTE: I don't think TxnHandler is a good place for this kind of functionality. TxnHandler is responsible for managing txn metadata ONLY!! Wouldn't MetaStoreDirectSql be more appropriate here? @pvary what do you think? > > The MetaStoreDirectSql does not have framework to handle adding notification logs within same transaction. That can cause issues in replication if the notification logs addition fails because of some reason. The MetaStoreDirectSql batch insert/update also not very performant. What I meant here is that TxnHandler shouldn't have colStats update logic, it's responsible for other functionality like Hive TXN management. HiveMetaStore can handle notification logs, what about it? Or if we can't find close by functionality service - we should create a new one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604354) Time Spent: 8h (was: 7h 50m) > Reduce overhead of partition column stats updation. > --- > > Key: HIVE-24663 > URL: https://issues.apache.org/jira/browse/HIVE-24663 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: performance, pull-request-available > Time Spent: 8h > Remaining Estimate: 0h > > When large number of partitions (>20K) are processed, ColStatsProcessor runs > into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours together > and in some cases postgres stops processing. > It would be good to introduce small batches for stats gathering in > ColStatsProcessor instead of bulk update. > Ref: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=604353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604353 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:37 Start Date: 01/Jun/21 07:37 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #2204: URL: https://github.com/apache/hive/pull/2204#issuecomment-851899226 Hey @thejasmn, @kgyrtkirk could you please take another look if you have a moment? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604353) Time Spent: 0.5h (was: 20m) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0, 4.0.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The HiveServer's configured default execution engine is MR. When the client sets > hive.execution.engine = tez, the client cannot print the TEZ log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25166) Query with multiple count(distinct constant) fails
[ https://issues.apache.org/jira/browse/HIVE-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25166: -- Status: Patch Available (was: In Progress) > Query with multiple count(distinct constant) fails > -- > > Key: HIVE-25166 > URL: https://issues.apache.org/jira/browse/HIVE-25166 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > select count(distinct 0), count(distinct null) from alltypes; > {code} > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not > in GROUP BY key 'TOK_NULL' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) > at
[jira] [Work logged] (HIVE-25166) Query with multiple count(distinct constant) fails
[ https://issues.apache.org/jira/browse/HIVE-25166?focusedWorklogId=604330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604330 ] ASF GitHub Bot logged work on HIVE-25166: - Author: ASF GitHub Bot Created on: 01/Jun/21 06:29 Start Date: 01/Jun/21 06:29 Worklog Time Spent: 10m Work Description: kasakrisz closed pull request #2325: URL: https://github.com/apache/hive/pull/2325 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604330) Time Spent: 0.5h (was: 20m) > Query with multiple count(distinct constant) fails > -- > > Key: HIVE-25166 > URL: https://issues.apache.org/jira/browse/HIVE-25166 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > select count(distinct 0), count(distinct null) from alltypes; > {code} > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not > in GROUP BY key 'TOK_NULL' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at >
[jira] [Work logged] (HIVE-25166) Query with multiple count(distinct constant) fails
[ https://issues.apache.org/jira/browse/HIVE-25166?focusedWorklogId=604331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604331 ] ASF GitHub Bot logged work on HIVE-25166: - Author: ASF GitHub Bot Created on: 01/Jun/21 06:29 Start Date: 01/Jun/21 06:29 Worklog Time Spent: 10m Work Description: kasakrisz commented on pull request #2325: URL: https://github.com/apache/hive/pull/2325#issuecomment-851857576 This is not the right approach now because it adds an extra column to the shuffle, which is not always necessary. Closing this. See #2334 for a fix of the original issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604331) Time Spent: 40m (was: 0.5h) > Query with multiple count(distinct constant) fails > -- > > Key: HIVE-25166 > URL: https://issues.apache.org/jira/browse/HIVE-25166 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > select count(distinct 0), count(distinct null) from alltypes; > {code} > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not > in GROUP BY key 'TOK_NULL' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at
[jira] [Commented] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354841#comment-17354841 ] Wei Zhang commented on HIVE-25170: -- I've posted a patch for this issue on GitHub. Can somebody help review the code? Thanks! > Data error in constant propagation caused by wrong colExprMap generated in > SemanticAnalyzer > --- > > Key: HIVE-25170 > URL: https://issues.apache.org/jira/browse/HIVE-25170 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 3.1.2 >Reporter: Wei Zhang >Assignee: Wei Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > > {code:java} > SET hive.remove.orderby.in.subquery=false; > EXPLAIN > SELECT constant_col, key, max(value) > FROM > ( > SELECT 'constant' as constant_col, key, value > FROM src > DISTRIBUTE BY constant_col, key > SORT BY constant_col, key, value > ) a > GROUP BY constant_col, key > LIMIT 10; > OK > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (SIMPLE_EDGE)Stage-0 > Fetch Operator > limit:10 > Stage-1 > Reducer 3 > File Output Operator [FS_10] > Limit [LIM_9] (rows=1 width=368) > Number of rows:10 > Select Operator [SEL_8] (rows=1 width=368) > Output:["_col0","_col1","_col2"] > Group By Operator [GBY_7] (rows=1 width=368) > > Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', > 'constant' > <-Reducer 2 [SIMPLE_EDGE] > SHUFFLE [RS_6] > PartitionCols:'constant', 'constant' > Group By Operator [GBY_5] (rows=1 width=368) > > Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', > 'constant' > Select Operator [SEL_3] (rows=500 width=178) > Output:["_col2"] > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_2] > PartitionCols:'constant', _col1 > Select Operator [SEL_1] (rows=500 width=178) > Output:["_col1","_col2"] > TableScan [TS_0] (rows=500 width=10) > > 
src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code} > Obviously, the PartitionCols in Reducer 2 is wrong. Instead of 'constant', > 'constant', it should be 'constant', _col1 > > That's because after HIVE-13808, SemanticAnalyzer uses sortCols to generate > the colExprMap structure in the key part, while the key columns are generated > by newSortCols, leading to a column and expr mismatch when the constant > column is not the trailing column in the key columns. > Constant propagation optimizer uses this colExprMap and finds extra const > expression in the mismatched map, resulting in this error. > > In fact, colExprMap is used by multiple optimizers, which makes this quite a > serious problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
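The mismatch described above can be illustrated with a minimal sketch (plain Python with hypothetical names; this is not Hive's actual code). colExprMap pairs each key column emitted by the reduce sink with the expression that produces it. If the map is built from a differently ordered column list, a non-constant key column can be recorded as a constant, and the plan's PartitionCols collapse to the literal:

```python
# Hypothetical sketch: how a colExprMap built from a stale column order
# misleads constant propagation. Not Hive internals -- just the shape of
# the bug.

def partition_cols(key_cols, col_expr_map):
    # What the plan prints as PartitionCols: the expression that
    # colExprMap records for each emitted key column.
    return [col_expr_map[k] for k in key_cols]

key_cols = ["KEY._col0", "KEY._col1"]

# Correct map: KEY._col1 maps to the column reference _col1.
correct_map = {"KEY._col0": "'constant'", "KEY._col1": "_col1"}

# Buggy map: expressions paired against a differently ordered column
# list, so the constant expression is attached to both keys.
buggy_map = {"KEY._col0": "'constant'", "KEY._col1": "'constant'"}

print(partition_cols(key_cols, correct_map))  # PartitionCols:'constant', _col1
print(partition_cols(key_cols, buggy_map))    # PartitionCols:'constant', 'constant'
```

With the correct map the reducer still partitions on the real key column; with the buggy map both keys fold to the literal, matching the wrong `PartitionCols:'constant', 'constant'` shown in the plan.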
[jira] [Updated] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhang updated HIVE-25170:
-
Status: Patch Available (was: Open)
> Data error in constant propagation caused by wrong colExprMap generated in
> SemanticAnalyzer
> ---
>
> Key: HIVE-25170
> URL: https://issues.apache.org/jira/browse/HIVE-25170
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 3.1.2
> Reporter: Wei Zhang
> Assignee: Wei Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604323 ] ASF GitHub Bot logged work on HIVE-25179:
-
Author: ASF GitHub Bot
Created on: 01/Jun/21 06:09
Start Date: 01/Jun/21 06:09
Worklog Time Spent: 10m
Work Description: lcspinter commented on pull request #2333:
URL: https://github.com/apache/hive/pull/2333#issuecomment-851846758
@jcamachor @zabetak @marton-bod @pvary @szlta Could you please review this PR? Thanks
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 604323)
Time Spent: 20m (was: 10m)
> Support all partition transforms for Iceberg in create table
>
> Key: HIVE-25179
> URL: https://issues.apache.org/jira/browse/HIVE-25179
> Project: Hive
> Issue Type: New Feature
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Enhance the table create syntax with support for partition transforms:
> {code:sql}
> CREATE TABLE ... PARTITIONED BY SPEC(
>   year_field year,
>   month_field month,
>   day_field day,
>   hour_field hour,
>   truncate_field truncate[3],
>   bucket_field bucket[5],
>   identity_field identity
> ) STORED BY ICEBERG;
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
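For readers unfamiliar with the transform keywords in the proposed PARTITIONED BY SPEC syntax, here is a rough sketch of what each one computes. These are simplified stand-ins written in plain Python, not Iceberg's actual implementations (real Iceberg buckets with a Murmur3 hash and represents temporal transforms as offsets from the Unix epoch); they are shown only to illustrate the semantics of each keyword.

```python
# Simplified stand-ins for the Iceberg-style partition transforms named
# in the proposed CREATE TABLE syntax. Illustrative only.
import zlib
from datetime import datetime

def year(ts):  return ts.year
def month(ts): return (ts.year, ts.month)
def day(ts):   return (ts.year, ts.month, ts.day)
def hour(ts):  return (ts.year, ts.month, ts.day, ts.hour)

def truncate(width, v):
    # truncate[W]: integers round down to a multiple of W,
    # strings keep their first W characters.
    return v - (v % width) if isinstance(v, int) else v[:width]

def bucket(n, v):
    # bucket[N]: hash the value into N buckets (CRC32 here as a
    # deterministic stand-in for Iceberg's Murmur3).
    return zlib.crc32(str(v).encode()) % n

def identity(v):
    # identity: partition directly on the raw value.
    return v

ts = datetime(2021, 6, 1, 6, 9)
print(year(ts), truncate(3, 10), truncate(3, "abcdef"), identity(42))
```

Every row whose transformed value is equal lands in the same partition, which is why the spec lists a source field together with a transform rather than a plain column name.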