[jira] [Updated] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions for External Tables
     [ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Narayanan Venkateswaran updated HIVE-25178:
-------------------------------------------
    Summary: Reduce number of getPartition calls during loadDynamicPartitions for External Tables  (was: Reduce number of getPartition calls during loadDynamicPartitions)

> Reduce number of getPartition calls during loadDynamicPartitions for External Tables
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-25178
>                 URL: https://issues.apache.org/jira/browse/HIVE-25178
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Narayanan Venkateswaran
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When dynamic partitions are loaded, Hive::loadDynamicPartition loads all partitions from HMS, putting heavy load on it. This becomes worse when a table has a large number of partitions.
> Instead, HMS only needs to be queried for the existence of the partitions that are actually being loaded.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
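The idea in the description — asking HMS only about the partitions involved in the load, in bounded requests — can be sketched as a batching helper. This is an illustrative sketch, not Hive's implementation: `PartitionBatchSketch` and `toBatches` are hypothetical names, and each batch would back one bounded metastore call (e.g. a `getPartitionsByNames`-style request) instead of listing every partition in the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionBatchSketch {

    // Hypothetical helper: split the partition names produced by a dynamic-partition
    // load into fixed-size batches, so each metastore existence check covers only
    // the partitions being loaded and no single request grows unbounded.
    static List<List<String>> toBatches(List<String> partNames, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < partNames.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                partNames.subList(i, Math.min(i + batchSize, partNames.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hive partition names use the key=value form.
        List<String> names = Arrays.asList("ds=2021-06-01", "ds=2021-06-02", "ds=2021-06-03");
        System.out.println(toBatches(names, 2));
        // → [[ds=2021-06-01, ds=2021-06-02], [ds=2021-06-03]]
    }
}
```

The batch size bounds the cost of any single HMS call, so the total load scales with the number of partitions in the write, not with the number of partitions in the table.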
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
     [ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604953 ]

ASF GitHub Bot logged work on HIVE-25154:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 05:24
            Start Date: 02/Jun/21 05:24
    Worklog Time Spent: 10m
      Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643661160

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
       }
       String cmd = null;
       try {
+        TableName tb = req.tableName;
+        String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(), tb.getDb(), conf);
+        if (dbsBeingFailedOver.contains(dbName)
+            || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       You can simplify this. We don't need this check all the time.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604953)
    Time Spent: 2.5h  (was: 2h 20m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-25154
>                 URL: https://issues.apache.org/jira/browse/HIVE-25154
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-25183) Parsing error for Correlated Inner Joins
     [ https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25183:
----------------------------------
    Labels: pull-request-available  (was: )

> Parsing error for Correlated Inner Joins
> ----------------------------------------
>
>                 Key: HIVE-25183
>                 URL: https://issues.apache.org/jira/browse/HIVE-25183
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Parser
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-25090.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25183) Parsing error for Correlated Inner Joins
     [ https://issues.apache.org/jira/browse/HIVE-25183?focusedWorklogId=604950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604950 ]

ASF GitHub Bot logged work on HIVE-25183:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 05:03
            Start Date: 02/Jun/21 05:03
    Worklog Time Spent: 10m
      Work Description: jcamachor commented on a change in pull request #2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r643653073

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
## @@ -104,12 +105,20 @@ private boolean hasTableAlias(JoinTypeCheckCtx ctx, String tabName, ASTNode expr
         tblAliasCnt++;
       }

+      if (tblAliasCnt == 0 && ctx.getOuterRR() != null) {

Review comment:
       Why do we do this check? Maybe add a comment to the code.

## File path: ql/src/test/queries/clientpositive/subquery_corr_join.q
## @@ -0,0 +1,69 @@
+create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+);
+
+insert into alltypestiny(id, int_col, bigint_col, bool_col) values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false);
+
+create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+);
+
+insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false);
+
+select *
+from alltypesagg t1
+where t1.id not in
+(select tt1.id
+ from alltypestiny tt1 inner JOIN alltypesagg tt2

Review comment:
       Can we add a q file with a negative test for outer joins? That will be useful to make sure that the query will fail for the time being, as expected.

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java
## @@ -90,6 +90,36 @@ public boolean matches(RelOptRuleCall call) {
   }

+  /**
+   * Rule that tries to push join conditions into its inputs.
+   */
+  public static class HiveJoinConditionPushRule extends HiveFilterJoinRule {

Review comment:
       Isn't this the same as `HiveFilterJoinTransposeRule`? It should not be necessary.

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
## @@ -34,6 +34,7 @@
 import javax.annotation.Nonnull;

+import org.apache.calcite.adapter.enumerable.EnumerableConvention;

Review comment:
       This does not seem needed?

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/JoinCondTypeCheckProcFactory.java
## @@ -194,6 +207,19 @@ private ColumnInfo getColInfo(JoinTypeCheckCtx ctx, String tabName, String colAl
       }
     }

+    if (cInfoToRet == null && ctx.getOuterRR() != null) {
+      for (RowResolver rr : ImmutableList.of(ctx.getOuterRR())) {

Review comment:
       Is the `ImmutableList.of` wrapping needed?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604950)
    Remaining Estimate: 0h
    Time Spent: 10m

> Parsing error for Correlated Inner Joins
> ----------------------------------------
>
>                 Key: HIVE-25183
>                 URL: https://issues.apache.org/jira/browse/HIVE-25183
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Parser
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-25090.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
     [ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604944 ]

ASF GitHub Bot logged work on HIVE-25154:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:42
            Start Date: 02/Jun/21 04:42
    Worklog Time Spent: 10m
      Work Description: hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643647370

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
       }
       String cmd = null;
       try {
+        TableName tb = req.tableName;
+        String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(), tb.getDb(), conf);
+        if (dbsBeingFailedOver.contains(dbName)
+            || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       If the current dbName is not present in the dbsBeingFailedOver set, then it'll add this db to it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604944)
    Time Spent: 2h 20m  (was: 2h 10m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-25154
>                 URL: https://issues.apache.org/jira/browse/HIVE-25154
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
     [ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604938 ]

ASF GitHub Bot logged work on HIVE-25154:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:20
            Start Date: 02/Jun/21 04:20
    Worklog Time Spent: 10m
      Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643399198

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
       }
       String cmd = null;
       try {
+        TableName tb = req.tableName;
+        String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(), tb.getDb(), conf);
+        if (dbsBeingFailedOver.contains(dbName)
+            || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       How will this condition be true?

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
## @@ -222,17 +237,31 @@ private void setupMsckPathInvalidation() {
     private Configuration conf;
     private String qualifiedTableName;
     private CountDownLatch countDownLatch;
+    private Set<String> dbsBeingFailedOver;
+    private IMetaStoreClient msc;

-    MsckThread(MsckInfo msckInfo, Configuration conf, String qualifiedTableName, CountDownLatch countDownLatch) {
+    MsckThread(MsckInfo msckInfo, Configuration conf, String qualifiedTableName,
+        CountDownLatch countDownLatch, Set<String> dbsBeingFailedOver, IMetaStoreClient msc) {
       this.msckInfo = msckInfo;
       this.conf = conf;
       this.qualifiedTableName = qualifiedTableName;
       this.countDownLatch = countDownLatch;
+      this.dbsBeingFailedOver = dbsBeingFailedOver;
+      this.msc = msc;
     }

     @Override
     public void run() {
       try {
+        String dbName = MetaStoreUtils.prependCatalogToDbName(msckInfo.getCatalogName(), msckInfo.getDbName(), conf);
+        if (dbsBeingFailedOver.contains(dbName) ||
+            MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(msckInfo.getCatalogName(), msckInfo.getDbName()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       This isn't thread-safe.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604938)
    Time Spent: 2h 10m  (was: 2h)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-25154
>                 URL: https://issues.apache.org/jira/browse/HIVE-25154
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
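The thread-safety concern flagged above comes from the contains()-then-add sequence on a shared set. A minimal sketch of one way to avoid it (the class and method names here are illustrative, not the patch's code): `ConcurrentHashMap.newKeySet()` yields a thread-safe `Set`, and its `add()` atomically reports whether the element was new, so no separate `contains()` check is needed before inserting.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FailoverCacheSketch {

    // Thread-safe set of databases known to be failing over; safe to share
    // across the worker threads spawned by a background task.
    private static final Set<String> dbsBeingFailedOver = ConcurrentHashMap.newKeySet();

    // add() is atomic and returns false when the name was already present,
    // replacing the contains()-then-add() sequence that is racy on a plain HashSet.
    static boolean markFailedOver(String dbName) {
        return dbsBeingFailedOver.add(dbName);
    }

    public static void main(String[] args) {
        System.out.println(markFailedOver("hive.repl_db")); // true: newly recorded
        System.out.println(markFailedOver("hive.repl_db")); // false: already known
    }
}
```

With this shape, two threads racing on the same database name cannot both observe "absent" and duplicate the follow-up work keyed on the insertion.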
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604937 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:17
            Start Date: 02/Jun/21 04:17
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643639955

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java
## @@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.utils.JavaUtils;
+import org.apache.thrift.TException;
+
+import static java.util.Objects.requireNonNull;
+
+public final class ExceptionHandler {
+  private final Exception e;
+
+  private ExceptionHandler(Exception e) {
+    this.e = e;
+  }
+
+  public static ExceptionHandler handleException(Exception e) {
+    requireNonNull(e, "Exception e is null");
+    return new ExceptionHandler(e);
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clz.
+   */
+  public <T extends Exception> ExceptionHandler throwIfInstance(Class<T> t) throws T {
+    if (t.isInstance(e)) {
+      throw t.cast(e);
+    }
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt or the class clze, in order.
+   */
+  public <T extends Exception, E extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e) throws T, E {
+    throwIfInstance(t);
+    throwIfInstance(e);
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt, clze or clzc, in order.
+   */
+  public <T extends Exception, E extends Exception, C extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e, Class<C> c) throws T, E, C {

Review comment:
       Yes, we can do it in this more simplified way, thank you very much for the comments!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604937)
    Time Spent: 2.5h  (was: 2h 20m)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604936 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:14
            Start Date: 02/Jun/21 04:14
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643639099

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java
## @@ -0,0 +1,149 @@
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.utils.JavaUtils;
+import org.apache.thrift.TException;
+
+import static java.util.Objects.requireNonNull;
+
+public final class ExceptionHandler {
+  private final Exception e;
+
+  private ExceptionHandler(Exception e) {
+    this.e = e;
+  }
+
+  public static ExceptionHandler handleException(Exception e) {
+    requireNonNull(e, "Exception e is null");
+    return new ExceptionHandler(e);
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clz.
+   */
+  public <T extends Exception> ExceptionHandler throwIfInstance(Class<T> t) throws T {
+    if (t.isInstance(e)) {
+      throw t.cast(e);
+    }
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt or the class clze, in order.
+   */
+  public <T extends Exception, E extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e) throws T, E {
+    throwIfInstance(t);
+    throwIfInstance(e);
+    return this;
+  }
+
+  /**
+   * Throws if the input e is an instance of the class clzt, clze or clzc, in order.
+   */
+  public <T extends Exception, E extends Exception, C extends Exception> ExceptionHandler
+      throwIfInstance(Class<T> t, Class<E> e, Class<C> c) throws T, E, C {
+    throwIfInstance(t);
+    throwIfInstance(e);
+    throwIfInstance(c);
+    return this;
+  }
+
+  /**
+   * Converts the input e, if it is an instance of the class from, to an instance of the class to, and throws it.
+   */
+  public <F extends Exception, D extends Exception> ExceptionHandler
+      convertIfInstance(Class<F> from, Class<D> to) throws D {
+    D targetException = null;
+    if (from.isInstance(e)) {
+      try {
+        targetException = JavaUtils.newInstance(to, new Class[]{String.class}, new Object[]{e.getMessage()});
+      } catch (Exception ex) {
+        // this should not happen
+        throw new RuntimeException(ex);
+      }
+    }
+    if (targetException != null) {
+      throw targetException;
+    }
+    return this;
+  }
+
+  /**
+   * Converts the input e to a MetaException with the given message if it is an instance of any of the classes.
+   */
+  public ExceptionHandler convertToMetaExIfInstance(String message, Class<?>... classes)
+      throws MetaException {
+    if (classes != null && classes.length > 0) {
+      for (Class<?> clz : classes) {
+        if (clz.isInstance(e)) {
+          // throw the exception if it matches
+          throw new MetaException(message);
+        }
+      }
+    }
+    return this;
+  }
+
+  public static TException rethrowException(Exception e) throws TException {

Review comment:
       Done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604936)
    Time Spent: 2h 20m  (was: 2h 10m)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 20m
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604934 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:14
            Start Date: 02/Jun/21 04:14
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643638989

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java
## @@ -0,0 +1,149 @@
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.utils.JavaUtils;
+import org.apache.thrift.TException;
+
+import static java.util.Objects.requireNonNull;
+
+public final class ExceptionHandler {

Review comment:
       Done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604934)
    Time Spent: 2h  (was: 1h 50m)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
     [ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604935 ]

ASF GitHub Bot logged work on HIVE-25055:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/21 04:14
            Start Date: 02/Jun/21 04:14
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r643639041

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
## @@ -914,6 +914,7 @@ private boolean isViewTable(String catName, String dbName, String tblName) throw
     long queryTime = doTrace ? System.nanoTime() : 0;
     MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, queryTime);
     if (sqlResult.isEmpty()) {
+      query.closeAll();

Review comment:
       Fine

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 604935)
    Time Spent: 2h 10m  (was: 2h)

> Improve the exception handling in HMSHandler
> --------------------------------------------
>
>                 Key: HIVE-25055
>                 URL: https://issues.apache.org/jira/browse/HIVE-25055
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
     [ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355461#comment-17355461 ]

zhangbutao commented on HIVE-25186:
-----------------------------------
This exception looks like HIVE-23756, which failed to delete a table intermittently.

> Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25186
>                 URL: https://issues.apache.org/jira/browse/HIVE-25186
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0
>            Reporter: zhangbutao
>            Priority: Major
>         Attachments: drop_database_exception.txt
>
> We use the Hive master branch (HiveMetaStoreClient API) to create and drop databases.
> When we drop a database with the following sample code, exceptions occasionally occur.
> {code:java}
> HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf);
> hiveMetaClient.dropDatabase("testdb", true, true, true);
> {code}
> {code:java}
> java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."tbls", CONSTRAINT "FKdg0lkp80iro5fs41hyvi9ox43" FOREIGN KEY ("DB_ID") REFERENCES "dbs" ("DB_ID"))
>     at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2058)
>     at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1471)
>     at org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
>     at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>     at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366)
>     at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:667)
>     at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:635)
>     at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:721)
>     at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:95)
>     at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:528)
>     at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:222)
>     at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:286)
>     at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107)
>     at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:598)
>     at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>     at com.sun.proxy.$Proxy27.commitTransaction(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database_core(HMSHandler.java:1898)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database(HMSHandler.java:1954)
>     at sun.reflect.GeneratedMethodAccessor219.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>     at com.sun.proxy.$Proxy28.drop_database(Unknown Source)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17577)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17556)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>     at
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
     [ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangbutao updated HIVE-25186:
------------------------------
    Attachment: drop_database_exception.txt

> Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25186
>                 URL: https://issues.apache.org/jira/browse/HIVE-25186
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0
>            Reporter: zhangbutao
>            Priority: Major
>         Attachments: drop_database_exception.txt
>
> We use the Hive master branch (HiveMetaStoreClient API) to create and drop databases.
> When we drop a database with the following sample code, exceptions occasionally occur.
> {code:java}
> HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf);
> hiveMetaClient.dropDatabase("testdb", true, true, true);
> {code}
> {code:java}
> java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."tbls", CONSTRAINT "FKdg0lkp80iro5fs41hyvi9ox43" FOREIGN KEY ("DB_ID") REFERENCES "dbs" ("DB_ID"))
>     at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2058)
>     at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1471)
>     at org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
>     at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>     at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366)
>     at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:667)
>     at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:635)
>     at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:721)
>     at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:95)
>     at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:528)
>     at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:222)
>     at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:286)
>     at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107)
>     at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:598)
>     at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>     at com.sun.proxy.$Proxy27.commitTransaction(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database_core(HMSHandler.java:1898)
>     at org.apache.hadoop.hive.metastore.HMSHandler.drop_database(HMSHandler.java:1954)
>     at sun.reflect.GeneratedMethodAccessor219.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>     at com.sun.proxy.$Proxy28.drop_database(Unknown Source)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17577)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17556)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
>
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
[ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-25186: -- Description: We use Hive master branch (HiveMetastoreClient Api) to create and drop database. When we drop database with following sample code, some exceptions will occur occasionally. {code:java} HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); hiveMetaClient.dropDatabase("testdb", true, true, true); {code} {code:java} java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."tbls", CONSTRAINT "FKdg0lkp80iro5fs41hyvi9ox43" FOREIGN KEY ("DB_ID") REFERENCES "dbs" ("DB_ID")) at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2058) at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1471) at org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125) at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366) at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:667) at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:635) at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:721) at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:95) at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:528) at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:222) at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:286) at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107) at 
org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:598) at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) at com.sun.proxy.$Proxy27.commitTransaction(Unknown Source) at org.apache.hadoop.hive.metastore.HMSHandler.drop_database_core(HMSHandler.java:1898) at org.apache.hadoop.hive.metastore.HMSHandler.drop_database(HMSHandler.java:1954) at sun.reflect.GeneratedMethodAccessor219.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy28.drop_database(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17577) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:17556) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was: We use Hive master branch (HiveMetastoreClient Api) to create and drop database. When we drop database with following sample code, some exceptions will occur occasionally. {code:java} HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); hiveMetaClient.dropDatabase("testdb", true, true, true); {code} > Drop database fails with excetion 'Cannot delete or update a parent row: a > foreign key constraint fails' >
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
[ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-25186: -- Description: We use Hive master branch (HiveMetastoreClient Api) to create and drop database. When we drop database with following sample code, some exceptions will occur occasionally. {code:java} HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); hiveMetaClient.dropDatabase("testdb", true, true, true); {code} was: We use Hive master branch (HiveMetastoreClient) to create and drop database. > Drop database fails with excetion 'Cannot delete or update a parent row: a > foreign key constraint fails' > > > Key: HIVE-25186 > URL: https://issues.apache.org/jira/browse/HIVE-25186 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: zhangbutao >Priority: Major > > We use Hive master branch (HiveMetastoreClient Api) to create and drop > database. > When we drop database with following sample code, some exceptions will occur > occasionally. > {code:java} > HiveMetaStoreClient hiveMetaClient = new HiveMetaStoreClient(hiveConf); > hiveMetaClient.dropDatabase("testdb", true, true, true); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25186) Drop database fails with exception 'Cannot delete or update a parent row: a foreign key constraint fails'
[ https://issues.apache.org/jira/browse/HIVE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-25186: -- Description: We use Hive master branch (HiveMetastoreClient) to create and drop database. > Drop database fails with excetion 'Cannot delete or update a parent row: a > foreign key constraint fails' > > > Key: HIVE-25186 > URL: https://issues.apache.org/jira/browse/HIVE-25186 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: zhangbutao >Priority: Major > > We use Hive master branch (HiveMetastoreClient) to create and drop database. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604872 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:25 Start Date: 02/Jun/21 01:25 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r643587387 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -94,6 +98,8 @@ private BlockingQueue workQueue; private Thread[] workers; + private Set dbsBeingFailedOver; Review comment: This is the only way we can access this set within each iteration of StatsUpdater and also within each execution of the actual analysis work after dequeuing from the worker queue. This set is cleared when a new iteration of this thread kicks in. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604872) Time Spent: 2h (was: 1h 50m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failed over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
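The check-then-cache behaviour described in the review comment — consult the in-memory set first, and only fall back to the metastore lookup (caching a positive result) when the db is not yet in the set — can be sketched as follows. This is an illustrative model, not the actual StatsUpdaterThread code; the class and method names are invented for the sketch, and the metastore lookup is stubbed with a predicate.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Sketch: skip work for a db that is being failed over, caching positive
// lookups so the (expensive) metastore check runs at most once per db
// per iteration of the updater thread.
class FailoverSkipList {
    private final Set<String> dbsBeingFailedOver = new HashSet<>();

    boolean shouldSkip(String dbName, Predicate<String> metastoreCheck) {
        if (dbsBeingFailedOver.contains(dbName)) {
            return true;                     // cached from an earlier lookup
        }
        if (metastoreCheck.test(dbName)) {
            dbsBeingFailedOver.add(dbName);  // cache for the rest of this iteration
            return true;
        }
        return false;
    }

    // Called when a new iteration of the thread kicks in.
    void clear() {
        dbsBeingFailedOver.clear();
    }
}
```

With this shape there is a single `contains` check per call, which avoids the double `contains` that the reviewer flagged as unnecessary.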
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604869 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:19 Start Date: 02/Jun/21 01:19 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r643585446 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Hello @miklosgergely. The current code waits for 100ms, whereas I have changed it to wait up to 10s. I'm not sure what the value is of setting a timeout; it will always just loop again. Since I do not see any value here, but I wouldn't want to remove this `wait` altogether as part of this ticket, I simply increased it. Logging once every 100ms would be far too verbose. So you see, this change is logging-related. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604869) Time Spent: 40m (was: 0.5h) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604868 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:19 Start Date: 02/Jun/21 01:19 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r643585446 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Hello @miklosgergely. The current code waits for 100ms, whereas I have changed it to wait up to 10s. I'm not sure what the value is of setting a timeout; it will always just loop again. Since I do not see any value here, but I wouldn't want to remove this as part of this ticket, I simply increased it. Logging once every 100ms would be far too verbose. So you see, this change is logging-related. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604868) Time Spent: 0.5h (was: 20m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
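The pattern being debated in the diff above — blocking on a condition with a bounded timeout and logging once per wakeup — can be sketched independently of Hive. The class below is a minimal illustration, not the real TezSessionPool; only the lock/condition/poll structure mirrors the quoted diff, and `System.out` stands in for the SLF4J logger.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of a blocking session pool that logs while it waits.
class LoggingPool<T> {
    private final ReentrantLock poolLock = new ReentrantLock();
    private final Condition notEmpty = poolLock.newCondition();
    private final Queue<T> pool = new ArrayDeque<>();

    T getSession() throws InterruptedException {
        poolLock.lock();
        try {
            T result;
            while ((result = pool.poll()) == null) {
                // A 10 s timeout keeps the log readable; a 100 ms timeout
                // would emit ten messages per second while the pool is empty.
                System.out.println("Awaiting session to become available in pool");
                notEmpty.await(10, TimeUnit.SECONDS);
            }
            return result;
        } finally {
            poolLock.unlock();
        }
    }

    void returnSession(T session) {
        poolLock.lock();
        try {
            pool.add(session);
            notEmpty.signal();  // wake one waiter; the loop re-checks poll()
        } finally {
            poolLock.unlock();
        }
    }
}
```

Note the `while` loop around `await`: the timeout never ends the wait by itself, it only bounds how long a waiter sleeps between log lines and re-checks, which is why the reviewer argues the exact duration only matters for log volume.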
[jira] [Work logged] (HIVE-25168) Add mutable validWriteIdList
[ https://issues.apache.org/jira/browse/HIVE-25168?focusedWorklogId=604858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604858 ] ASF GitHub Bot logged work on HIVE-25168: - Author: ASF GitHub Bot Created on: 02/Jun/21 01:06 Start Date: 02/Jun/21 01:06 Worklog Time Spent: 10m Work Description: hsnusonic closed pull request #2324: URL: https://github.com/apache/hive/pull/2324 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604858) Time Spent: 40m (was: 0.5h) > Add mutable validWriteIdList > > > Key: HIVE-25168 > URL: https://issues.apache.org/jira/browse/HIVE-25168 > Project: Hive > Issue Type: New Feature > Components: storage-api >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Although the current implementation for validWriteIdList is not strictly > immutable, it is in some sense to provide a read-only view snapshot. This > change is to add another class to provide functionalities for manipulating > the writeIdList. We could use this to keep writeIdList up-to-date in an > external cache layer for event-based metadata refreshing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler
[ https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=604834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604834 ] ASF GitHub Bot logged work on HIVE-25055: - Author: ASF GitHub Bot Created on: 02/Jun/21 00:13 Start Date: 02/Jun/21 00:13 Worklog Time Spent: 10m Work Description: vihangk1 commented on a change in pull request #2218: URL: https://github.com/apache/hive/pull/2218#discussion_r643488870 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ## @@ -914,6 +914,7 @@ private boolean isViewTable(String catName, String dbName, String tblName) throw long queryTime = doTrace ? System.nanoTime() : 0; MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, queryTime); if (sqlResult.isEmpty()) { + query.closeAll(); Review comment: It looks like this is fixing a unrelated bug. Can we move this out of this PR and create a different JIRA for this? ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.utils.JavaUtils; +import org.apache.thrift.TException; + +import static java.util.Objects.requireNonNull; + +public final class ExceptionHandler { Review comment: Can you add a class level comment? ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.utils.JavaUtils; +import org.apache.thrift.TException; + +import static java.util.Objects.requireNonNull; + +public final class ExceptionHandler { + private final Exception e; + + private ExceptionHandler(Exception e) { +this.e = e; + } + + public static ExceptionHandler handleException(Exception e) { +requireNonNull(e, "Exception e is null"); +return new ExceptionHandler(e); + } + + /** + * Throws if the input e is the instance of the class clz + */ + public ExceptionHandler + throwIfInstance(Class t) throws T { +if (t.isInstance(e)) { + throw t.cast(e); +} +return this; + } + + /** + * Throws if the input e is the instance of the class clzt or class clze in order + */ + public ExceptionHandler + throwIfInstance(Class t, Class e) throws T, E { +throwIfInstance(t); +throwIfInstance(e); +return this; + } + + /** + * Throws if the input e is the instance of the class clzt or clze or clzc in order + */ + public ExceptionHandler + throwIfInstance(Class t, Class e, Class c) throws T, E, C { +throwIfInstance(t); +throwIfInstance(e); +throwIfInstance(c); +return this; + } + + /** + * Converts the input e if it is the instance of class from to the
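The fluent `throwIfInstance(...)` style reviewed above can be exercised with a small self-contained sketch. This is a simplified reimplementation for illustration only — the class name is changed and the real Hive `ExceptionHandler` (quoted in truncated form above) has more overloads and conversion methods.

```java
import java.io.IOException;

// Simplified sketch of the fluent rethrow pattern from the review above.
final class ExceptionHandlerSketch {
    private final Exception e;

    private ExceptionHandlerSketch(Exception e) {
        this.e = e;
    }

    static ExceptionHandlerSketch handleException(Exception e) {
        return new ExceptionHandlerSketch(e);
    }

    // Rethrows the wrapped exception unchanged if it is an instance of clz;
    // otherwise returns this so further checks can be chained.
    <T extends Exception> ExceptionHandlerSketch throwIfInstance(Class<T> clz) throws T {
        if (clz.isInstance(e)) {
            throw clz.cast(e);
        }
        return this;
    }

    // Terminal operation: wrap anything that was not rethrown above.
    RuntimeException defaultRuntimeException() {
        return new RuntimeException(e);
    }
}
```

A caller would chain the checks it cares about and fall through to a default, e.g. `handleException(ex).throwIfInstance(IOException.class); throw handler.defaultRuntimeException();` — the chain reads top-to-bottom in the order the exception types are tested.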
[jira] [Work logged] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:
[ https://issues.apache.org/jira/browse/HIVE-23756?focusedWorklogId=604821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604821 ] ASF GitHub Bot logged work on HIVE-23756: - Author: ASF GitHub Bot Created on: 01/Jun/21 23:17 Start Date: 01/Jun/21 23:17 Worklog Time Spent: 10m Work Description: scarlin-cloudera opened a new pull request #2340: URL: https://github.com/apache/hive/pull/2340 In a previous checkin, some constraints were added to the package.jdo file, but there are more constraints that need to be added to fix the problem. ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604821) Time Spent: 40m (was: 0.5h) > drop table command fails with MySQLIntegrityConstraintViolationException: > - > > Key: HIVE-23756 > URL: https://issues.apache.org/jira/browse/HIVE-23756 > Project: Hive > Issue Type: Bug >Reporter: Ganesha Shreedhara >Assignee: Ganesha Shreedhara >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23756.1.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Drop table command fails intermittently with the following exception. 
> {code:java} > Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent > row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT > "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at > com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at > com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) > Appat > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372) > at > org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179) > at > org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901) > ... 36 more > Caused by: > com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: > Cannot delete or update a parent row: a foreign key constraint fails > ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") > REFERENCES "CDS" ("CD_ID")) > at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:377) > at com.mysql.jdbc.Util.getInstance(Util.java:360) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code} > Although HIVE-19994 resolves this issue, the FK constraint name of COLUMNS_V2 > table specified in package.jdo file is not same as the FK constraint name > used while creating COLUMNS_V2 table ([Ref|#L60]]). 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
[ https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-25140: Attachment: HIVE-25140.03.patch > Hive Distributed Tracing -- Part 1: Disabled > > > Key: HIVE-25140 > URL: https://issues.apache.org/jira/browse/HIVE-25140 > Project: Hive > Issue Type: Sub-task >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > Attachments: HIVE-25140.01.patch, HIVE-25140.02.patch, > HIVE-25140.03.patch > > > Infrastructure except exporters to Jaeger or OpenTelementry (OTL) due to > Thrift and protobuf version conflicts. A logging only exporter is used. > There are Spans for BeeLine and Hive. Server 2. The code was developed on > branch-3.1 and porting Spans to the Hive MetaStore on master is taking more > time due to major metastore code refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604802 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 01/Jun/21 22:04 Start Date: 01/Jun/21 22:04 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r643513763 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: You are changing the waiting time to 10s from 100ms. If this is intentional, then it shouldn't be put into a commit whose commit message says that it's about improving logging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604802) Time Spent: 20m (was: 10m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=604765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604765 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 01/Jun/21 20:21 Start Date: 01/Jun/21 20:21 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #2204: URL: https://github.com/apache/hive/pull/2204#issuecomment-852419901 Looks alright. Can you do me a quick favor and switch statement it? ```java switch(engineInSessionConf) { case "tez": case "mr": default: } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604765) Time Spent: 50m (was: 40m) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0, 4.0.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > HiveServer configuration parameter execution default MR. When set > hive.execution.engine = tez, the client cannot print the TEZ log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=604764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604764 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 01/Jun/21 20:21 Start Date: 01/Jun/21 20:21 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2204: URL: https://github.com/apache/hive/pull/2204#issuecomment-852419901 Looks alright. Can you do me a quick favor and case statement it? ```java switch(engineInSessionConf) { case "tez": case "mr": default: } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604764) Time Spent: 40m (was: 0.5h) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0, 4.0.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch > > Time Spent: 40m > Remaining Estimate: 0h > > HiveServer configuration parameter execution default MR. When set > hive.execution.engine = tez, the client cannot print the TEZ log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
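The reviewer's switch skeleton, filled out as it might look in practice. The engine names come from the ticket (`tez`, `mr`); the handler bodies here are purely illustrative placeholders, not the actual logic of PR #2204.

```java
// Illustrative completion of the reviewer's switch skeleton; the actual
// handling of the session-level engine setting in the PR may differ.
class EngineDispatch {
    static String describe(String engineInSessionConf) {
        switch (engineInSessionConf) {
            case "tez":
                return "attach Tez progress-log printer";
            case "mr":
                return "use MapReduce job logging";
            default:
                return "unknown engine: " + engineInSessionConf;
        }
    }
}
```

Compared with an if/else-if chain on string equality, the switch makes the finite set of supported engines explicit and forces the author to decide what the `default` branch does when the session conf holds an unexpected value.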
[jira] [Work logged] (HIVE-25168) Add mutable validWriteIdList
[ https://issues.apache.org/jira/browse/HIVE-25168?focusedWorklogId=604732=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604732 ] ASF GitHub Bot logged work on HIVE-25168: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:37 Start Date: 01/Jun/21 19:37 Worklog Time Spent: 10m Work Description: kishendas commented on a change in pull request #2324: URL: https://github.com/apache/hive/pull/2324#discussion_r643423954 ## File path: storage-api/src/java/org/apache/hadoop/hive/common/MutableValidReaderWriteIdList.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.common; + +import com.google.common.base.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.BitSet; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Set; + +/** + * This class is a mutable version of {@link ValidReaderWriteIdList} for use by an external cache layer. + * To use this class, we need to always mark the writeId as open before to mark it as aborted/committed. + * This class is not thread safe. 
+ */ +public class MutableValidReaderWriteIdList extends ValidReaderWriteIdList implements MutableValidWriteIdList { + private static final Logger LOG = LoggerFactory.getLogger(MutableValidReaderWriteIdList.class.getName()); + + public MutableValidReaderWriteIdList(ValidReaderWriteIdList writeIdList) { +super(writeIdList.writeToString()); +exceptions = new ArrayList<>(exceptions); + } + + @Override + public void addOpenWriteId(long writeId) { +if (writeId <= highWatermark) { + LOG.debug("Won't add any open write id because {} is less than or equal to high watermark: {}", Review comment: Please keep the log messages shorter, as they can occupy lot of storage space. More like ("not adding openWriteId: {} {}", writeId, highWatermark) ## File path: storage-api/src/java/org/apache/hadoop/hive/common/MutableValidReaderWriteIdList.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.common; + +import com.google.common.base.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.BitSet; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Set; + +/** + * This class is a mutable version of {@link ValidReaderWriteIdList} for use by an external cache layer. + * To use this class, we need to always mark the writeId as open before to mark it as aborted/committed. + * This class is not thread safe. Review comment: What is the implication of this class not being thread safe ? ## File path: storage-api/src/java/org/apache/hadoop/hive/common/MutableValidReaderWriteIdList.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
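The thread-safety question raised in the review has a concrete answer that a toy model makes visible. The class below is a simplified stand-in (not the Hive class; the method names and range-opening behavior here are assumptions for illustration) for a mutable write-id list whose contract, like the one in the javadoc above, requires a write id to be marked open before it can be marked committed:

```java
import java.util.BitSet;

/**
 * Toy stand-in (not Hive's MutableValidReaderWriteIdList) for a mutable
 * write-id list: ids must be marked open before they can be marked
 * committed, mirroring the contract described in the javadoc above.
 */
class ToyMutableWriteIdList {
    private long highWatermark = 0;           // highest write id ever seen
    private final BitSet open = new BitSet(); // ids opened but not yet resolved
    private final BitSet committed = new BitSet();

    /** Marks every id from the old watermark up to {@code writeId} as open. */
    void addOpenWriteId(long writeId) {
        if (writeId <= highWatermark) {
            return; // already tracked; nothing to do
        }
        for (long id = highWatermark + 1; id <= writeId; id++) {
            open.set((int) id);
        }
        highWatermark = writeId;
    }

    /** Moves an already-open id to committed; rejects ids never opened. */
    void addCommittedWriteId(long writeId) {
        if (!open.get((int) writeId)) {
            throw new IllegalStateException("write id " + writeId + " was never opened");
        }
        open.clear((int) writeId);
        committed.set((int) writeId);
    }

    boolean isCommitted(long writeId) {
        return committed.get((int) writeId);
    }
}
```

Because each mutation is a read-modify-write over shared state (watermark plus two bit sets), two concurrent callers can interleave and leave the structure inconsistent; that is the practical implication of "not thread safe" here: the external cache layer must serialize mutations itself.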
[jira] [Work logged] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?focusedWorklogId=604717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604717 ] ASF GitHub Bot logged work on HIVE-24987: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:18 Start Date: 01/Jun/21 19:18 Worklog Time Spent: 10m Work Description: vihangk1 commented on a change in pull request #2336: URL: https://github.com/apache/hive/pull/2336#discussion_r643416465 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DefaultIncompatibleTableChangeHandler.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.InvalidOperationException; +import org.apache.hadoop.hive.metastore.api.SerDeInfo; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Default incompatible table change handler. This is invoked by the {@link + * HiveAlterHandler} when a table is altered to check if the column type changes if any + * are allowed or not. + */ +public class DefaultIncompatibleTableChangeHandler implements +IMetaStoreIncompatibleChangeHandler { + + private static final Logger LOG = LoggerFactory + .getLogger(DefaultIncompatibleTableChangeHandler.class); + private static final DefaultIncompatibleTableChangeHandler INSTANCE = + new DefaultIncompatibleTableChangeHandler(); + + private DefaultIncompatibleTableChangeHandler() { + } + + public static DefaultIncompatibleTableChangeHandler get() { +return INSTANCE; + } + + /** + * Checks if the column type changes in the oldTable and newTable are allowed or not. In + * addition to checking if the incompatible changes are allowed or not, this also checks + * if the table serde library belongs to a list of table serdes which support making any + * column type changes. + * + * @param conf The configuration which if incompatible col type changes are allowed + * or not. + * @param oldTable The instance of the table being altered. + * @param newTable The new instance of the table which represents the altered state of + * the table. 
+ * @throws InvalidOperationException + */ + @Override + public void allowChange(Configuration conf, Table oldTable, Table newTable) + throws InvalidOperationException { +if (!MetastoreConf.getBoolVar(conf, +MetastoreConf.ConfVars.DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES)) { + // incompatible column changes are allowed for all + return; +} +if (oldTable.getTableType().equals(TableType.VIRTUAL_VIEW.toString())) { + // Views derive the column type from the base table definition. So the view + // definition can be altered to change the column types. The column type + // compatibility checks should be done only for non-views. + return; +} +checkColTypeChangeCompatible(conf, oldTable, newTable); + } + + private void checkColTypeChangeCompatible(Configuration conf, Table oldTable, Review comment: Yes, agreed. This was left here because of git conflicts. Thanks for spotting that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604717) Time Spent: 0.5h (was: 20m) > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 >
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=604714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604714 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:15 Start Date: 01/Jun/21 19:15 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2339: URL: https://github.com/apache/hive/pull/2339 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604714) Remaining Estimate: 0h Time Spent: 10m > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25185: -- Labels: pull-request-available (was: ) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25185: - > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narayanan Venkateswaran reassigned HIVE-25178: -- Assignee: Narayanan Venkateswaran > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Assignee: Narayanan Venkateswaran >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25178 started by Narayanan Venkateswaran. -- > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Assignee: Narayanan Venkateswaran >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?focusedWorklogId=604713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604713 ] ASF GitHub Bot logged work on HIVE-24987: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:14 Start Date: 01/Jun/21 19:14 Worklog Time Spent: 10m Work Description: yongzhi commented on a change in pull request #2336: URL: https://github.com/apache/hive/pull/2336#discussion_r643412805 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DefaultIncompatibleTableChangeHandler.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.InvalidOperationException; +import org.apache.hadoop.hive.metastore.api.SerDeInfo; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Default incompatible table change handler. This is invoked by the {@link + * HiveAlterHandler} when a table is altered to check if the column type changes if any + * are allowed or not. + */ +public class DefaultIncompatibleTableChangeHandler implements +IMetaStoreIncompatibleChangeHandler { + + private static final Logger LOG = LoggerFactory + .getLogger(DefaultIncompatibleTableChangeHandler.class); + private static final DefaultIncompatibleTableChangeHandler INSTANCE = + new DefaultIncompatibleTableChangeHandler(); + + private DefaultIncompatibleTableChangeHandler() { + } + + public static DefaultIncompatibleTableChangeHandler get() { +return INSTANCE; + } + + /** + * Checks if the column type changes in the oldTable and newTable are allowed or not. In + * addition to checking if the incompatible changes are allowed or not, this also checks + * if the table serde library belongs to a list of table serdes which support making any + * column type changes. + * + * @param conf The configuration which if incompatible col type changes are allowed + * or not. + * @param oldTable The instance of the table being altered. + * @param newTable The new instance of the table which represents the altered state of + * the table. 
+ * @throws InvalidOperationException + */ + @Override + public void allowChange(Configuration conf, Table oldTable, Table newTable) + throws InvalidOperationException { +if (!MetastoreConf.getBoolVar(conf, +MetastoreConf.ConfVars.DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES)) { + // incompatible column changes are allowed for all + return; +} +if (oldTable.getTableType().equals(TableType.VIRTUAL_VIEW.toString())) { + // Views derive the column type from the base table definition. So the view + // definition can be altered to change the column types. The column type + // compatibility checks should be done only for non-views. + return; +} +checkColTypeChangeCompatible(conf, oldTable, newTable); + } + + private void checkColTypeChangeCompatible(Configuration conf, Table oldTable, Review comment: Should we remove the checkColTypeChangeCompatible in HiveAlterHandler.java ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604713) Time Spent: 20m (was: 10m) > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 >
[jira] [Work logged] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?focusedWorklogId=604709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604709 ] ASF GitHub Bot logged work on HIVE-25178: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:03 Start Date: 01/Jun/21 19:03 Worklog Time Spent: 10m Work Description: vnhive opened a new pull request #2338: URL: https://github.com/apache/hive/pull/2338 …rtitions ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604709) Remaining Estimate: 0h Time Spent: 10m > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25178: -- Labels: performance pull-request-available (was: performance) > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
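The optimization the issue describes — asking HMS only about the partitions actually being loaded, instead of listing every partition of the table — reduces to a batched existence lookup. The sketch below uses a hypothetical `MetastoreClient` interface as a stand-in for the real HMS client API, so the names are assumptions, not Hive's:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, minimal stand-in for an HMS client (not Hive's IMetaStoreClient). */
interface MetastoreClient {
    /** Returns only the partitions of {@code table} whose names are in {@code names}. */
    List<String> getPartitionsByNames(String table, List<String> names);
}

class DynamicPartitionLoader {
    /**
     * Instead of fetching all partitions of the table (cost proportional to the
     * total partition count), query only the partition names produced by this
     * load (cost proportional to the partitions being written).
     */
    static Set<String> findExisting(MetastoreClient hms, String table, List<String> toLoad) {
        return new HashSet<>(hms.getPartitionsByNames(table, toLoad));
    }
}
```

The caller can then create only the partitions absent from the returned set, which is what keeps the load on HMS proportional to the dynamic-partition batch rather than the table size.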
[jira] [Updated] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-25184: -- Description: Was recently troubleshooting an issue and noticed an NPE in the logs. I tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets called whether the Driver call succeeds or fails. However, if there is a failure, the Driver is instructed to "clean up" by some internal try-catch, and so the afterExecute code fails with an NPE when it tries to read state out of the Driver class. [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] Move this afterExecute code into the try-catch block so it is only executed on success (and there is valid state within the Driver). I looked at the code a bit and it seems like the only listener that handles this afterExecute code assumes the state is always valid, so there is currently no way to pass it 'null' on a failure or 'state' on a success. > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > Was recently troubleshooting an issue and noticed an NPE in the logs. I > tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets > called whether the Driver call succeeds or fails. However, if there is a failure, > the Driver is instructed to "clean up" by some internal try-catch, and so > the afterExecute code fails with an NPE when it tries to read state out > of the Driver class.
> > [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] > > Move this afterExecute code into the try-catch block so it is only executed on > success (and there is valid state within the Driver). I looked at the code a > bit and it seems like the only listener that handles this afterExecute code > assumes the state is always valid, so there is currently no way to pass it > 'null' on a failure or 'state' on a success. -- This message was sent by Atlassian Jira (v8.3.4#803005)
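The fix proposed above — run the listener only when execution succeeded, so it never sees cleaned-up state — follows a general pattern that can be sketched independently of Hive. The interface and class names below are illustrative, not the actual ReExecDriver types:

```java
import java.util.concurrent.Callable;

/** Illustrative listener; not Hive's actual interface. */
interface ExecListener {
    void afterExecute(String planState); // assumes non-null state
}

class Runner {
    /**
     * Before the fix: afterExecute ran even after a failure whose internal
     * try-catch had already cleaned up driver state, so listeners hit an NPE.
     * After: it runs inside the try, only when valid state exists.
     */
    static boolean run(ExecListener listener, Callable<String> body) {
        try {
            String state = body.call();   // may throw; state is valid only if it returns
            listener.afterExecute(state); // reached only on success
            return true;
        } catch (Exception e) {
            // failure path: internal cleanup already happened; skip the listener
            return false;
        }
    }
}
```

This matches the observation in the description that the existing listener assumes valid state: moving the call inside the try preserves that assumption instead of forcing a null-handling contract onto listeners.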
[jira] [Updated] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25184: -- Labels: pull-request-available (was: ) > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Was recently troubleshooting an issue and noticed a NPE in the logs. I > tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets > called if the Driver call succeed or fails. However, if there is a failure, > the Driver is instructed to "clean up" by some internal try-catch and so > there the afterExecute code fails with a NPE when it tried to read state out > of the Driver class. > > [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] > > Move this afterExecute code into the try-catch block so it's only executed on > success (and there is valid state within the Driver). I looked at the code a > bit and it seems like the only listener that handles this afterExecute code > assumes the state is always valid, so there is currently no way to pass it > 'null' on a failure or 'state' on a success. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?focusedWorklogId=604708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604708 ] ASF GitHub Bot logged work on HIVE-25184: - Author: ASF GitHub Bot Created on: 01/Jun/21 19:00 Start Date: 01/Jun/21 19:00 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2337: URL: https://github.com/apache/hive/pull/2337 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604708) Remaining Estimate: 0h Time Spent: 10m > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Was recently troubleshooting an issue and noticed a NPE in the logs. I > tracked it down to {{ReExecDriver}} code. The "afterExecute" code gets > called if the Driver call succeed or fails. However, if there is a failure, > the Driver is instructed to "clean up" by some internal try-catch and so > there the afterExecute code fails with a NPE when it tried to read state out > of the Driver class. > > [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170] > > Move this afterExecute code into the try-catch block so it's only executed on > success (and there is valid state within the Driver). 
I looked at the code a > bit and it seems like the only listener that handles this afterExecute code > assumes the state is always valid, so there is currently no way to pass it > 'null' on a failure or 'state' on a success. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
[ https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25184: - > ReExecDriver Only Run afterExecute If No Exceptions > --- > > Key: HIVE-25184 > URL: https://issues.apache.org/jira/browse/HIVE-25184 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604704 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 01/Jun/21 18:49 Start Date: 01/Jun/21 18:49 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r643398280 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -94,6 +98,8 @@ private BlockingQueue workQueue; private Thread[] workers; + private Set dbsBeingFailedOver; Review comment: Why do we need it at instance level? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604704) Time Spent: 1h 50m (was: 1h 40m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355286#comment-17355286 ] Vihang Karajgaonkar commented on HIVE-24987: Published a PR with the proposed change. > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 > URL: https://issues.apache.org/jira/browse/HIVE-24987 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is > set to true it disallows any schema changes which are deemed as backwards > incompatible e.g dropping a column of a table. While this may be a correct > thing to do for Parquet or Orc tables, it is too restrictive for storage > formats like Kudu. > Currently, for Kudu tables, Impala supports dropping a column. But if we set > this config to true metastore disallows changing the schema of the metastore > table. I am assuming this would be problematic for Iceberg tables too which > supports such schema changes. > The proposal is to have a new configuration which provided a exclusion list > of the table fileformat where this check will be skipped. Currently, we will > only include Kudu tables to skip this check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24987: -- Labels: pull-request-available (was: ) > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 > URL: https://issues.apache.org/jira/browse/HIVE-24987 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is > set to true it disallows any schema changes which are deemed as backwards > incompatible e.g dropping a column of a table. While this may be a correct > thing to do for Parquet or Orc tables, it is too restrictive for storage > formats like Kudu. > Currently, for Kudu tables, Impala supports dropping a column. But if we set > this config to true metastore disallows changing the schema of the metastore > table. I am assuming this would be problematic for Iceberg tables too which > supports such schema changes. > The proposal is to have a new configuration which provided a exclusion list > of the table fileformat where this check will be skipped. Currently, we will > only include Kudu tables to skip this check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats
[ https://issues.apache.org/jira/browse/HIVE-24987?focusedWorklogId=604694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604694 ] ASF GitHub Bot logged work on HIVE-24987: - Author: ASF GitHub Bot Created on: 01/Jun/21 18:36 Start Date: 01/Jun/21 18:36 Worklog Time Spent: 10m Work Description: vihangk1 opened a new pull request #2336: URL: https://github.com/apache/hive/pull/2336 ### What changes were proposed in this pull request? The hive.metastore.disallow.incompatible.col.type.changes config currently checks whether an alter table operation is making an incompatible schema change to the table. By default it is set to true, which errors out the alter table call when such a change is detected. However, this check is too restrictive for certain file formats like Kudu. In the case of Kudu, dropping a column is allowed, which can make the schema incompatible according to the current implementation of this check. This causes a bad user experience for Kudu users, and there is no real work-around other than disabling this check altogether. Disabling the check is not an option, since for file formats like Parquet it should be true to avoid data corruption/incorrect results. This change introduces a new config which users can use to provide an exception list based on the table serde library name. If a table belongs to such a serde, the check is skipped. Currently, only Kudu tables are added to this config by default. ### Why are the changes needed? See above. ### Does this PR introduce _any_ user-facing change? This introduces a new configuration option for metastore. ### How was this patch tested? A new unit test was added to exercise the specific use-case. Existing tests make sure that previous behavior is not changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604694) Remaining Estimate: 0h Time Spent: 10m > hive.metastore.disallow.incompatible.col.type.changes is too restrictive for > some storage formats > - > > Key: HIVE-24987 > URL: https://issues.apache.org/jira/browse/HIVE-24987 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is > set to true it disallows any schema changes which are deemed as backwards > incompatible e.g dropping a column of a table. While this may be a correct > thing to do for Parquet or Orc tables, it is too restrictive for storage > formats like Kudu. > Currently, for Kudu tables, Impala supports dropping a column. But if we set > this config to true metastore disallows changing the schema of the metastore > table. I am assuming this would be problematic for Iceberg tables too which > supports such schema changes. > The proposal is to have a new configuration which provided a exclusion list > of the table fileformat where this check will be skipped. Currently, we will > only include Kudu tables to skip this check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
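The mechanism the PR describes — skip the incompatibility check when the table's serde library is on an exception list — reduces to a membership test before the strict column-type comparison. The sketch below is a simplification: the real default list and config key live in MetastoreConf, and the serde class name shown is only assumed here for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IncompatibleChangeCheck {
    // Illustrative default: skip the check for Kudu tables only. The real
    // exception list is read from metastore configuration, not hard-coded.
    static final Set<String> EXEMPT_SERDES = new HashSet<>(
        Arrays.asList("org.apache.hadoop.hive.kudu.KuduSerDe"));

    /** True if the strict column-type check should run for this serde library. */
    static boolean shouldCheck(boolean disallowIncompatible, String serdeLib) {
        if (!disallowIncompatible) {
            return false;                            // check disabled globally
        }
        return !EXEMPT_SERDES.contains(serdeLib);    // exempt formats skip it
    }
}
```

Keyed on the serde library rather than a table property, the exemption applies uniformly to every table of that storage format without users having to opt in per table.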
[jira] [Updated] (HIVE-25183) Parsing error for Correlated Inner Joins
[ https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das updated HIVE-25183: --- Description: The issue is similar to HIVE-25090 (was: The issue is similar to [link HIVE-25090|https://issues.apache.org/jira/browse/HIVE-25090]) > Parsing error for Correlated Inner Joins > > > Key: HIVE-25183 > URL: https://issues.apache.org/jira/browse/HIVE-25183 > Project: Hive > Issue Type: Sub-task > Components: Parser >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > > The issue is similar to HIVE-25090 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25183) Parsing error for Correlated Inner Joins
[ https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das reassigned HIVE-25183: -- > Parsing error for Correlated Inner Joins > > > Key: HIVE-25183 > URL: https://issues.apache.org/jira/browse/HIVE-25183 > Project: Hive > Issue Type: Sub-task > Components: Parser >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > > The issue is similar to [link > HIVE-25090|https://issues.apache.org/jira/browse/HIVE-25090] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?focusedWorklogId=604635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604635 ] ASF GitHub Bot logged work on HIVE-25150: - Author: ASF GitHub Bot Created on: 01/Jun/21 17:15 Start Date: 01/Jun/21 17:15 Worklog Time Spent: 10m Work Description: tarak271 commented on a change in pull request #2308: URL: https://github.com/apache/hive/pull/2308#discussion_r643326657 ## File path: storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java ## @@ -273,7 +269,8 @@ public static boolean fastSetFromBytes(byte[] bytes, int offset, int length, boo int index = offset; if (trimBlanks) { - while (bytes[index] == BYTE_BLANK) { + //Character.isWhitespace handles both space and tab character Review comment: @maheshk114 Added a new function to validate more characters supported by Mysql, postgres like HORIZONTAL_TABULATION, VERTICAL_TABULATION, FORM_FEED & SPACE_SEPARATOR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604635) Time Spent: 50m (was: 40m) > Tab characters are not removed before decimal conversion similar to space > character which is fixed as part of HIVE-24378 > > > Key: HIVE-25150 > URL: https://issues.apache.org/jira/browse/HIVE-25150 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Taraka Rama Rao Lethavadla >Assignee: Taraka Rama Rao Lethavadla >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Test case: > column values with space and tab character > {noformat} > bash-4.2$ cat data/files/test_dec_space.csv > 1,0 > 2, 1 > 3,2{noformat} > {noformat} > create external table test_dec_space (id int, value decimal) ROW FORMAT > DELIMITED > FIELDS TERMINATED BY ',' location '/tmp/test_dec_space'; > {noformat} > output of select * from test_dec_space would be > {noformat} > 1 0 > 2 1 > 3 NULL{noformat} > The behaviour in MySQL when there are tab & space characters in decimal values > {noformat} > bash-4.2$ cat /tmp/insert.csv > "1","aa",11.88 > "2","bb", 99.88 > "4","dd", 209.88{noformat} > > {noformat} > MariaDB [test]> load data local infile '/tmp/insert.csv' into table t2 fields > terminated by ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'; > Query OK, 3 rows affected, 3 warnings (0.00 sec) > Records: 3 Deleted: 0 Skipped: 0 Warnings: 3 > MariaDB [test]> select * from t2; > +--+--+---+ > | id | name | score | > +--+--+---+ > | 1| aa |12 | > | 2| bb | 100 | > | 4| dd | 210 | > +--+--+---+ > 3 rows in set (0.00 sec) > {noformat} > So in Hive, too, we can make this work by skipping the tab character -- This message was sent by Atlassian Jira (v8.3.4#803005)
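The idea behind the fix can be sketched in isolation: trim the blank characters the databases above tolerate (space, horizontal tab, vertical tab, form feed) before converting to decimal. The class and method names here are illustrative, not FastHiveDecimalImpl's actual code:

```java
// Sketch: trim leading/trailing blanks before decimal parsing, so a value
// like "\t2" converts to 2 instead of failing (NULL in Hive's case).
import java.math.BigDecimal;

public class DecimalTrim {
  /** Blank characters tolerated around numeric literals by MySQL/Postgres. */
  static boolean isTrimmableBlank(char c) {
    return c == ' ' || c == '\t' || c == '\u000B' /* vertical tab */ || c == '\f';
  }

  static BigDecimal parseDecimal(String raw) {
    int start = 0;
    int end = raw.length();
    while (start < end && isTrimmableBlank(raw.charAt(start))) {
      start++;
    }
    while (end > start && isTrimmableBlank(raw.charAt(end - 1))) {
      end--;
    }
    return new BigDecimal(raw.substring(start, end));
  }

  public static void main(String[] args) {
    System.out.println(parseDecimal("\t2"));    // 2
    System.out.println(parseDecimal(" 99.88")); // 99.88
  }
}
```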
[jira] [Commented] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
[ https://issues.apache.org/jira/browse/HIVE-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355217#comment-17355217 ] Jesus Camacho Rodriguez commented on HIVE-25129: It's been a while, but I assume if we do timezone shifting, e.g., we use the old write path, this may still occur. On the other hand, I think this would be fixed once we write timestamp as it is represented internally, i.e., in UTC. > Wrong results when timestamps stored in Avro/Parquet fall into the DST shift > > > Key: HIVE-25129 > URL: https://issues.apache.org/jira/browse/HIVE-25129 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: parquet_timestamp_dst.q > > > Timestamp values falling into the daylight savings time of the system > timezone cannot be retrieved as is when those are stored in Parquet/Avro > tables. The respective SELECT query shifts those timestamps by +1 reflecting > the DST shift. > +Example+ > {code:sql} > --! qt:timezone:US/Pacific > create table employee (eid int, birthdate timestamp) stored as parquet; > insert into employee values (0, '2019-03-10 02:00:00'); > insert into employee values (1, '2020-03-08 02:00:00'); > insert into employee values (2, '2021-03-14 02:00:00'); > select eid, birthdate from employee order by eid;{code} > +Actual results+ > |0|2019-03-10 03:00:00| > |1|2020-03-08 03:00:00| > |2|2021-03-14 03:00:00| > +Expected results+ > |0|2019-03-10 02:00:00| > |1|2020-03-08 02:00:00| > |2|2021-03-14 02:00:00| > Storing and retrieving values in columns using the [timestamp data > type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types] > (equivalent with LocalDateTime java API) should not alter at any way the > value that the user is seeing. The results are correct for {{TEXTFILE}} and > {{ORC}} tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
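The shift can be reproduced outside Hive with plain java.time: resolving a gap-time LocalDateTime in the session zone moves it forward by the DST transition, which is what a timezone-shifting write path effectively does to the stored value:

```java
// Demonstrates the DST-gap behaviour behind the bug: 2021-03-14 02:00 does
// not exist as a local time in US/Pacific, so resolving it in that zone
// shifts it forward by one hour.
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
  public static void main(String[] args) {
    LocalDateTime stored = LocalDateTime.of(2021, 3, 14, 2, 0); // falls in the DST gap
    ZonedDateTime resolved = stored.atZone(ZoneId.of("US/Pacific"));
    // java.time resolves gap times by adding the transition duration:
    System.out.println(resolved.toLocalDateTime()); // 2021-03-14T03:00
  }
}
```

This matches the actual/expected result tables above: the value read back is 03:00:00 instead of the 02:00:00 the user inserted, while formats that round-trip the local value verbatim (TEXTFILE, ORC) are unaffected.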
[jira] [Work logged] (HIVE-24875) Unify InetAddress.getLocalHost()
[ https://issues.apache.org/jira/browse/HIVE-24875?focusedWorklogId=604570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604570 ] ASF GitHub Bot logged work on HIVE-24875: - Author: ASF GitHub Bot Created on: 01/Jun/21 15:48 Start Date: 01/Jun/21 15:48 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2314: URL: https://github.com/apache/hive/pull/2314#discussion_r643230520 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/InetUtils.java ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.metastore.utils; + +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.util.Objects; +import java.util.Optional; + +/** + * Utility functions around the Java InetAddress class. 
+ */ +public class InetUtils { + + /** + * @return name of current host + */ + public static String hostname() { +return hostname(Optional.empty()); + } + + /** + * @return name of current host + */ + public static String hostname(Optional<String> defaultValue) { Review comment: @kgyrtkirk Thanks so much for looking at this and thanks for pointing out some silly mistakes on my part. I must not have had my head on straight that day. I think we make a functional change as you have proposed in a stand-alone ticket. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604570) Time Spent: 40m (was: 0.5h) > Unify InetAddress.getLocalHost() > > > Key: HIVE-24875 > URL: https://issues.apache.org/jira/browse/HIVE-24875 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Priority: Minor > Labels: newbie, noob, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Lots of calls in the Hive code to {{InetAddress.getLocalHost()}}. This > should be standardized onto hive-common {{ServerUtils.hostname()}}, which > includes removing (deprecating) a similar method in {{HiveStringUtils}}. > Open to anyone to improve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
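In the spirit of the snippet under review, a unified hostname helper with an optional fallback might look like the sketch below; the class name and exact signature are assumptions, not the final patch:

```java
// Sketch of a single hostname helper to replace scattered
// InetAddress.getLocalHost() calls; names are illustrative.
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

public class HostnameUtil {
  /** Returns the local host name, or the fallback when resolution fails. */
  public static String hostname(Optional<String> fallback) {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      return fallback.orElseThrow(
          () -> new IllegalStateException("Unable to resolve local host", e));
    }
  }

  public static void main(String[] args) {
    System.out.println(hostname(Optional.of("localhost")));
  }
}
```

Centralizing the call also gives one place to decide error handling, which is the functional question raised in the review.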
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604561 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 15:23 Start Date: 01/Jun/21 15:23 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643208790 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap } } - private static PartitionSpec spec(Schema schema, Properties properties, + private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties, org.apache.hadoop.hive.metastore.api.Table hmsTable) { -if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) { +if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname)) Review comment: I'll create a new `IcebergSessionUtil` for providing some util methods, we can sync up with @lcspinter on the implementation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604561) Time Spent: 2h 50m (was: 2h 40m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... 
PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25173) Fix build failure of hive-pre-upgrade due to missing dependency on pentaho-aggdesigner-algorithm
[ https://issues.apache.org/jira/browse/HIVE-25173?focusedWorklogId=604555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604555 ] ASF GitHub Bot logged work on HIVE-25173: - Author: ASF GitHub Bot Created on: 01/Jun/21 15:12 Start Date: 01/Jun/21 15:12 Worklog Time Spent: 10m Work Description: iwasakims commented on pull request #2326: URL: https://github.com/apache/hive/pull/2326#issuecomment-852206365 @kgyrtkirk I think precommit test failures are not related to the patch. Could you review this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604555) Time Spent: 0.5h (was: 20m) > Fix build failure of hive-pre-upgrade due to missing dependency on > pentaho-aggdesigner-algorithm > > > Key: HIVE-25173 > URL: https://issues.apache.org/jira/browse/HIVE-25173 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.1.2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {noformat} > [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not resolve > dependencies for project org.apache.hive:hive-pre-upgrade:jar:4.0.0-SNAPSHOT: > Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in > https://repo.maven.apache.org/maven2 was cached in the local repository, > resolution will not be reattempted until the update interval of central has > elapsed or updates are forced > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25161) Implement CTAS for partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod resolved HIVE-25161. --- Resolution: Fixed > Implement CTAS for partitioned Iceberg tables > - > > Key: HIVE-25161 > URL: https://issues.apache.org/jira/browse/HIVE-25161 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=604524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604524 ] ASF GitHub Bot logged work on HIVE-25161: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:15 Start Date: 01/Jun/21 14:15 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2316: URL: https://github.com/apache/hive/pull/2316 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604524) Time Spent: 7.5h (was: 7h 20m) > Implement CTAS for partitioned Iceberg tables > - > > Key: HIVE-25161 > URL: https://issues.apache.org/jira/browse/HIVE-25161 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604522 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:12 Start Date: 01/Jun/21 14:12 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643139820 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable( } } +if (partitionTransformSpecExists) { + try { +HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Can we make this check before, at line 13409? That way we could skip parsing the partition spec unnecessarily first and wouldn't need the boolean -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604522) Time Spent: 2h 40m (was: 2.5h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604520 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:11 Start Date: 01/Jun/21 14:11 Worklog Time Spent: 10m Work Description: marton-bod commented on pull request #2333: URL: https://github.com/apache/hive/pull/2333#issuecomment-852158270 Looks great @lcspinter! Just a few questions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604520) Time Spent: 2.5h (was: 2h 20m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604519 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:10 Start Date: 01/Jun/21 14:10 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643139820 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable( } } +if (partitionTransformSpecExists) { + try { +HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Can we make this check before, at line 13409? That way we could skip parsing the partition spec unnecessarily first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604519) Time Spent: 2h 20m (was: 2h 10m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604517 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:07 Start Date: 01/Jun/21 14:07 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643137193 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map<Integer, TransformTypes> TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) { +List<PartitionTransformSpec> partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: Review comment: nit: indentation missing after switch clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604517) Time Spent: 2h 10m (was: 2h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604514 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:06 Start Date: 01/Jun/21 14:06 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643135880 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map<Integer, TransformTypes> TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) { +List<PartitionTransformSpec> partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: +case HiveParser.TOK_YEAR: +case HiveParser.TOK_MONTH: +case HiveParser.TOK_DAY: +case HiveParser.TOK_HOUR: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + break; +case HiveParser.TOK_TRUNCATE: +case HiveParser.TOK_BUCKET: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText()); + break; +default: + spec.name = grandChild.getText(); +} + } + partSpecList.add(spec); +} + +return partSpecList; + } + + public enum TransformTypes { +IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET + } + + public static class PartitionTransformSpec { 
+public String name; +public TransformTypes transformType; +public int transformParam; Review comment: Should we make this Integer or Optional<Integer>? This would have a 0 value by default for those spec types too that don't support any params, which can be misleading. Although granted, 0 doesn't make much sense for either bucketing or truncate, but maybe semantically it would be a bit better to make this nullable/empty. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604514) Time Spent: 1h 50m (was: 1h 40m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available >
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604516 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 14:06 Start Date: 01/Jun/21 14:06 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643135880 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map<Integer, TransformTypes> TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) { +List<PartitionTransformSpec> partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: +case HiveParser.TOK_YEAR: +case HiveParser.TOK_MONTH: +case HiveParser.TOK_DAY: +case HiveParser.TOK_HOUR: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + break; +case HiveParser.TOK_TRUNCATE: +case HiveParser.TOK_BUCKET: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText()); + break; +default: + spec.name = grandChild.getText(); +} + } + partSpecList.add(spec); +} + +return partSpecList; + } + + public enum TransformTypes { +IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET + } + + public static class PartitionTransformSpec { 
+    public String name;
+    public TransformTypes transformType;
+    public int transformParam;

Review comment: Should we make this Integer or Optional<Integer>? This would have a 0 value by default for those spec types too that don't support any params, which can be misleading. Although granted, 0 doesn't make much sense for either bucketing or truncate, but maybe semantically it would be a bit better to make this nullable/empty. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604516) Time Spent: 2h (was: 1h 50m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels:
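The nullable-parameter variant the reviewer suggests could look something like the sketch below. This is illustrative only, not the actual patch: `Spec` and `TransformType` are simplified stand-ins for `PartitionTransformSpec` and its enum, with the `int transformParam` replaced by an `Optional<Integer>` so that parameterless transforms (IDENTITY, YEAR, ...) report "empty" instead of a misleading default of 0.

```java
import java.util.Optional;

public class PartitionTransformSpecSketch {
    enum TransformType { IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET }

    // Simplified stand-in for PartitionTransformSpec with a nullable param.
    static class Spec {
        String name;
        TransformType transformType;
        // Optional instead of int: absent means "this transform takes no parameter".
        Optional<Integer> transformParam = Optional.empty();
    }

    public static void main(String[] args) {
        Spec identity = new Spec();
        identity.transformType = TransformType.IDENTITY;
        // No param was set, so callers see Optional.empty() rather than 0.
        System.out.println(identity.transformParam.isPresent()); // false

        Spec bucket = new Spec();
        bucket.transformType = TransformType.BUCKET;
        bucket.transformParam = Optional.of(5); // e.g. bucket[5]
        System.out.println(bucket.transformParam.orElseThrow()); // 5
    }
}
```

The upside over a plain `Integer` field is that consumers are forced to handle the no-parameter case explicitly instead of risking a `NullPointerException` on unboxing.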
[jira] [Work logged] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?focusedWorklogId=604499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604499 ] ASF GitHub Bot logged work on HIVE-24920: - Author: ASF GitHub Bot Created on: 01/Jun/21 13:50 Start Date: 01/Jun/21 13:50 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #2191: URL: https://github.com/apache/hive/pull/2191 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604499) Time Spent: 1h 20m (was: 1h 10m) > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24920. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Naveen for reviewing the changes! > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604452 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 11:44 Start Date: 01/Jun/21 11:44 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r643019982

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException {
   }
 }

+ private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap,
+     Connection dbConn) throws SQLException {
+   PreparedStatement preparedStatement = null;
+   int numRows = 0;
+   int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+   // Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that.
+   // TODO : Need to add catalog name to the index
+   String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND "
+       + "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? "
+       + "AND \"PART_ID\" = ?";
+
+   try {
+     preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null);

Review comment: removed the use

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException {
   }
 }

+ private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap,
+     Connection dbConn) throws SQLException {
+   PreparedStatement preparedStatement = null;
+   int numRows = 0;
+   int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+   // Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that.
+   // TODO : Need to add catalog name to the index
+   String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND "
+       + "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? "
+       + "AND \"PART_ID\" = ?";
+
+   try {
+     preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null);
+     for (Map.Entry entry : statsPartInfoMap.entrySet()) {
+       ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+       PartitionInfo partitionInfo = (PartitionInfo) entry.getKey();
+       for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+         preparedStatement.setString(1, colStats.getStatsDesc().getDbName());
+         preparedStatement.setString(2, colStats.getStatsDesc().getTableName());
+         preparedStatement.setString(3, statisticsObj.getColName());
+         preparedStatement.setString(4, colStats.getStatsDesc().getPartName());
+         preparedStatement.setLong(5, partitionInfo.partitionId);
+         numRows++;
+         preparedStatement.addBatch();
+         if (numRows == maxNumRows) {
+           preparedStatement.executeBatch();
+           numRows = 0;
+           LOG.debug("Executed delete " + delete + " for numRows " + numRows);
+         }
+       }
+     }
+
+     if (numRows != 0) {
+       preparedStatement.executeBatch();
+       LOG.debug("Executed delete " + delete + " for numRows " + numRows);
+     }
+   } finally {
+     closeStmt(preparedStatement);
+   }
+ }
+
+ private void insertIntoPartColStatTable(Map partitionInfoMap,
+     long maxCsId,
+     Connection dbConn) throws SQLException, MetaException, NoSuchObjectException {
+   PreparedStatement preparedStatement = null;
+   int numRows = 0;
+   int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+   String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", \"CAT_NAME\", \"DB_NAME\","
+       + "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", \"COLUMN_TYPE\", \"PART_ID\","
+       + " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", \"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\","
+       + " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", \"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ,"
+       + " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", \"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values "
+       + "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
+
+   try {
+     preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, insert, null);
+     for (Map.Entry entry :
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604437 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 11:15 Start Date: 01/Jun/21 11:15 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r643008410

## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
## @@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
   }
 }

- private static PartitionSpec spec(Schema schema, Properties properties,
+ private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
     org.apache.hadoop.hive.metastore.api.Table hmsTable) {
-   if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+   if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment: +1 I'm working with the QueryState too on a different jira, and we would definitely benefit from some util methods to simplify these operations.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604437) Time Spent: 1h 40m (was: 1.5h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604432 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:55 Start Date: 01/Jun/21 10:55 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642996447 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -3010,3 +3010,5 @@ const string TABLE_BUCKETING_VERSION = "bucketing_version", const string DRUID_CONFIG_PREFIX = "druid.", const string JDBC_CONFIG_PREFIX = "hive.sql.", const string TABLE_IS_CTAS = "created_with_ctas", +const string PARTITION_TRANSFER_SPEC = "partition_transfer_spec", Review comment: how come this is partition_transfer_spec, unlike elsewhere? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604432) Time Spent: 1.5h (was: 1h 20m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604424 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:07 Start Date: 01/Jun/21 10:07 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642966076 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") +.hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2) +.identity("identity_field").build(); +String tableName = "part_test"; + +TableIdentifier identifier = TableIdentifier.of("default", tableName); +shell.executeStatement("CREATE EXTERNAL TABLE " + identifier + +" PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " + +"truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" + +" STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " + +testTables.locationForCreateTableSQL(identifier) + +"TBLPROPERTIES ('" + InputFormatConfig.TABLE_SCHEMA + "'='" + +SchemaParser.toJson(schema) + "', " + 
+"'" + InputFormatConfig.CATALOG_NAME + "'='" + Catalogs.ICEBERG_DEFAULT_CATALOG_NAME + "')"); Review comment: Do we need this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604424) Time Spent: 1h 20m (was: 1h 10m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604422 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:06 Start Date: 01/Jun/21 10:06 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642965021 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") +.hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2) +.identity("identity_field").build(); +String tableName = "part_test"; Review comment: Do we need this, or it is enough to keep the `identifier` only? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604422) Time Spent: 1h 10m (was: 1h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604420 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:05 Start Date: 01/Jun/21 10:05 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642964505 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") Review comment: nit: newline before, and I think it is more readable if we break after every parameter, like: ``` PartitionSpec.builderFor(schema) .year("year_field") .month("month_field") .day("day_field") .hour("hour_field") .truncate("truncate_field", 2) .bucket("bucket_field", 2) .identity("identity_field") .build(); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604420) Time Spent: 1h (was: 50m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604419 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 10:04 Start Date: 01/Jun/21 10:04 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642963610 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java ## @@ -155,6 +155,36 @@ public void after() throws Exception { HiveIcebergStorageHandlerTestUtils.close(shell); } + @Test + public void testPartitionTransform() { +Schema schema = new Schema( +optional(1, "id", Types.LongType.get()), +optional(2, "year_field", Types.DateType.get()), +optional(3, "month_field", Types.TimestampType.withZone()), +optional(4, "day_field", Types.TimestampType.withoutZone()), +optional(5, "hour_field", Types.TimestampType.withoutZone()), +optional(6, "truncate_field", Types.StringType.get()), +optional(7, "bucket_field", Types.StringType.get()), +optional(8, "identity_field", Types.StringType.get()) +); +PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field") +.hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2) +.identity("identity_field").build(); +String tableName = "part_test"; + +TableIdentifier identifier = TableIdentifier.of("default", tableName); +shell.executeStatement("CREATE EXTERNAL TABLE " + identifier + +" PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " + +"truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" + +" STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " + Review comment: We can use `STORED BY ICEBERG` -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604419) Time Spent: 50m (was: 40m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604417 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 09:58 Start Date: 01/Jun/21 09:58 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642960021 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap } } - private static PartitionSpec spec(Schema schema, Properties properties, + private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties, org.apache.hadoop.hive.metastore.api.Table hmsTable) { -if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) { +if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname)) Review comment: Maybe a util class to get Icebeerg objects from the `SessionState`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604417) Time Spent: 40m (was: 0.5h) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... 
PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604414 ] ASF GitHub Bot logged work on HIVE-25179: - Author: ASF GitHub Bot Created on: 01/Jun/21 09:54 Start Date: 01/Jun/21 09:54 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2333: URL: https://github.com/apache/hive/pull/2333#discussion_r642953396 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +public class PartitionTransform { + + private static final Map TRANSFORMS = Stream + .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY }, + { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH }, + { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR }, + { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } }) + .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1])); + + /** + * Parse the partition transform specifications from the AST Tree node. + * @param node AST Tree node, must be not null + * @return list of partition transforms + */ + public static List getPartitionTransformSpec(ASTNode node) { +List partSpecList = new ArrayList<>(); +for (int i = 0; i < node.getChildCount(); i++) { + PartitionTransformSpec spec = new PartitionTransformSpec(); + ASTNode child = (ASTNode) node.getChild(i); + for (int j = 0; j < child.getChildCount(); j++) { +ASTNode grandChild = (ASTNode) child.getChild(j); +switch (grandChild.getToken().getType()) { +case HiveParser.TOK_IDENTITY: +case HiveParser.TOK_YEAR: +case HiveParser.TOK_MONTH: +case HiveParser.TOK_DAY: +case HiveParser.TOK_HOUR: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + break; +case HiveParser.TOK_TRUNCATE: +case HiveParser.TOK_BUCKET: + spec.transformType = TRANSFORMS.get(grandChild.getToken().getType()); + spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText()); + break; +default: + spec.name = grandChild.getText(); +} + } + partSpecList.add(spec); +} + +return partSpecList; + } + + public enum TransformTypes { Review comment: I think it would be better to use the singular format: TransformType -- This is an 
automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604414) Time Spent: 0.5h (was: 20m) > Support all partition transforms for Iceberg in create table > > > Key: HIVE-25179 > URL: https://issues.apache.org/jira/browse/HIVE-25179 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Enhance table create syntax with support to partition transforms: > {code:sql} > CREATE TABLE ... PARTITIONED BY SPEC( > year_field year, > month_field month, > day_field day, > hour_field hour, > truncate_field truncate[3], > bucket_field bucket[5], > identity_field identity > ) STORED BY ICEBERG; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604361 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 01/Jun/21 08:03 Start Date: 01/Jun/21 08:03 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r642873937

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
## @@ -214,23 +219,28 @@ private void stopWorkers() {
     throws MetaException, NoSuchTxnException, NoSuchObjectException {
   if (isAnalyzeTableInProgress(fullTableName)) return null;
   String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+  String dbName = MetaStoreUtils.prependCatalogToDbName(cat, db, conf);
+  if (!isDbTargetOfReplication.containsKey(dbName) || !isDbBeingFailedOver.containsKey(dbName)) {
+    Database database = rs.getDatabase(cat, db);
+    isDbTargetOfReplication.put(dbName, ReplUtils.isTargetOfReplication(database));
+    isDbBeingFailedOver.put(dbName, MetaStoreUtils.isDbBeingFailedOver(database));

Review comment: Why do we need two separate maps? We don't need the reason for the skip, just tracking what to skip is fine, no?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604361) Time Spent: 1h 40m (was: 1.5h) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over.
> -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
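The per-database caching discussed in the review thread above could be sketched roughly as follows. All names here are hypothetical illustrations, not the actual StatsUpdaterThread code: a single map from database name to a skip decision replaces the two reason-specific maps, and it is cleared between iterations so stale failover state is not served indefinitely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: cache only the skip decision per database instead of
// keeping separate maps for "target of replication" and "being failed over".
class DbSkipCache {
    private final Map<String, Boolean> skipByDb = new ConcurrentHashMap<>();

    // `lookup` stands in for the metastore round trip (getDatabase plus the
    // replication/failover checks); it runs only on a cache miss.
    boolean shouldSkip(String dbName, Predicate<String> lookup) {
        return skipByDb.computeIfAbsent(dbName, lookup::test);
    }

    // Clearing between worker iterations keeps the cache from serving a
    // stale answer after a failover completes.
    void reset() {
        skipByDb.clear();
    }
}
```

Collapsing the two maps into one decision also answers the reviewer's point: the caller only needs to know whether to skip, not why.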
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604359 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:55 Start Date: 01/Jun/21 07:55 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r642867971 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException { } } + private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap, + Connection dbConn) throws SQLException { +PreparedStatement preparedStatement = null; +int numRows = 0; +int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE); + +// Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that. +// TODO : Need to add catalog name to the index +String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND " ++ "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? 
" ++ "AND \"PART_ID\" = ?"; + +try { + preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null); + for (Map.Entry entry : statsPartInfoMap.entrySet()) { +ColumnStatistics colStats = (ColumnStatistics) entry.getValue(); +PartitionInfo partitionInfo = (PartitionInfo) entry.getKey(); +for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) { + preparedStatement.setString(1, colStats.getStatsDesc().getDbName()); + preparedStatement.setString(2, colStats.getStatsDesc().getTableName()); + preparedStatement.setString(3, statisticsObj.getColName()); + preparedStatement.setString(4, colStats.getStatsDesc().getPartName()); + preparedStatement.setLong(5, partitionInfo.partitionId); + numRows++; + preparedStatement.addBatch(); + if (numRows == maxNumRows) { +preparedStatement.executeBatch(); +numRows = 0; +LOG.debug("Executed delete " + delete + " for numRows " + numRows); + } +} + } + + if (numRows != 0) { +preparedStatement.executeBatch(); +LOG.debug("Executed delete " + delete + " for numRows " + numRows); + } +} finally { + closeStmt(preparedStatement); +} + } + + private void insertIntoPartColStatTable(Map partitionInfoMap, + long maxCsId, + Connection dbConn) throws SQLException, MetaException, NoSuchObjectException { +PreparedStatement preparedStatement = null; +int numRows = 0; +int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE); +String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", \"CAT_NAME\", \"DB_NAME\"," ++ "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", \"COLUMN_TYPE\", \"PART_ID\"," ++ " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", \"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\"," ++ " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", \"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ," ++ " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", \"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values " ++ "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"; + +try 
{ + preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, insert, null); + for (Map.Entry entry : partitionInfoMap.entrySet()) { +ColumnStatistics colStats = (ColumnStatistics) entry.getValue(); +PartitionInfo partitionInfo = (PartitionInfo)entry.getKey(); +ColumnStatisticsDesc statsDesc = colStats.getStatsDesc(); +long partId = partitionInfo.partitionId; + +for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) { + MPartitionColumnStatistics mPartitionColumnStatistics = StatObjectConverter. + convertToMPartitionColumnStatistics(null, statsDesc, statisticsObj, colStats.getEngine()); + + preparedStatement.setLong(1, maxCsId); + preparedStatement.setString(2, mPartitionColumnStatistics.getCatName()); + preparedStatement.setString(3, mPartitionColumnStatistics.getDbName()); + preparedStatement.setString(4, mPartitionColumnStatistics.getTableName()); + preparedStatement.setString(5, mPartitionColumnStatistics.getPartitionName()); + preparedStatement.setString(6, mPartitionColumnStatistics.getColName()); +
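The flush-every-maxNumRows pattern in the quoted DELETE/INSERT snippets can be reduced to a small generic sketch. This is a hypothetical helper class, not part of TxnHandler: rows are buffered, a full batch is flushed when the configured limit is reached, and the remainder is flushed once at the end, mirroring the final `if (numRows != 0)` block above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of flush-every-N batching. `flush` stands in for the
// JDBC executeBatch() call on a PreparedStatement.
class BatchWriter<T> {
    private final int maxNumRows;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int maxNumRows, Consumer<List<T>> flush) {
        this.maxNumRows = maxNumRows;
        this.flush = flush;
    }

    void add(T row) {
        buffer.add(row);
        if (buffer.size() == maxNumRows) {   // full batch: flush and reset
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    void close() {                           // tail flush, mirroring the
        if (!buffer.isEmpty()) {             // `if (numRows != 0)` block
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Bounding each batch (in Hive, via DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE) keeps any single database round trip from growing with the number of partitions.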
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=604358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604358 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:54 Start Date: 01/Jun/21 07:54 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r642867437 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -84,6 +86,9 @@ private ConcurrentHashMap partsInProgress = new ConcurrentHashMap<>(); private AtomicInteger itemsInProgress = new AtomicInteger(0); + Map isDbTargetOfReplication = new HashMap<>(); + Map isDbBeingFailedOver = new HashMap<>(); Review comment: Why do you need it at instance level? When is the map getting cleaned? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604358) Time Spent: 1.5h (was: 1h 20m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604357 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:51 Start Date: 01/Jun/21 07:51 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r642865407 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5394,6 +5410,476 @@ public void countOpenTxns() throws MetaException { } } + private void cleanOldStatsFromPartColStatTable(Map statsPartInfoMap, + Connection dbConn) throws SQLException { +PreparedStatement preparedStatement = null; +int numRows = 0; +int maxNumRows = MetastoreConf.getIntVar(conf, ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE); + +// Index is present on DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME. use that. +// TODO : Need to add catalog name to the index +String delete = "DELETE FROM \"PART_COL_STATS\" where \"DB_NAME\" = ? AND " ++ "\"TABLE_NAME\" = ? AND \"COLUMN_NAME\" = ? AND \"PARTITION_NAME\" = ? 
" ++ "AND \"PART_ID\" = ?"; + +try { + preparedStatement = sqlGenerator.prepareStmtWithParameters(dbConn, delete, null); + for (Map.Entry entry : statsPartInfoMap.entrySet()) { +ColumnStatistics colStats = (ColumnStatistics) entry.getValue(); +PartitionInfo partitionInfo = (PartitionInfo) entry.getKey(); +for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) { + preparedStatement.setString(1, colStats.getStatsDesc().getDbName()); + preparedStatement.setString(2, colStats.getStatsDesc().getTableName()); + preparedStatement.setString(3, statisticsObj.getColName()); + preparedStatement.setString(4, colStats.getStatsDesc().getPartName()); + preparedStatement.setLong(5, partitionInfo.partitionId); + numRows++; + preparedStatement.addBatch(); + if (numRows == maxNumRows) { +preparedStatement.executeBatch(); +numRows = 0; +LOG.debug("Executed delete " + delete + " for numRows " + numRows); + } +} + } + + if (numRows != 0) { Review comment: it's more for readability -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604357) Time Spent: 8h 20m (was: 8h 10m) > Reduce overhead of partition column stats updation. > --- > > Key: HIVE-24663 > URL: https://issues.apache.org/jira/browse/HIVE-24663 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: performance, pull-request-available > Time Spent: 8h 20m > Remaining Estimate: 0h > > When large number of partitions (>20K) are processed, ColStatsProcessor runs > into DB issues. > {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours together > and in some cases postgres stops processing. 
> It would be good to introduce small batches for stats gathering in > ColStatsProcessor instead of bulk update. > Ref: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199 -- This message was sent by Atlassian Jira (v8.3.4#803005)
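The suggestion above, small batches in ColStatsProcessor instead of one bulk `setPartitionColumnStatistics` call, amounts to splitting the per-partition stats into fixed-size chunks and issuing one request per chunk. A minimal sketch follows; the helper name is hypothetical and not part of the Hive code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split `items` (e.g. per-partition ColumnStatistics
// objects) into chunks of at most `batchSize`, so that each metastore
// request, and hence each backend database transaction, stays small.
class StatsBatcher {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Each chunk would then be sent as its own request, bounding the work the backend (e.g. Postgres) must commit at once.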
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604355 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:49 Start Date: 01/Jun/21 07:49 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2266: URL: https://github.com/apache/hive/pull/2266#discussion_r642863679 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -8994,10 +9028,15 @@ public boolean set_aggr_stats_for(SetPartitionsStatsRequest request) throws TExc colNames, newStatsMap, request); } else { // No merge. Table t = getTable(catName, dbName, tableName); -for (Map.Entry entry : newStatsMap.entrySet()) { - // We don't short-circuit on errors here anymore. That can leave acid stats invalid. - ret = updatePartitonColStatsInternal(t, entry.getValue(), - request.getValidWriteIdList(), request.getWriteId()) && ret; +// We don't short-circuit on errors here anymore. That can leave acid stats invalid. +if (newStatsMap.size() > 1) { + LOG.info("ETL_PERF started updatePartitionColStatsInBatch"); + ret = updatePartitionColStatsInBatch(t, newStatsMap, + request.getValidWriteIdList(), request.getWriteId()); + LOG.info("ETL_PERF done updatePartitionColStatsInBatch"); +} else { + ret = updatePartitonColStatsInternal(t, newStatsMap.values().iterator().next(), Review comment: in future we would need to support both implementations, why can't we generalize? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604355) Time Spent: 8h 10m (was: 8h) > Reduce overhead of partition column stats updation. 
> --- > > Key: HIVE-24663 > URL: https://issues.apache.org/jira/browse/HIVE-24663 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: performance, pull-request-available > Time Spent: 8h 10m > Remaining Estimate: 0h > > When large number of partitions (>20K) are processed, ColStatsProcessor runs > into DB issues. > {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours together > and in some cases postgres stops processing. > It would be good to introduce small batches for stats gathering in > ColStatsProcessor instead of bulk update. > Ref: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24663) Reduce overhead of partition column stats updation.
[ https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=604354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604354 ] ASF GitHub Bot logged work on HIVE-24663: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:47 Start Date: 01/Jun/21 07:47 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #2266: URL: https://github.com/apache/hive/pull/2266#issuecomment-851905628 > > NOTE: I don't think TxnHandler is a good place for this kind of functionality. TxnHandler is responsible for managing txn metadata ONLY!! Wouldn't MetaStoreDirectSql be more appropriate here? @pvary what do you think? > > The MetaStoreDirectSql does not have framework to handle adding notification logs within same transaction. That can cause issues in replication if the notification logs addition fails because of some reason. The MetaStoreDirectSql batch insert/update also not very performant. What I meant here is that TxnHandler shouldn't have colStats update logic, it's responsible for other functionality like Hive TXN management. HiveMetaStore can handle notification logs, what about it? Or if we can't find close by functionality service - we should create a new one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604354) Time Spent: 8h (was: 7h 50m) > Reduce overhead of partition column stats updation. > --- > > Key: HIVE-24663 > URL: https://issues.apache.org/jira/browse/HIVE-24663 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: performance, pull-request-available > Time Spent: 8h > Remaining Estimate: 0h > > When large number of partitions (>20K) are processed, ColStatsProcessor runs > into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours together > and in some cases postgres stops processing. > It would be good to introduce small batches for stats gathering in > ColStatsProcessor instead of bulk update. > Ref: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=604353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604353 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 01/Jun/21 07:37 Start Date: 01/Jun/21 07:37 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #2204: URL: https://github.com/apache/hive/pull/2204#issuecomment-851899226 Hey @thejasmn, @kgyrtkirk could you please take another look if you have a moment? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604353) Time Spent: 0.5h (was: 20m) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0, 4.0.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The HiveServer's configured default execution engine is MR. When the client sets > hive.execution.engine = tez, the client cannot print the TEZ log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25166) Query with multiple count(distinct constant) fails
[ https://issues.apache.org/jira/browse/HIVE-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25166: -- Status: Patch Available (was: In Progress) > Query with multiple count(distinct constant) fails > -- > > Key: HIVE-25166 > URL: https://issues.apache.org/jira/browse/HIVE-25166 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > select count(distinct 0), count(distinct null) from alltypes; > {code} > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not > in GROUP BY key 'TOK_NULL' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) > at
[jira] [Work logged] (HIVE-25166) Query with multiple count(distinct constant) fails
[ https://issues.apache.org/jira/browse/HIVE-25166?focusedWorklogId=604330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604330 ] ASF GitHub Bot logged work on HIVE-25166: - Author: ASF GitHub Bot Created on: 01/Jun/21 06:29 Start Date: 01/Jun/21 06:29 Worklog Time Spent: 10m Work Description: kasakrisz closed pull request #2325: URL: https://github.com/apache/hive/pull/2325 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604330) Time Spent: 0.5h (was: 20m) > Query with multiple count(distinct constant) fails > -- > > Key: HIVE-25166 > URL: https://issues.apache.org/jira/browse/HIVE-25166 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > select count(distinct 0), count(distinct null) from alltypes; > {code} > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not > in GROUP BY key 'TOK_NULL' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at >
[jira] [Work logged] (HIVE-25166) Query with multiple count(distinct constant) fails
[ https://issues.apache.org/jira/browse/HIVE-25166?focusedWorklogId=604331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604331 ] ASF GitHub Bot logged work on HIVE-25166: - Author: ASF GitHub Bot Created on: 01/Jun/21 06:29 Start Date: 01/Jun/21 06:29 Worklog Time Spent: 10m Work Description: kasakrisz commented on pull request #2325: URL: https://github.com/apache/hive/pull/2325#issuecomment-851857576 This is not the right approach now because it adds an extra column to the shuffle, which is not always necessary. Closing this. See #2334 for a fix of the original issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 604331) Time Spent: 40m (was: 0.5h) > Query with multiple count(distinct constant) fails > -- > > Key: HIVE-25166 > URL: https://issues.apache.org/jira/browse/HIVE-25166 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > select count(distinct 0), count(distinct null) from alltypes; > {code} > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not > in GROUP BY key 'TOK_NULL' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at
[jira] [Commented] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354841#comment-17354841 ] Wei Zhang commented on HIVE-25170: -- I've posted a patch for this issue on GitHub. Can somebody help review the code? Thanks! > Data error in constant propagation caused by wrong colExprMap generated in > SemanticAnalyzer > --- > > Key: HIVE-25170 > URL: https://issues.apache.org/jira/browse/HIVE-25170 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 3.1.2 >Reporter: Wei Zhang >Assignee: Wei Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > > {code:java} > SET hive.remove.orderby.in.subquery=false; > EXPLAIN > SELECT constant_col, key, max(value) > FROM > ( > SELECT 'constant' as constant_col, key, value > FROM src > DISTRIBUTE BY constant_col, key > SORT BY constant_col, key, value > ) a > GROUP BY constant_col, key > LIMIT 10; > OK > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (SIMPLE_EDGE)Stage-0 > Fetch Operator > limit:10 > Stage-1 > Reducer 3 > File Output Operator [FS_10] > Limit [LIM_9] (rows=1 width=368) > Number of rows:10 > Select Operator [SEL_8] (rows=1 width=368) > Output:["_col0","_col1","_col2"] > Group By Operator [GBY_7] (rows=1 width=368) > > Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', > 'constant' > <-Reducer 2 [SIMPLE_EDGE] > SHUFFLE [RS_6] > PartitionCols:'constant', 'constant' > Group By Operator [GBY_5] (rows=1 width=368) > > Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', > 'constant' > Select Operator [SEL_3] (rows=500 width=178) > Output:["_col2"] > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_2] > PartitionCols:'constant', _col1 > Select Operator [SEL_1] (rows=500 width=178) > Output:["_col1","_col2"] > TableScan [TS_0] (rows=500 width=10) > > 
src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code} > Obviously, the PartitionCols in Reducer 2 is wrong. Instead of 'constant', > 'constant', it should be 'constant', _col1 > > That's because after HIVE-13808, SemanticAnalyzer uses sortCols to generate > the colExprMap structure in the key part, while the key columns are generated > by newSortCols, leading to a column and expr mismatch when the constant > column is not the trailing column in the key columns. > Constant propagation optimizer uses this colExprMap and finds extra const > expression in the mismatched map, resulting in this error. > > In fact, colExprMap is used by multiple optimizers, which makes this quite a > serious problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
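The mismatch described above can be illustrated with a minimal sketch (plain Python with hypothetical names; this is not Hive's actual code). colExprMap pairs each key column emitted by the reduce sink with the expression that produces it. If the map is built from a differently ordered column list, a non-constant key column can be recorded as a constant, and the plan's PartitionCols collapse to the literal:

```python
# Hypothetical sketch: how a colExprMap built from a stale column order
# misleads constant propagation. Not Hive internals -- just the shape of
# the bug.

def partition_cols(key_cols, col_expr_map):
    # What the plan prints as PartitionCols: the expression that
    # colExprMap records for each emitted key column.
    return [col_expr_map[k] for k in key_cols]

key_cols = ["KEY._col0", "KEY._col1"]

# Correct map: KEY._col1 maps to the column reference _col1.
correct_map = {"KEY._col0": "'constant'", "KEY._col1": "_col1"}

# Buggy map: expressions paired against a differently ordered column
# list, so the constant expression is attached to both keys.
buggy_map = {"KEY._col0": "'constant'", "KEY._col1": "'constant'"}

print(partition_cols(key_cols, correct_map))  # PartitionCols:'constant', _col1
print(partition_cols(key_cols, buggy_map))    # PartitionCols:'constant', 'constant'
```

With the correct map the reducer still partitions on the real key column; with the buggy map both keys fold to the literal, matching the wrong `PartitionCols:'constant', 'constant'` shown in the plan.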
[jira] [Updated] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhang updated HIVE-25170:
-
Status: Patch Available (was: Open)
> Data error in constant propagation caused by wrong colExprMap generated in
> SemanticAnalyzer
> ---
>
> Key: HIVE-25170
> URL: https://issues.apache.org/jira/browse/HIVE-25170
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 3.1.2
> Reporter: Wei Zhang
> Assignee: Wei Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25179) Support all partition transforms for Iceberg in create table
[ https://issues.apache.org/jira/browse/HIVE-25179?focusedWorklogId=604323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-604323 ] ASF GitHub Bot logged work on HIVE-25179:
-
Author: ASF GitHub Bot
Created on: 01/Jun/21 06:09
Start Date: 01/Jun/21 06:09
Worklog Time Spent: 10m
Work Description: lcspinter commented on pull request #2333:
URL: https://github.com/apache/hive/pull/2333#issuecomment-851846758
@jcamachor @zabetak @marton-bod @pvary @szlta Could you please review this PR? Thanks
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 604323)
Time Spent: 20m (was: 10m)
> Support all partition transforms for Iceberg in create table
>
> Key: HIVE-25179
> URL: https://issues.apache.org/jira/browse/HIVE-25179
> Project: Hive
> Issue Type: New Feature
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Enhance the table create syntax with support for partition transforms:
> {code:sql}
> CREATE TABLE ... PARTITIONED BY SPEC(
>   year_field year,
>   month_field month,
>   day_field day,
>   hour_field hour,
>   truncate_field truncate[3],
>   bucket_field bucket[5],
>   identity_field identity
> ) STORED BY ICEBERG;
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
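For readers unfamiliar with the transform keywords in the proposed PARTITIONED BY SPEC syntax, here is a rough sketch of what each one computes. These are simplified stand-ins written in plain Python, not Iceberg's actual implementations (real Iceberg buckets with a Murmur3 hash and represents temporal transforms as offsets from the Unix epoch); they are shown only to illustrate the semantics of each keyword.

```python
# Simplified stand-ins for the Iceberg-style partition transforms named
# in the proposed CREATE TABLE syntax. Illustrative only.
import zlib
from datetime import datetime

def year(ts):  return ts.year
def month(ts): return (ts.year, ts.month)
def day(ts):   return (ts.year, ts.month, ts.day)
def hour(ts):  return (ts.year, ts.month, ts.day, ts.hour)

def truncate(width, v):
    # truncate[W]: integers round down to a multiple of W,
    # strings keep their first W characters.
    return v - (v % width) if isinstance(v, int) else v[:width]

def bucket(n, v):
    # bucket[N]: hash the value into N buckets (CRC32 here as a
    # deterministic stand-in for Iceberg's Murmur3).
    return zlib.crc32(str(v).encode()) % n

def identity(v):
    # identity: partition directly on the raw value.
    return v

ts = datetime(2021, 6, 1, 6, 9)
print(year(ts), truncate(3, 10), truncate(3, "abcdef"), identity(42))
```

Every row whose transformed value is equal lands in the same partition, which is why the spec lists a source field together with a transform rather than a plain column name.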