[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=587062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-587062
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 22/Apr/21 05:37
Start Date: 22/Apr/21 05:37
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r618093086



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -194,6 +210,54 @@ public boolean canProvideBasicStatistics() {
 return stats;
   }
 
+  public boolean addDynamicSplitPruningEdge(org.apache.hadoop.hive.ql.metadata.Table table,
+      ExprNodeDesc syntheticFilterPredicate) {
+    try {
+      Collection<String> partitionColumns = ((HiveIcebergSerDe) table.getDeserializer()).partitionColumns();
+      if (partitionColumns.size() > 0) {
+        // Collect the column names from the predicate
+        Set<String> filterColumns = Sets.newHashSet();
+        columns(syntheticFilterPredicate, filterColumns);
+
+        // While Iceberg could handle multiple columns, the current pruning is only able to handle filters for a
+        // single column. We keep the logic below to handle multiple columns, so if pruning is available on the
+        // executor side then we can easily adapt to it as well.
+        if (filterColumns.size() > 1) {

Review comment:
   We collect every column name in the filterColumns set through the 
columns() method. That method traverses every node recursively, so it might 
be time-consuming. After that, the size of the set is validated, and if it's 
greater than 1, we return false.
   Can we introduce some logic to fail fast, without traversing every node? 
I'm just thinking aloud; I don't know whether it is feasible or not.
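   As one possible shape for that fail-fast idea, here is a standalone 
sketch (the traversal mirrors the PR's columns() helper; the early-return 
logic is the hypothetical part, not code from the PR):
{code:java}
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;

public class FilterColumnCheck {

  /**
   * Collects column names like columns(), but aborts as soon as a second
   * distinct column is found, so the rest of the predicate tree is never
   * visited once the answer is already known.
   *
   * @return false as soon as more than one distinct column has been seen
   */
  static boolean collectAtMostOneColumn(ExprNodeDesc node, Set<String> columns) {
    if (node instanceof ExprNodeColumnDesc) {
      columns.add(((ExprNodeColumnDesc) node).getColumn());
      if (columns.size() > 1) {
        return false; // fail fast: no need to visit the remaining nodes
      }
    } else {
      List<ExprNodeDesc> children = node.getChildren();
      if (children != null) {
        for (ExprNodeDesc child : children) {
          if (!collectAtMostOneColumn(child, columns)) {
            return false; // propagate the early exit up the recursion
          }
        }
      }
    }
    return true;
  }
}
{code}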




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 587062)
Time Spent: 5h 10m  (was: 5h)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> We should enable partition pruning above Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-25010.
--
Resolution: Fixed

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=587056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-587056
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 22/Apr/21 05:12
Start Date: 22/Apr/21 05:12
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #2193:
URL: https://github.com/apache/hive/pull/2193#issuecomment-824543176


   Thanks, @marton-bod and @pvary for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 587056)
Time Spent: 3h 50m  (was: 3h 40m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=587055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-587055
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 22/Apr/21 05:11
Start Date: 22/Apr/21 05:11
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2193:
URL: https://github.com/apache/hive/pull/2193


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 587055)
Time Spent: 3h 40m  (was: 3.5h)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24596) Explain ddl for debugging

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?focusedWorklogId=587026&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-587026
 ]

ASF GitHub Bot logged work on HIVE-24596:
-

Author: ASF GitHub Bot
Created on: 22/Apr/21 03:29
Start Date: 22/Apr/21 03:29
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #2033:
URL: https://github.com/apache/hive/pull/2033#discussion_r618054976



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLPlanUtils.java
##
@@ -0,0 +1,1019 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.metastore.TableType;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.BinaryColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.BooleanColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.DateColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.DecimalColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.DoubleColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Order;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.SkewedInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.StringColumnStatsData;
+import org.apache.hadoop.hive.ql.ddl.ShowUtils;
+import org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation;
+import org.apache.hadoop.hive.ql.metadata.CheckConstraint;
+import org.apache.hadoop.hive.ql.metadata.CheckConstraint.CheckConstraintCol;
+import org.apache.hadoop.hive.ql.metadata.DefaultConstraint;
+import 
org.apache.hadoop.hive.ql.metadata.DefaultConstraint.DefaultConstraintCol;
+import org.apache.hadoop.hive.ql.metadata.ForeignKeyInfo;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.NotNullConstraint;
+import org.apache.hadoop.hive.ql.metadata.Partition;
+import org.apache.hadoop.hive.ql.metadata.PrimaryKeyInfo;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.metadata.UniqueConstraint;
+import org.apache.hadoop.hive.ql.util.DirectionUtils;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.hive.serde2.typeinfo.UnionTypeInfo;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.stringtemplate.v4.ST;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Base64;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.SortedMap;
+import java.util.TreeMap;
+
+import static 
org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_STORAGE;
+
+public class DDLPlanUtils {
+  private static final String EXTERNAL = "external";
+  private static final String TEMPORARY = "temporary";
+  private static final String LIST_COLUMNS = "columns";
+  private static final String COMMENT = "comment";
+  private static final String PARTITIONS = "partitions";
+  private static final String 

[jira] [Work logged] (HIVE-24933) Replication fails for transactional tables having same name as dropped non-transactional table

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24933?focusedWorklogId=587022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-587022
 ]

ASF GitHub Bot logged work on HIVE-24933:
-

Author: ASF GitHub Bot
Created on: 22/Apr/21 03:16
Start Date: 22/Apr/21 03:16
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2151:
URL: https://github.com/apache/hive/pull/2151#discussion_r618048430



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
##
@@ -1406,4 +1439,114 @@ public static Table tableIfExists(ImportTableDesc tblDesc, Hive db) throws HiveE
 }
   }
 
+  public static class LoadTableStateWrapper {

Review comment:
   Why are both LoadTableStateWrapper and DelayExecUtil needed? They are 
almost alike.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/CopyWork.java
##
@@ -136,4 +140,18 @@ public boolean isOverwrite() {
   public void setOverwrite(boolean overwrite) {
     this.overwrite = overwrite;
   }
+
+  public void setLoadTableStateWrapper(LoadTableStateWrapper loadTableStateWrapper) {
+    this.loadTableStateWrapper = loadTableStateWrapper;
+  }
+
+  public void setValuesBeforeExec() throws HiveException {
+    if (loadTableStateWrapper == null) {
+      return;
+    }
+
+    Table table = loadTableStateWrapper.getTableIfExists();

Review comment:
   CopyWork was generic in nature; setValuesBeforeExec ties it to resolving 
the path only from a table. All we need is to be able to resolve the path 
dynamically, so a resolver interface should be preferred.
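   A minimal sketch of that direction, assuming a hypothetical PathResolver 
interface (the name and shape are illustrative, not from the PR):
{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.metadata.HiveException;

/**
 * Hypothetical callback that would let CopyWork resolve its target path
 * lazily at execution time, without knowing whether the path comes from a
 * table, a partition, or anywhere else.
 */
public interface PathResolver {
  Path resolvePath() throws HiveException;
}
{code}
   CopyWork would then hold a PathResolver instead of a LoadTableStateWrapper, 
and setValuesBeforeExec() would simply call resolvePath() without referencing 
Table at all.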




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 587022)
Time Spent: 40m  (was: 0.5h)

> Replication fails for transactional tables having same name as dropped 
> non-transactional table
> --
>
> Key: HIVE-24933
> URL: https://issues.apache.org/jira/browse/HIVE-24933
> Project: Hive
>  Issue Type: Bug
>Reporter: Pratyushotpal Madhukar
>Assignee: Pratyushotpal Madhukar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24772) Revamp Server Request Error Logging

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24772?focusedWorklogId=586938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586938
 ]

ASF GitHub Bot logged work on HIVE-24772:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 22:17
Start Date: 21/Apr/21 22:17
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1974:
URL: https://github.com/apache/hive/pull/1974#issuecomment-824393358


   @zchovan @aihuaxu  Can you please take another look?  Got the tests passing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586938)
Time Spent: 2h 40m  (was: 2.5h)

> Revamp Server Request Error Logging
> ---
>
> Key: HIVE-24772
> URL: https://issues.apache.org/jira/browse/HIVE-24772
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Most of the action takes place in {{ThriftCLIService}}, where errors are 
> logged in response to client requests (though I know that in many instances 
> things are logged multiple times).
> I propose to improve this on multiple fronts (see the sketch after this 
> list):
> # Many log messages have the word "Error" in them but log at the WARN 
> level. I have changed all relevant logging to be at the ERROR level and 
> removed the word "Error" from the message
> # Some of the error messages in the logging code had copy & paste errors 
> where they printed the wrong request name
> # Print the actual request object in the error message
> # Big one for me: do not pass a stack trace to the client. This is bad 
> practice from a security perspective... clients should not know that level 
> of detail about the server; it is also very confusing, from the client's 
> perspective, to understand that the stack trace is actually from the remote 
> server rather than the local client; and finally, it's too messy for a 
> typical user to deal with anyway. Stack traces should be presented in the 
> HS2 logs only
> # Various cleanup
> # Log an IP address for the client as part of standard operating procedure
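As a rough illustration of points 1, 3, 4, and 6 combined, a minimal sketch 
assuming SLF4J and a hypothetical handleFailure helper (none of this is code 
from the PR):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RequestErrorLogging {

  private static final Logger LOG = LoggerFactory.getLogger(RequestErrorLogging.class);

  /**
   * The full exception (with stack trace) goes to the server log at ERROR
   * level, together with the request object and the client IP, while the
   * client only receives a short message without the server's stack trace.
   */
  static String handleFailure(Object request, String clientIp, Exception e) {
    // Server side: ERROR level, request object, client IP, full stack trace.
    LOG.error("Failed to execute request [{}] from client [{}]", request, clientIp, e);
    // Client side: message only, no server internals.
    return "Request failed: " + e.getMessage();
  }
}
{code}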



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?focusedWorklogId=586774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586774
 ]

ASF GitHub Bot logged work on HIVE-25040:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 17:51
Start Date: 21/Apr/21 17:51
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2200:
URL: https://github.com/apache/hive/pull/2200#discussion_r617760731



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -553,21 +553,19 @@ private void addFunction(String functionName, FunctionInfo function) {
     Integer refCount = persistent.get(functionClass);
     persistent.put(functionClass, Integer.valueOf(refCount == null ? 1 : refCount + 1));
   }
+} catch (ClassNotFoundException e) {

Review comment:
   Got it, just realized it's been called from here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586774)
Time Spent: 40m  (was: 0.5h)

> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> 

[jira] [Work logged] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?focusedWorklogId=586770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586770
 ]

ASF GitHub Bot logged work on HIVE-25040:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 17:48
Start Date: 21/Apr/21 17:48
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2200:
URL: https://github.com/apache/hive/pull/2200#discussion_r617758895



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -553,21 +553,19 @@ private void addFunction(String functionName, FunctionInfo function) {
     Integer refCount = persistent.get(functionClass);
     persistent.put(functionClass, Integer.valueOf(refCount == null ? 1 : refCount + 1));
   }
+} catch (ClassNotFoundException e) {

Review comment:
   I see, but this is not part of the Drop cascade issue, right?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586770)
Time Spent: 0.5h  (was: 20m)

> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> 

[jira] [Work logged] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?focusedWorklogId=586766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586766
 ]

ASF GitHub Bot logged work on HIVE-25040:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 17:45
Start Date: 21/Apr/21 17:45
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #2200:
URL: https://github.com/apache/hive/pull/2200#discussion_r617756383



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -553,21 +553,19 @@ private void addFunction(String functionName, FunctionInfo function) {
     Integer refCount = persistent.get(functionClass);
     persistent.put(functionClass, Integer.valueOf(refCount == null ? 1 : refCount + 1));
   }
+} catch (ClassNotFoundException e) {

Review comment:
   getPermanentUdfClass may throw ClassNotFoundException
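   For context, the pattern under discussion looks roughly like this 
simplified sketch (the class and field names are stand-ins, not the actual 
Registry code):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class PersistentFunctionRegistry {

  // Reference counts per UDF implementation class, as kept by the registry.
  private final Map<Class<?>, Integer> persistent = new HashMap<>();

  /**
   * Registers a persistent function's implementation class. If the class is
   * not on the classpath (e.g. the session never loaded the function's jar),
   * the ClassNotFoundException is swallowed so that the bookkeeping does not
   * fail the whole operation.
   */
  void addPersistentFunction(String className) {
    try {
      Class<?> functionClass = Class.forName(className);
      persistent.merge(functionClass, 1, Integer::sum);
    } catch (ClassNotFoundException e) {
      // The jar is not loaded in this session; skip the refcount update.
    }
  }
}
{code}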




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586766)
Time Spent: 20m  (was: 10m)

> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> 

[jira] [Updated] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25040:
--
Labels: pull-request-available  (was: )

> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_282]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  ~[hadoop-common-3.1.1.7.2.10.0-36.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_282]
>   at 

[jira] [Work logged] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?focusedWorklogId=586746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586746
 ]

ASF GitHub Bot logged work on HIVE-25040:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 17:08
Start Date: 21/Apr/21 17:08
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2200:
URL: https://github.com/apache/hive/pull/2200#discussion_r617731217



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -553,21 +553,19 @@ private void addFunction(String functionName, FunctionInfo function) {
     Integer refCount = persistent.get(functionClass);
     persistent.put(functionClass, Integer.valueOf(refCount == null ? 1 : refCount + 1));
   }
+} catch (ClassNotFoundException e) {

Review comment:
   The issue's root cause should be removePersistentFunctionUnderLock, when 
it tries to get the permanent UDF class below -- not sure why this is needed 
here as part of addFunction?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586746)
Remaining Estimate: 0h
Time Spent: 10m

> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> 

[jira] [Updated] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman updated HIVE-25040:

Description: 
Add a persistent custom function to a database using a Jar file: CREATE 
FUNCTION myfunction USING JAR 'x.jar';

Restart the session and immediately issue DROP DATABASE mydb CASCADE. It throws 
ClassNotFoundException:
{code:java}
java.lang.ClassNotFoundException: DummyUDF
at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
~[?:1.8.0_282]
at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
at 
org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
 ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
 ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
 ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
 ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
 ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
 ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_282]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
 ~[hadoop-common-3.1.1.7.2.10.0-36.jar:?]
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
 ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_282]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_282]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_282]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_282]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_282]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]

{code}
 

Since new session did not use the custom udf before trying to 

[jira] [Work logged] (HIVE-23456) Upgrade Calcite version to 1.25.0

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23456?focusedWorklogId=586710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586710
 ]

ASF GitHub Bot logged work on HIVE-23456:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 16:17
Start Date: 21/Apr/21 16:17
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2203:
URL: https://github.com/apache/hive/pull/2203


   Work in progress, do not review
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586710)
Time Spent: 0.5h  (was: 20m)

> Upgrade Calcite version to 1.25.0
> -
>
> Key: HIVE-23456
> URL: https://issues.apache.org/jira/browse/HIVE-23456
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-04-21 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-24537:
-

Assignee: Panagiotis Garefalakis

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>
> 1. A read lock should suffice for "notifyStarted()" (see the sketch after 
> this list).
> 2. Locking in "allocateTask()" can be optimised.
> 3. Optimise preemptTasks() & preemptTasksFromMap(); this would help reduce 
> the code path holding the write lock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!
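For point 1, the read/write-lock split being suggested looks roughly like 
this generic sketch (placeholder state, not the LlapTaskSchedulerService 
code):
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SchedulerLocking {

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean started;

  /**
   * Read-mostly checks can take the shared read lock, so many scheduler
   * threads may proceed concurrently instead of serialising on a write lock.
   */
  boolean isStarted() {
    lock.readLock().lock();
    try {
      return started;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Only actual state mutation needs the exclusive write lock. */
  void markStarted() {
    lock.writeLock().lock();
    try {
      started = true;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}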



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-04-21 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24537:
--
Parent: HIVE-24913
Issue Type: Sub-task  (was: Improvement)

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>
> 1. A read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised.
> 3. Optimise preemptTasks() & preemptTasksFromMap(); this would help reduce 
> the code path holding the write lock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=586659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586659
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 15:16
Start Date: 21/Apr/21 15:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r617644403



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -275,4 +339,74 @@ static void overlayTableProperties(Configuration configuration, TableDesc tableD
 // this is an exception to the interface documentation, but it's a safe operation to add this property
 props.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
   }
+
+  /**
+   * Recursively collects the column names from the predicate.
+   * @param node The node we are traversing
+   * @param columns The already collected column names
+   */
+  private void columns(ExprNodeDesc node, Set<String> columns) {
+    if (node instanceof ExprNodeColumnDesc) {
+      columns.add(((ExprNodeColumnDesc) node).getColumn());
+    } else {
+      List<ExprNodeDesc> children = node.getChildren();
+      if (children != null && !children.isEmpty()) {

Review comment:
   Good point!
   Thanks for pointing it out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586659)
Time Spent: 5h  (was: 4h 50m)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> We should enable partition pruning above Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=586657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586657
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 15:15
Start Date: 21/Apr/21 15:15
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r617643973



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -194,6 +210,54 @@ public boolean canProvideBasicStatistics() {
 return stats;
   }
 
+  public boolean addDynamicSplitPruningEdge(org.apache.hadoop.hive.ql.metadata.Table table,
+      ExprNodeDesc syntheticFilterPredicate) {
+    try {
+      Collection<String> partitionColumns = ((HiveIcebergSerDe) table.getDeserializer()).partitionColumns();
+      if (partitionColumns.size() > 0) {
+        // Collect the column names from the predicate
+        Set<String> filterColumns = Sets.newHashSet();
+        columns(syntheticFilterPredicate, filterColumns);
+
+        // While Iceberg could handle multiple columns, the current pruning is only able to handle filters for a
+        // single column. We keep the logic below to handle multiple columns, so if pruning is available on the
+        // executor side then we can easily adapt to it as well.
+        if (filterColumns.size() > 1) {

Review comment:
   I do not understand this. Could you please elaborate?
   
   Thanks,
   Peter




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586657)
Time Spent: 4h 40m  (was: 4.5h)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should enable partition pruning above Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=586658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586658
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 15:15
Start Date: 21/Apr/21 15:15
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r617643973



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -194,6 +210,54 @@ public boolean canProvideBasicStatistics() {
 return stats;
   }
 
+  public boolean addDynamicSplitPruningEdge(org.apache.hadoop.hive.ql.metadata.Table table,
+      ExprNodeDesc syntheticFilterPredicate) {
+    try {
+      Collection<String> partitionColumns = ((HiveIcebergSerDe) table.getDeserializer()).partitionColumns();
+      if (partitionColumns.size() > 0) {
+        // Collect the column names from the predicate
+        Set<String> filterColumns = Sets.newHashSet();
+        columns(syntheticFilterPredicate, filterColumns);
+
+        // While Iceberg could handle multiple columns, the current pruning is only able to handle filters for a
+        // single column. We keep the logic below to handle multiple columns, so if pruning is available on the
+        // executor side then we can easily adapt to it as well.
+        if (filterColumns.size() > 1) {

Review comment:
   I do not understand this. Could you please elaborate?
   
   Thanks,
   Peter




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586658)
Time Spent: 4h 50m  (was: 4h 40m)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We should enable partition pruning above Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=586656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586656
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 15:15
Start Date: 21/Apr/21 15:15
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r617643436



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -50,9 +61,14 @@
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.base.Splitter;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
 import org.apache.iceberg.util.SerializationUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class HiveIcebergStorageHandler implements HiveStoragePredicateHandler, HiveStorageHandler {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergStorageHandler.class);
+
   private static final Splitter TABLE_NAME_SPLITTER = Splitter.on("..");

Review comment:
   That's a good point. Will do it in a separate jira, as it should be the 
same on the Iceberg side too.
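   For readers following along: Guava's Splitter (here the relocated Iceberg 
copy) splits on the literal ".." delimiter. A small illustration with plain 
Guava and a made-up input:
{code:java}
import com.google.common.base.Splitter;

public class SplitterDemo {

  private static final Splitter TABLE_NAME_SPLITTER = Splitter.on("..");

  public static void main(String[] args) {
    // Splits an identifier that uses ".." as the separator between its parts.
    for (String part : TABLE_NAME_SPLITTER.split("default_db..target_table")) {
      System.out.println(part); // prints "default_db", then "target_table"
    }
  }
}
{code}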




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586656)
Time Spent: 4.5h  (was: 4h 20m)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We should enable partition pruning above iceberg tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=586653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586653
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 15:14
Start Date: 21/Apr/21 15:14
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r617642837



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -275,4 +339,74 @@ static void overlayTableProperties(Configuration configuration, TableDesc tableD
     // this is an exception to the interface documentation, but it's a safe operation to add this property
 props.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
   }
+
+  /**
+   * Recursively collects the column names from the predicate.
+   * @param node The node we are traversing
+   * @param columns The already collected column names
+   */
+  private void columns(ExprNodeDesc node, Set<String> columns) {
+    if (node instanceof ExprNodeColumnDesc) {
+      columns.add(((ExprNodeColumnDesc) node).getColumn());
+    } else {
+      List<ExprNodeDesc> children = node.getChildren();
+      if (children != null && !children.isEmpty()) {
+        children.forEach(child -> columns(child, columns));
+      }
+    }
+  }
+
+  /**
+   * Recursively replaces the ExprNodeDynamicListDesc nodes by a dummy ExprNodeConstantDesc so we can test if we can
+   * convert the predicate to an Iceberg predicate when pruning the partitions later. Please make sure that it is ok
+   * to change the input node (clone if needed)
+   * @param node The node we are traversing
+   */
+  private void replaceWithDummyValues(ExprNodeDesc node) {
+    List<ExprNodeDesc> children = node.getChildren();
+    if (children != null && !children.isEmpty()) {
+      ListIterator<ExprNodeDesc> iterator = node.getChildren().listIterator();

Review comment:
   We do an in-place replacement for the `Dynamic` nodes with
   `iterator.set(new ExprNodeConstantDesc(child.getTypeInfo(), dummy));`. I think
   I cannot do that with a for loop.
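
   For reference, a minimal sketch of the contrast (plain JDK lists standing in for
   `ExprNodeDesc`, not Hive code): `ListIterator.set` replaces the element the
   iterator last returned, which an enhanced for loop has no way to do.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class InPlaceReplaceSketch {
  public static void main(String[] args) {
    // Arrays.asList supports set() but no structural changes, which is all we need
    List<String> nodes = Arrays.asList("column", "dynamic-list", "constant");
    ListIterator<String> it = nodes.listIterator();
    while (it.hasNext()) {
      if ("dynamic-list".equals(it.next())) {
        it.set("dummy-constant"); // in-place replacement of the current element
      }
    }
    System.out.println(nodes); // [column, dummy-constant, constant]
  }
}
{code}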




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586653)
Time Spent: 4h 20m  (was: 4h 10m)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We should enable partition pruning above iceberg tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25028) Hive: Select query with IS operator producing unexpected result

2021-04-21 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-25028.
---
Resolution: Fixed

Pushed to master. Thanks [~soumyakanti.das].

> Hive: Select query with IS operator producing unexpected result
> ---
>
> Key: HIVE-25028
> URL: https://issues.apache.org/jira/browse/HIVE-25028
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: Select query with IS operator is producing unexpected result.
> The following was executed on postgres:
> {code:java}
> sqlancer=# create table if not exists emp(name text, age int);
> CREATE TABLE
> sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12);
> INSERT 0 3
> sqlancer=# select emp.age from emp where emp.age > 10;
>  age
> -
>   15
>   12
> (2 rows)
> sqlancer=# select emp.age > 10 is true from emp;
>  ?column?
> --
>  f
>  t
>  t
> (3 rows){code}
> This is happening because IS operator has higher precedence than comparison 
> operators in Hive. In most other databases, comparison operator has higher 
> precedence. The grammar needs to be changed to fix the precedence.
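>
> Written with explicit parentheses, the two parses are (a sketch of the query from
> the report above; only the parenthesization is added):
> {code:java}
> -- comparison binding tighter (postgres behaviour, the expected one):
> select (emp.age > 10) is true from emp;   -- f, t, t
>
> -- IS binding tighter (current Hive behaviour, the bug):
> select emp.age > (10 is true) from emp;
> {code}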



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25028) Hive: Select query with IS operator producing unexpected result

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25028:
--
Labels: pull-request-available  (was: )

> Hive: Select query with IS operator producing unexpected result
> ---
>
> Key: HIVE-25028
> URL: https://issues.apache.org/jira/browse/HIVE-25028
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: Select query with IS operator is producing unexpected result.
> The following was executed on postgres:
> {code:java}
> sqlancer=# create table if not exists emp(name text, age int);
> CREATE TABLE
> sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12);
> INSERT 0 3
> sqlancer=# select emp.age from emp where emp.age > 10;
>  age
> -
>   15
>   12
> (2 rows)
> sqlancer=# select emp.age > 10 is true from emp;
>  ?column?
> --
>  f
>  t
>  t
> (3 rows){code}
> This is happening because IS operator has higher precedence than comparison 
> operators in Hive. In most other databases, comparison operator has higher 
> precedence. The grammar needs to be changed to fix the precedence.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25028) Hive: Select query with IS operator producing unexpected result

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25028?focusedWorklogId=586603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586603
 ]

ASF GitHub Bot logged work on HIVE-25028:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 14:14
Start Date: 21/Apr/21 14:14
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2190:
URL: https://github.com/apache/hive/pull/2190


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586603)
Remaining Estimate: 0h
Time Spent: 10m

> Hive: Select query with IS operator producing unexpected result
> ---
>
> Key: HIVE-25028
> URL: https://issues.apache.org/jira/browse/HIVE-25028
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: Select query with IS operator is producing unexpected result.
> The following was executed on postgres:
> {code:java}
> sqlancer=# create table if not exists emp(name text, age int);
> CREATE TABLE
> sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12);
> INSERT 0 3
> sqlancer=# select emp.age from emp where emp.age > 10;
>  age
> -
>   15
>   12
> (2 rows)
> sqlancer=# select emp.age > 10 is true from emp;
>  ?column?
> --
>  f
>  t
>  t
> (3 rows){code}
> This is happening because IS operator has higher precedence than comparison 
> operators in Hive. In most other databases, comparison operator has higher 
> precedence. The grammar needs to be changed to fix the precedence.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-04-21 Thread Ashish Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326537#comment-17326537
 ] 

Ashish Sharma commented on HIVE-23820:
--

[~kishendas] Could you please review the PR?

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586569
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 13:33
Start Date: 21/Apr/21 13:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617536842



##
File path: 
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
##
@@ -756,4 +756,40 @@ public KuduNegativeCliConfig() {
   }
 }
   }
+
+  public static class IcebergCliConfig extends AbstractCliConfig {
+
+public IcebergCliConfig() {
+  super(CoreCliDriver.class);
+  try {
+setQueryDir("iceberg/iceberg-handler/src/test/java/queries/positive");

Review comment:
   Right, but the q files need to be moved too I think
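
   (For clarity, the corrected config would presumably point at the conventional
   layout, roughly as sketched below; the exact directory is whatever the follow-up
   commit settles on.)

{code:java}
// sketch: query dir without the stray "java" segment
setQueryDir("iceberg/iceberg-handler/src/test/queries/positive");
{code}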




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586569)
Time Spent: 3.5h  (was: 3h 20m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586564
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 13:32
Start Date: 21/Apr/21 13:32
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617535040



##
File path: 
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
##
@@ -756,4 +756,40 @@ public KuduNegativeCliConfig() {
   }
 }
   }
+
+  public static class IcebergCliConfig extends AbstractCliConfig {
+
+public IcebergCliConfig() {
+  super(CoreCliDriver.class);
+  try {
+setQueryDir("iceberg/iceberg-handler/src/test/java/queries/positive");

Review comment:
   You are right, this is a typo




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586564)
Time Spent: 3h 20m  (was: 3h 10m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23820?focusedWorklogId=586565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586565
 ]

ASF GitHub Bot logged work on HIVE-23820:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 13:32
Start Date: 21/Apr/21 13:32
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull request #2153:
URL: https://github.com/apache/hive/pull/2153#discussion_r617535115



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -2381,82 +2381,141 @@ public Partition getPartitionWithAuthInfo(String catName, String dbName, String
     return deepCopy(FilterUtils.filterPartitionIfEnabled(isClientFilterEnabled, filterHook, p));
   }
 
+  /**
+   * @deprecated use getTable(GetTableRequest getTableRequest)
+   * @param dbname
+   * @param name
+   * @return
+   * @throws TException
+   */
   @Override
+  @Deprecated
   public Table getTable(String dbname, String name) throws TException {
-    return getTable(getDefaultCatalog(conf), dbname, name);
+    GetTableRequest req = new GetTableRequest(dbname, name);
+    req.setCatName(getDefaultCatalog(conf));
+    return getTable(req);
   }
 
+  /**
+   * @deprecated use getTable(GetTableRequest getTableRequest)
+   * @param dbname
+   * @param name
+   * @param getColumnStats
+   *          get the column stats, if available, when true
+   * @param engine engine sending the request
+   * @return
+   * @throws TException
+   */
   @Override
+  @Deprecated
   public Table getTable(String dbname, String name, boolean getColumnStats, String engine) throws TException {
-    return getTable(getDefaultCatalog(conf), dbname, name, getColumnStats, engine);
+    GetTableRequest req = new GetTableRequest(dbname, name);
+    req.setCatName(getDefaultCatalog(conf));
+    req.setGetColumnStats(getColumnStats);
+    if (getColumnStats) {

Review comment:
   Because Table getTable(GetTableRequest getTableRequest) assumes that
   getTableRequest contains valid data.
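
   As a usage sketch of the request-based API (only the setters visible in this
   diff are assumed; `client` stands for any HiveMetaStoreClient instance):

{code:java}
// sketch: callers migrate from getTable(dbname, name) to the request object
GetTableRequest req = new GetTableRequest("default", "my_table");
req.setCatName("hive");        // catalog, as derived from getDefaultCatalog(conf) above
req.setGetColumnStats(true);   // also fetch column statistics
Table table = client.getTable(req);
{code}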




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586565)
Time Spent: 1.5h  (was: 1h 20m)

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=586557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586557
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 13:23
Start Date: 21/Apr/21 13:23
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r617511852



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -126,7 +127,7 @@
     // skip HiveCatalog tests as they are added before
     for (TestTables.TestTableType testTableType : TestTables.ALL_TABLE_TYPES) {
       if (!TestTables.TestTableType.HIVE_CATALOG.equals(testTableType)) {
-        testParams.add(new Object[]{FileFormat.PARQUET, "mr", testTableType});
+        testParams.add(new Object[]{FileFormat.PARQUET, "tez", testTableType});

Review comment:
   The comment above the loop is not in sync with the logic. 

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -126,7 +127,7 @@
     // skip HiveCatalog tests as they are added before
     for (TestTables.TestTableType testTableType : TestTables.ALL_TABLE_TYPES) {
       if (!TestTables.TestTableType.HIVE_CATALOG.equals(testTableType)) {
-        testParams.add(new Object[]{FileFormat.PARQUET, "mr", testTableType});
+        testParams.add(new Object[]{FileFormat.PARQUET, "tez", testTableType});

Review comment:
   Anyway, this change is already present on master, so you might revert 
it. 

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -194,6 +210,54 @@ public boolean canProvideBasicStatistics() {
 return stats;
   }
 
+  public boolean addDynamicSplitPruningEdge(org.apache.hadoop.hive.ql.metadata.Table table,
+      ExprNodeDesc syntheticFilterPredicate) {
+    try {
+      Collection<String> partitionColumns = ((HiveIcebergSerDe) table.getDeserializer()).partitionColumns();
+      if (partitionColumns.size() > 0) {
+        // Collect the column names from the predicate
+        Set<String> filterColumns = Sets.newHashSet();
+        columns(syntheticFilterPredicate, filterColumns);
+
+        // While Iceberg could handle multiple columns, the current pruning is only able to handle filters for a
+        // single column. We keep the logic below to handle multiple columns so that if pruning becomes available
+        // on the executor side we can easily adapt to it as well.
+        if (filterColumns.size() > 1) {

Review comment:
   If we know that we don't support multiple-column filtering, wouldn't it be
   possible to get a rough estimate of the filter size (i.e. whether it is > 1)
   before collecting every column name?
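
   One way to realize that fail-fast idea (a sketch, not the PR's code): short-circuit
   the traversal as soon as more columns than supported have been seen, instead of
   collecting the full set first.

{code:java}
// sketch: returns false as soon as more than `limit` distinct columns are found,
// so the traversal stops early instead of visiting every remaining node
private boolean collectUpTo(ExprNodeDesc node, Set<String> columns, int limit) {
  if (node instanceof ExprNodeColumnDesc) {
    columns.add(((ExprNodeColumnDesc) node).getColumn());
    if (columns.size() > limit) {
      return false; // fail fast
    }
  } else if (node.getChildren() != null) {
    for (ExprNodeDesc child : node.getChildren()) {
      if (!collectUpTo(child, columns, limit)) {
        return false;
      }
    }
  }
  return true;
}
{code}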

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -50,9 +61,14 @@
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.base.Splitter;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
 import org.apache.iceberg.util.SerializationUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class HiveIcebergStorageHandler implements HiveStoragePredicateHandler, HiveStorageHandler {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergStorageHandler.class);
+
   private static final Splitter TABLE_NAME_SPLITTER = Splitter.on("..");

Review comment:
   nit: Splitter.on(TABLE_NAME_SEPARATOR)

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -275,4 +339,74 @@ static void overlayTableProperties(Configuration configuration, TableDesc tableD
     // this is an exception to the interface documentation, but it's a safe operation to add this property
 props.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
   }
+
+  /**
+   * Recursively collects the column names from the predicate.
+   * @param node The node we are traversing
+   * @param columns The already collected column names
+   */
+  private void columns(ExprNodeDesc node, Set<String> columns) {
+    if (node instanceof ExprNodeColumnDesc) {
+      columns.add(((ExprNodeColumnDesc) node).getColumn());
+    } else {
+      List<ExprNodeDesc> children = node.getChildren();
+      if (children != null && !children.isEmpty()) {
+        children.forEach(child -> columns(child, columns));
+      }
+    }
+  }
+
+  /**
+   * Recursively replaces the ExprNodeDynamicListDesc nodes by a dummy ExprNodeConstantDesc so we can test if we can
+   * convert the predicate to an Iceberg predicate when pruning the partitions later.

[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586541
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 13:01
Start Date: 21/Apr/21 13:01
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617511043



##
File path: 
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
##
@@ -756,4 +756,40 @@ public KuduNegativeCliConfig() {
   }
 }
   }
+
+  public static class IcebergCliConfig extends AbstractCliConfig {
+
+public IcebergCliConfig() {
+  super(CoreCliDriver.class);
+  try {
+setQueryDir("iceberg/iceberg-handler/src/test/java/queries/positive");

Review comment:
   Shouldn't the `.q` files go under `src/test/queries/...` instead of 
`src/test/java/queries/...`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586541)
Time Spent: 3h 10m  (was: 3h)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25045) Let hive-precommit run against custom Hadoop/Tez/Orc/Calcite

2021-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-25045:
---

Assignee: László Bodor

> Let hive-precommit run against custom Hadoop/Tez/Orc/Calcite
> 
>
> Key: HIVE-25045
> URL: https://issues.apache.org/jira/browse/HIVE-25045
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586533
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:34
Start Date: 21/Apr/21 12:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617491058



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -346,25 +372,23 @@ private static ExecutorService tableExecutor(Configuration conf, int maxThreadNu
 
   /**
    * Get the committed data files for this table and job.
+   *
+   * @param numTasks Number of writer tasks that produced a forCommit file
    * @param executor The executor used for reading the forCommit files in parallel
    * @param location The location of the table
    * @param jobContext The job context
    * @param io The FileIO used for reading the files generated for commit
    * @param throwOnFailure If true then it throws an exception on failure
    * @return The list of the committed data files
    */
-  private static Collection<DataFile> dataFiles(ExecutorService executor, String location, JobContext jobContext,
-      FileIO io, boolean throwOnFailure) {
+  private static Collection<DataFile> dataFiles(int numTasks, ExecutorService executor, String location,
+                                                JobContext jobContext, FileIO io, boolean throwOnFailure) {

Review comment:
   Fun stuff 
   Whatever!
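
   (Reading between the lines: passing in `numTasks` lets the commit path construct
   the expected forCommit file names directly instead of listing the directory. A
   rough sketch of that idea follows; the naming scheme is purely illustrative, not
   the PR's actual one.)

{code:java}
// sketch: derive the forCommit file paths from the writer task count
List<String> forCommitFiles = new ArrayList<>(numTasks);
for (int taskId = 0; taskId < numTasks; taskId++) {
  // hypothetical naming scheme, for illustration only
  forCommitFiles.add(location + "/temp/task-" + taskId + ".forCommit");
}
{code}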




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586533)
Time Spent: 3h 40m  (was: 3.5h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586532
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:33
Start Date: 21/Apr/21 12:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617490443



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -239,6 +252,16 @@ public void abortJob(JobContext originalContext, int status) throws IOException
 cleanup(jobContext, jobLocations);
   }
 
+  private Set listForCommits(JobConf jobConf, String jobLocation) throws IOException {

Review comment:
   Right, good idea




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586532)
Time Spent: 3.5h  (was: 3h 20m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586529
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:33
Start Date: 21/Apr/21 12:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617490157



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -346,25 +372,23 @@ private static ExecutorService tableExecutor(Configuration conf, int maxThreadNu
 
   /**
    * Get the committed data files for this table and job.
+   *
+   * @param numTasks Number of writer tasks that produced a forCommit file
    * @param executor The executor used for reading the forCommit files in parallel
    * @param location The location of the table
    * @param jobContext The job context
    * @param io The FileIO used for reading the files generated for commit
    * @param throwOnFailure If true then it throws an exception on failure
    * @return The list of the committed data files
    */
-  private static Collection<DataFile> dataFiles(ExecutorService executor, String location, JobContext jobContext,
-      FileIO io, boolean throwOnFailure) {
+  private static Collection<DataFile> dataFiles(int numTasks, ExecutorService executor, String location,
+                                                JobContext jobContext, FileIO io, boolean throwOnFailure) {

Review comment:
   They are a bit confusing about this. The 4 spaces padding is indeed the 
general rule for line continuation, but I've been asked earlier by Anton on 
other PRs not to do that for method parameter indentations and do it this way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586529)
Time Spent: 3h 20m  (was: 3h 10m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586527
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:29
Start Date: 21/Apr/21 12:29
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617487622



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -239,6 +252,16 @@ public void abortJob(JobContext originalContext, int status) throws IOException
 cleanup(jobContext, jobLocations);
   }
 
+  private Set listForCommits(JobConf jobConf, String jobLocation) throws IOException {

Review comment:
   nit: Maybe add a javadoc noting that this should not be used for anything other
   than abort.
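
   Spelled out, that javadoc might look something like the sketch below (the wording
   and the Set type parameter are assumptions, not the merged text):

{code:java}
/**
 * Lists the temporary forCommit files under the job location.
 * Based on a raw file listing, so only use it on the abort path; the commit path
 * should derive the expected files from the number of writer tasks instead.
 */
private Set<FileStatus> listForCommits(JobConf jobConf, String jobLocation) throws IOException {
{code}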




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586527)
Time Spent: 3h 10m  (was: 3h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586526
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:28
Start Date: 21/Apr/21 12:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617486952



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -346,25 +372,23 @@ private static ExecutorService tableExecutor(Configuration conf, int maxThreadNu
 
   /**
    * Get the committed data files for this table and job.
+   *
+   * @param numTasks Number of writer tasks that produced a forCommit file
    * @param executor The executor used for reading the forCommit files in parallel
    * @param location The location of the table
    * @param jobContext The job context
    * @param io The FileIO used for reading the files generated for commit
    * @param throwOnFailure If true then it throws an exception on failure
    * @return The list of the committed data files
    */
-  private static Collection<DataFile> dataFiles(ExecutorService executor, String location, JobContext jobContext,
-      FileIO io, boolean throwOnFailure) {
+  private static Collection<DataFile> dataFiles(int numTasks, ExecutorService executor, String location,
+                                                JobContext jobContext, FileIO io, boolean throwOnFailure) {

Review comment:
   nit: I have seen Iceberg reviewers asking for 4 space padding.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586526)
Time Spent: 3h  (was: 2h 50m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25022) Metric about incomplete compactions

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25022?focusedWorklogId=586525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586525
 ]

ASF GitHub Bot logged work on HIVE-25022:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:28
Start Date: 21/Apr/21 12:28
Worklog Time Spent: 10m 
  Work Description: klcopp closed pull request #2184:
URL: https://github.com/apache/hive/pull/2184


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586525)
Time Spent: 50m  (was: 40m)

> Metric about incomplete compactions
> ---
>
> Key: HIVE-25022
> URL: https://issues.apache.org/jira/browse/HIVE-25022
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> "Compactions in a state" metrics (for example compaction_num_working) count 
> the sum of tables/partitions where the last compaction is in that state.
> I propose introducing a new metric about incomplete compactions: i.e. the 
> number of tables/partitions where the last finished compaction* is 
> unsuccessful (failed or "did not initiate"), or where major compaction was 
> unsuccessful then minor compaction succeeded (compaction is not "complete" 
> since major compaction has not succeeded in the time since it should have 
> run).
> Example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major working
> major failed
> major initiated
> major working
> major failed
> major initiated
> major working
> The "compactions in a state" metrics will consider the state of this table: 
> working.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there have been failed compactions since the last succeeded compaction.
> {code}
> Another example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major failed
> minor failed
> minor succeeded
> The "compactions in a state" metrics will consider the state of this table: 
> succeeded.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there hasn't been a major succeeded since major failed.{code}
> Last example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> minor did not initiate
> The "compactions in a state" metrics will consider the state of this table: 
> did not initiate.
> The "incomplete compactions" metric will consider this: incomplete, since the 
> last compaction was "did not initiate"{code}
> *finished compaction: state in (succeeded, failed, attempted/did not initiate)
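>
> The rule in the examples can be sketched as a scan over the finished compactions,
> newest first (illustrative types, not Hive's actual classes):
> {code:java}
> enum State { SUCCEEDED, FAILED, DID_NOT_INITIATE }
> enum Type { MAJOR, MINOR }
> class Compaction { Type type; State state; }
>
> // "incomplete" unless a successful MAJOR is reached before any failure
> static boolean isIncomplete(java.util.List<Compaction> newestFirst) {
>   for (Compaction c : newestFirst) {
>     if (c.state == State.FAILED || c.state == State.DID_NOT_INITIATE) {
>       return true;   // covers examples 1 and 3
>     }
>     if (c.state == State.SUCCEEDED && c.type == Type.MAJOR) {
>       return false;  // complete: a major succeeded since any failure
>     }
>     // a succeeded MINOR is skipped: it does not repair an earlier major failure (example 2)
>   }
>   return false;      // no finished compactions yet
> }
> {code}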



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25022) Metric about incomplete compactions

2021-04-21 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25022.
--
Resolution: Won't Fix

Not needed since multiple failures will cause the table to go into "did not 
initiate" state long-term.

> Metric about incomplete compactions
> ---
>
> Key: HIVE-25022
> URL: https://issues.apache.org/jira/browse/HIVE-25022
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> "Compactions in a state" metrics (for example compaction_num_working) count 
> the sum of tables/partitions where the last compaction is in that state.
> I propose introducing a new metric about incomplete compactions: i.e. the 
> number of tables/partitions where the last finished compaction* is 
> unsuccessful (failed or "did not initiate"), or where major compaction was 
> unsuccessful then minor compaction succeeded (compaction is not "complete" 
> since major compaction has not succeeded in the time since it should have 
> run).
> Example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major working
> major failed
> major initiated
> major working
> major failed
> major initiated
> major working
> The "compactions in a state" metrics will consider the state of this table: 
> working.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there have been failed compactions since the last succeeded compaction.
> {code}
> Another example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major failed
> minor failed
> minor succeeded
> The "compactions in a state" metrics will consider the state of this table: 
> succeeded.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there hasn't been a major succeeded since major failed.{code}
> Last example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> minor did not initiate
> The "compactions in a state" metrics will consider the state of this table: 
> did not initiate.
> The "incomplete compactions" metric will consider this: incomplete, since the 
> last compaction was "did not initiate"{code}
> *finished compaction: state in (succeeded, failed, attempted/did not initiate)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=586510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586510
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 12:00
Start Date: 21/Apr/21 12:00
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r617468227



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,26 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
         "Enables read-only transaction classification and related optimizations"),
 
+    // Configs having to do with DeltaFilesMetricReporter, which collects lists of most recently active tables
+    // with the most number of active/obsolete deltas.
+    HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 100,
+        "Size of the ACID metrics cache. Only topN metrics would remain in the cache if exceeded."),
+    HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", "7200s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+    HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval", "30s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Reporting period for ACID metrics in seconds."),
+    HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold", 100,
+        "The minimum number of active delta files a table/partition must be included in the ACID metrics report."),

Review comment:
   This is a bit more verbose and maybe more clear: The minimum number of 
active delta files a table/partition must have in order to be included in the 
ACID metrics report.
   (As in, a table/partition must have x number of files in order to be 
included  :) )




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586510)
Time Spent: 1.5h  (was: 1h 20m)

> Create new metrics about the number of delta files in the ACID table
> 
>
> Key: HIVE-24974
> URL: https://issues.apache.org/jira/browse/HIVE-24974
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 2 metrics should be collected by each table/partition that exceeds some limit.
>  * Number of used deltas
>  * Number of obsolete deltas
> Both of them should be collected in the AcidUtils.getAcidState call, and only be 
> published if they reached a configurable threshold (to not pollute metrics) 
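>
> A sketch of that gating (metric names and the reporter call are hypothetical; the
> delta counts come from the AcidUtils.getAcidState result):
> {code:java}
> int numDeltas = dir.getCurrentDirectories().size();
> int numObsoleteDeltas = dir.getObsolete().size();
> if (numDeltas >= deltaNumThreshold) {
>   metrics.gauge("delta_num_" + tableName, numDeltas);  // hypothetical reporter call
> }
> if (numObsoleteDeltas >= obsoleteDeltaNumThreshold) {
>   metrics.gauge("obsolete_delta_num_" + tableName, numObsoleteDeltas);
> }
> {code}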



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586471
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 10:37
Start Date: 21/Apr/21 10:37
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617418759



##
File path: data/conf/iceberg/hive-site.xml
##
@@ -0,0 +1,321 @@
+

Review comment:
   Sounds good! So that we don't forget, let's create a follow up Jira to 
remove the new iceberg-test flag once that locking PR has gone in.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586471)
Time Spent: 3h  (was: 2h 50m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586460
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 10:30
Start Date: 21/Apr/21 10:30
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617413721



##
File path: data/conf/iceberg/hive-site.xml
##
@@ -0,0 +1,321 @@
+

Review comment:
   I've discussed this with @deniskuzZ, and he advised that I should open a 
separate PR to remove this check around locking and update the unit tests which 
are expecting this exception. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586460)
Time Spent: 2h 50m  (was: 2h 40m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25044) Parallel edge fixer may not be able to process semijoin edges

2021-04-21 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25044:
---


> Parallel edge fixer may not be able to process semijoin edges
> -
>
> Key: HIVE-25044
> URL: https://issues.apache.org/jira/browse/HIVE-25044
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> SJ filter edges are removed from the main operator graph - which could cause 
> that a parallel edge remains after the remover was executed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25029) Remove travis builds

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25029?focusedWorklogId=586412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586412
 ]

ASF GitHub Bot logged work on HIVE-25029:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 08:37
Start Date: 21/Apr/21 08:37
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2192:
URL: https://github.com/apache/hive/pull/2192


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586412)
Time Spent: 20m  (was: 10m)

> Remove travis builds
> 
>
> Key: HIVE-25029
> URL: https://issues.apache.org/jira/browse/HIVE-25029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> travis only compiles the project - we already do much more than that during 
> precommit testing.
> (and it sometimes delays builds because travis can't allocate executors/etc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24954) MetastoreTransformer is disabled during testing

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24954?focusedWorklogId=586411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586411
 ]

ASF GitHub Bot logged work on HIVE-24954:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 08:37
Start Date: 21/Apr/21 08:37
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2139:
URL: https://github.com/apache/hive/pull/2139


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586411)
Time Spent: 20m  (was: 10m)

> MetastoreTransformer is disabled during testing
> ---
>
> Key: HIVE-24954
> URL: https://issues.apache.org/jira/browse/HIVE-24954
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> all calls are fortified with "isInTest" guards to avoid testing those calls 
> (!@#$#)
> https://github.com/apache/hive/blob/86fa9b30fe347c7fc78a2930f4d20ece2e124f03/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L1647
> this causes some weird behaviour:
> out of the box hive installation creates TRANSLATED_TO_EXTERNAL external 
> tables for plain CREATE TABLE commands
> meanwhile, when most testing is executed, CREATE TABLE creates regular 
> MANAGED tables...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive where it's possible

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=586405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586405
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 08:13
Start Date: 21/Apr/21 08:13
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1778:
URL: https://github.com/apache/hive/pull/1778#issuecomment-823871712


   merged to master, thanks @pgaref for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586405)
Time Spent: 2h 50m  (was: 2h 40m)

> LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive 
> where it's possible
> -
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: dep.log
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is a possible performance improvement compared to Netty3. However, the 
> refactor is not trivial, TEZ-4157 covers that more or less (the code bases 
> are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24524) LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive where it's possible

2021-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24524:

Fix Version/s: 4.0.0

> LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive 
> where it's possible
> -
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: dep.log
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is a possible performance improvement compared to Netty3. However, the 
> refactor is not trivial, TEZ-4157 covers that more or less (the code bases 
> are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive where it's possible

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=586404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586404
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 08:13
Start Date: 21/Apr/21 08:13
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #1778:
URL: https://github.com/apache/hive/pull/1778


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586404)
Time Spent: 2h 40m  (was: 2.5h)

> LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive 
> where it's possible
> -
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: dep.log
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is a possible performance improvement compared to Netty3. However, the 
> refactor is not trivial, TEZ-4157 covers that more or less (the code bases 
> are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24524) LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive where it's possible

2021-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-24524.
-
Resolution: Fixed

> LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive 
> where it's possible
> -
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: dep.log
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to Netty4. 
> Netty4 is a possible performance improvement compared to Netty3. However, the 
> refactor is not trivial; TEZ-4157 covers it more or less (the code bases 
> are very similar).
> Background:
> Netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles on possible performance improvements:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> one more note: Netty3 has been EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24524) LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive where it's possible

2021-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24524:

Summary: LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 
dependency from hive where it's possible  (was: LLAP ShuffleHandler: upgrade to 
Netty4 and remove Netty3 dependency from hive)

> LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive 
> where it's possible
> -
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: dep.log
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to Netty4. 
> Netty4 is a possible performance improvement compared to Netty3. However, the 
> refactor is not trivial; TEZ-4157 covers it more or less (the code bases 
> are very similar).
> Background:
> Netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles on possible performance improvements:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> one more note: Netty3 has been EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25043) Support custom UDF in Vectorized mode

2021-04-21 Thread Ryu Kobayashi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25043 started by Ryu Kobayashi.

> Support custom UDF in Vectorized mode
> -
>
> Key: HIVE-25043
> URL: https://issues.apache.org/jira/browse/HIVE-25043
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I can see in the code, hive.vectorized.adaptor.usage.mode=chosen 
> does not allow custom UDFs. Change it so that only explicitly specified 
> custom UDFs are allowed.
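For reference, a "custom UDF" here means a user-supplied GenericUDF such as the minimal sketch below. The class and function name are illustrative, not from the patch; per the description above, such a class is what the 'chosen' adaptor mode currently skips.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Text;

    // Illustrative custom GenericUDF; registered via CREATE FUNCTION, it is the
    // kind of class affected by hive.vectorized.adaptor.usage.mode.
    public class MyUpperUdf extends GenericUDF {

      @Override
      public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
          throw new UDFArgumentException("my_upper expects exactly one argument");
        }
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
      }

      @Override
      public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        return value == null ? null : new Text(value.toString().toUpperCase());
      }

      @Override
      public String getDisplayString(String[] children) {
        return "my_upper(" + children[0] + ")";
      }
    }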



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25043) Support custom UDF in Vectorized mode

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25043:
--
Labels: pull-request-available  (was: )

> Support custom UDF in Vectorized mode
> -
>
> Key: HIVE-25043
> URL: https://issues.apache.org/jira/browse/HIVE-25043
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I can see in the code, hive.vectorized.adaptor.usage.mode=chosen 
> does not allow custom UDFs. Change it so that only explicitly specified 
> custom UDFs are allowed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25043) Support custom UDF in Vectorized mode

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25043?focusedWorklogId=586395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586395
 ]

ASF GitHub Bot logged work on HIVE-25043:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 08:02
Start Date: 21/Apr/21 08:02
Worklog Time Spent: 10m 
  Work Description: ryukobayashi opened a new pull request #2202:
URL: https://github.com/apache/hive/pull/2202


   
   
   ### What changes were proposed in this pull request?
   
   See the: https://issues.apache.org/jira/browse/HIVE-25043
   
   ### Why are the changes needed?
   
   We want to use custom UDFs that support vectorization when 
hive.vectorized.adaptor.usage.mode=chosen is specified.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Remote cluster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586395)
Remaining Estimate: 0h
Time Spent: 10m

> Support custom UDF in Vectorized mode
> -
>
> Key: HIVE-25043
> URL: https://issues.apache.org/jira/browse/HIVE-25043
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I can see in the code, hive.vectorized.adaptor.usage.mode=chosen 
> does not allow custom UDFs. Change it so that only explicitly specified 
> custom UDFs are allowed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586392
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:56
Start Date: 21/Apr/21 07:56
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617287262



##
File path: 
iceberg/iceberg-handler/src/test/java/queries/negative/create_iceberg_table_failure.q
##
@@ -0,0 +1,2 @@
+set hive.vectorized.execution.enabled=true;

Review comment:
   Yes, this PR provides the skeleton to run additional q tests from 
upcoming PRs. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586392)
Time Spent: 2.5h  (was: 2h 20m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586389
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:54
Start Date: 21/Apr/21 07:54
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617285043



##
File path: 
iceberg/iceberg-handler/src/test/java/queries/negative/create_iceberg_table_failure.q
##
@@ -0,0 +1,2 @@
+set hive.vectorized.execution.enabled=true;

Review comment:
   Or I suppose this current PR was not meant to be comprehensive but just 
to set up the iceberg qtest infra? In that case, we can tackle these in 
upcoming PRs that will beef up these test cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586389)
Time Spent: 2h 20m  (was: 2h 10m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586388
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:52
Start Date: 21/Apr/21 07:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617285043



##
File path: 
iceberg/iceberg-handler/src/test/java/queries/negative/create_iceberg_table_failure.q
##
@@ -0,0 +1,2 @@
+set hive.vectorized.execution.enabled=true;

Review comment:
   I suppose this was not meant to be comprehensive but just to set up the 
iceberg qtest infra? In that case, we can tackle these in upcoming PRs that 
beef up the test cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586388)
Time Spent: 2h 10m  (was: 2h)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586386
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:51
Start Date: 21/Apr/21 07:51
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617283836



##
File path: 
iceberg/iceberg-handler/src/test/java/queries/negative/create_iceberg_table_failure.q
##
@@ -0,0 +1,2 @@
+set hive.vectorized.execution.enabled=true;

Review comment:
   Can we add other table creation failure scenarios here too? One that 
comes to mind is specifying a PARTITIONED BY clause while also specifying 
the JSON partition spec in the table properties. Another could be related to type 
mismatches, e.g. using an unsupported Hive type such as INTERVAL in the columns.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586386)
Time Spent: 2h  (was: 1h 50m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=586383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586383
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:42
Start Date: 21/Apr/21 07:42
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r617277648



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,26 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+// Configs having to do with DeltaFilesMetricReporter, which collects 
lists of most recently active tables
+// with the most number of active/obsolete deltas.
+
HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 
100,
+"Size of the ACID metrics cache. Only topN metrics would remain in the 
cache if exceeded."),
+
HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", 
"7200s",
+new TimeValidator(TimeUnit.SECONDS),
+"Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+
HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval",
 "30s",
+new TimeValidator(TimeUnit.SECONDS),
+"Reporting period for ACID metrics in seconds."),
+
HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold",
 100,
+"The minimum number of active delta files a table/partition must be 
included in the ACID metrics report."),
+
HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.obsolete.delta.num.threshold",
 100,
+"The minimum number of obsolete delta files a table/partition must be 
included in the ACID metrics report."),

Review comment:
   same as above :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586383)
Time Spent: 1h 20m  (was: 1h 10m)

> Create new metrics about the number of delta files in the ACID table
> 
>
> Key: HIVE-24974
> URL: https://issues.apache.org/jira/browse/HIVE-24974
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Two metrics should be collected for each table/partition that exceeds some limit:
>  * Number of used deltas
>  * Number of obsolete deltas
> Both of them should be collected in the AcidUtils.getAcidState call, and only be 
> published if they reach a configurable threshold (so as not to pollute the metrics).
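A plain-Java sketch of that threshold gating, using stand-in types rather than Hive's actual DeltaFilesMetricReporter (all names below are illustrative):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Stand-in sketch: publish per-partition delta counts only once they cross the
    // configured thresholds, so quiet tables do not pollute the metrics.
    public class DeltaMetricsSketch {
      private final Map<String, Integer> gauges = new ConcurrentHashMap<>();
      private final int deltaNumThreshold;     // cf. hive.txn.acid.metrics.delta.num.threshold
      private final int obsoleteNumThreshold;  // cf. hive.txn.acid.metrics.obsolete.delta.num.threshold

      DeltaMetricsSketch(int deltaNumThreshold, int obsoleteNumThreshold) {
        this.deltaNumThreshold = deltaNumThreshold;
        this.obsoleteNumThreshold = obsoleteNumThreshold;
      }

      // Would be fed from the directory listing done in AcidUtils.getAcidState.
      void report(String tablePartition, int activeDeltas, int obsoleteDeltas) {
        if (activeDeltas >= deltaNumThreshold) {
          gauges.put("num_deltas." + tablePartition, activeDeltas);
        }
        if (obsoleteDeltas >= obsoleteNumThreshold) {
          gauges.put("num_obsolete_deltas." + tablePartition, obsoleteDeltas);
        }
      }

      public static void main(String[] args) {
        DeltaMetricsSketch m = new DeltaMetricsSketch(100, 100);
        m.report("db.tbl/p=1", 150, 20);   // only the active-delta gauge is published
        System.out.println(m.gauges);      // {num_deltas.db.tbl/p=1=150}
      }
    }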



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=586382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586382
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:41
Start Date: 21/Apr/21 07:41
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r617277318



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,26 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+// Configs having to do with DeltaFilesMetricReporter, which collects 
lists of most recently active tables
+// with the most number of active/obsolete deltas.
+
HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 
100,
+"Size of the ACID metrics cache. Only topN metrics would remain in the 
cache if exceeded."),
+
HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", 
"7200s",
+new TimeValidator(TimeUnit.SECONDS),
+"Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+
HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval",
 "30s",
+new TimeValidator(TimeUnit.SECONDS),
+"Reporting period for ACID metrics in seconds."),
+
HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold",
 100,
+"The minimum number of active delta files a table/partition must be 
included in the ACID metrics report."),

Review comment:
   :) I am not a native speaker, but I ran this through Google Translate and 
Grammarly, and both corrected "must have to be" -> "should be" / "must have 
been".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586382)
Time Spent: 1h 10m  (was: 1h)

> Create new metrics about the number of delta files in the ACID table
> 
>
> Key: HIVE-24974
> URL: https://issues.apache.org/jira/browse/HIVE-24974
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Two metrics should be collected for each table/partition that exceeds some limit:
>  * Number of used deltas
>  * Number of obsolete deltas
> Both of them should be collected in the AcidUtils.getAcidState call, and only be 
> published if they reach a configurable threshold (so as not to pollute the metrics).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586381
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:40
Start Date: 21/Apr/21 07:40
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617276565



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -770,6 +770,8 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
 "If not set, defaults to the codec extension for text files (e.g. 
\".gz\"), or no extension otherwise."),
 
 HIVE_IN_TEST("hive.in.test", false, "internal usage only, true in test 
mode", true),
+HIVE_IN_TEST_ICEBERG("hive.in.iceberg.test", false, "internal usage only, 
true when " +

Review comment:
   This caused trouble when migrating the iceberg-mr code to the new 
iceberg-handler module. The TestHiveShell configs had to be patched to get around 
this as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586381)
Time Spent: 1h 50m  (was: 1h 40m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We should create Iceberg-specific drivers to run Iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-21 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25038.
---
Resolution: Fixed

> Increase Iceberg test timeout and remove mr tests
> -
>
> Key: HIVE-25038
> URL: https://issues.apache.org/jira/browse/HIVE-25038
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=586378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586378
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:34
Start Date: 21/Apr/21 07:34
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r617272518



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,26 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+// Configs having to do with DeltaFilesMetricReporter, which collects 
lists of most recently active tables
+// with the most number of active/obsolete deltas.
+
HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 
100,
+"Size of the ACID metrics cache. Only topN metrics would remain in the 
cache if exceeded."),
+
HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", 
"7200s",
+new TimeValidator(TimeUnit.SECONDS),
+"Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+
HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval",
 "30s",
+new TimeValidator(TimeUnit.SECONDS),
+"Reporting period for ACID metrics in seconds."),
+
HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold",
 100,
+"The minimum number of active delta files a table/partition must be 
included in the ACID metrics report."),
+
HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.obsolete.delta.num.threshold",
 100,
+"The minimum number of obsolete delta files a table/partition must be 
included in the ACID metrics report."),
+
HIVE_TXN_ACID_METRICS_DELTA_CHECK_THRESHOLD("hive.txn.acid.metrics.delta.check.threshold",
 "300s",
+new TimeValidator(TimeUnit.SECONDS),
+"Deltas not older than this value will not be included in the ACID 
metrics report."),

Review comment:
    No, it's the other way around: we don't want to include fresh deltas.
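   A tiny sketch of that age gate (illustrative names; it assumes the delta
directory's modification time is the age source):

    import java.util.concurrent.TimeUnit;

    // Illustrative age gate: deltas younger than the check threshold are left
    // out of the report, matching the intent described above.
    public class DeltaAgeGateSketch {
      static boolean oldEnough(long deltaModTimeMs, long nowMs, long thresholdSec) {
        return nowMs - deltaModTimeMs >= TimeUnit.SECONDS.toMillis(thresholdSec);
      }

      public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(oldEnough(now - 10_000L, now, 300));  // false: too fresh
        System.out.println(oldEnough(now - 600_000L, now, 300)); // true: old enough
      }
    }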




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586378)
Time Spent: 1h  (was: 50m)

> Create new metrics about the number of delta files in the ACID table
> 
>
> Key: HIVE-24974
> URL: https://issues.apache.org/jira/browse/HIVE-24974
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Two metrics should be collected for each table/partition that exceeds some limit:
>  * Number of used deltas
>  * Number of obsolete deltas
> Both of them should be collected in the AcidUtils.getAcidState call, and only be 
> published if they reach a configurable threshold (so as not to pollute the metrics).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586377
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:33
Start Date: 21/Apr/21 07:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617272290



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
   .executeWith(tableExecutor)
   .run(output -> {
 Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
-HiveIcebergRecordWriter writer = writers.get(output);
-DataFile[] closedFiles = writer != null ? writer.dataFiles() : new 
DataFile[0];
-String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
-attemptID.getJobID(), attemptID.getTaskID().getId());
-
-// Creating the file containing the data files generated by this 
task for this table
-createFileForCommit(closedFiles, fileForCommitLocation, 
table.io());
+if (table != null) {

Review comment:
   Yeah, I think you're right




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586377)
Time Spent: 2h 50m  (was: 2h 40m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Trigger the write commits in HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for Iceberg tables.
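A rough sketch of the hook-side commit this describes: only the Iceberg table API calls below are real; the surrounding method shape is an assumption, not the actual HiveIcebergStorageHandler code.

    import java.util.List;
    import org.apache.iceberg.AppendFiles;
    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.Table;

    // Assumed shape of the metahook commit: gather the data files produced by
    // the finished tasks and publish them in a single atomic Iceberg snapshot.
    class HookCommitSketch {
      void commitInsertTable(Table table, List<DataFile> dataFiles) {
        AppendFiles append = table.newAppend();
        dataFiles.forEach(append::appendFile);
        append.commit();
        // An overwrite variant would go through table.newReplacePartitions()
        // instead, which is what makes INSERT OVERWRITE implementable.
      }
    }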



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25038?focusedWorklogId=586375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586375
 ]

ASF GitHub Bot logged work on HIVE-25038:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:33
Start Date: 21/Apr/21 07:33
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2198:
URL: https://github.com/apache/hive/pull/2198


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586375)
Time Spent: 20m  (was: 10m)

> Increase Iceberg test timeout and remove mr tests
> -
>
> Key: HIVE-25038
> URL: https://issues.apache.org/jira/browse/HIVE-25038
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25038?focusedWorklogId=586376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586376
 ]

ASF GitHub Bot logged work on HIVE-25038:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 07:33
Start Date: 21/Apr/21 07:33
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #2198:
URL: https://github.com/apache/hive/pull/2198#issuecomment-823846009


   Thanks @marton-bod for the patch and @pvary for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586376)
Time Spent: 0.5h  (was: 20m)

> Increase Iceberg test timeout and remove mr tests
> -
>
> Key: HIVE-25038
> URL: https://issues.apache.org/jira/browse/HIVE-25038
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25043) Support custom UDF in Vectorized mode

2021-04-21 Thread Ryu Kobayashi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi reassigned HIVE-25043:



> Support custom UDF in Vectorized mode
> -
>
> Key: HIVE-25043
> URL: https://issues.apache.org/jira/browse/HIVE-25043
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>
> As far as I can see in the code, hive.vectorized.adaptor.usage.mode=chosen 
> does not allow custom UDFs. Change it so that only explicitly specified 
> custom UDFs are allowed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24978) Optimise number of DROP_PARTITION events created.

2021-04-21 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi resolved HIVE-24978.

Resolution: Fixed

> Optimise number of DROP_PARTITION events created.
> -
>
> Key: HIVE-24978
> URL: https://issues.apache.org/jira/browse/HIVE-24978
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Even for drop partition with batches, there is presently one event for every 
> partition; optimise by merging them to reduce the number of calls to HMS.
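A plain-Java sketch of the batching idea (names are illustrative, not the actual HMS listener API):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Illustrative sketch: fire one DROP_PARTITION event per batch instead of
    // one per partition, cutting the number of metastore calls.
    public class DropPartitionBatchSketch {
      static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
          out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
      }

      public static void main(String[] args) {
        List<String> partitions = Arrays.asList("p=1", "p=2", "p=3", "p=4", "p=5");
        for (List<String> batch : batches(partitions, 2)) {
          // previously: one event per partition; now: one event carrying the batch
          System.out.println("DROP_PARTITION event for " + batch);
        }
      }
    }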



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24978) Optimise number of DROP_PARTITION events created.

2021-04-21 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326284#comment-17326284
 ] 

Aasha Medhi commented on HIVE-24978:


+1 Committed to master. Thank you for the patch [~ayushtkn]

> Optimise number of DROP_PARTITION events created.
> -
>
> Key: HIVE-24978
> URL: https://issues.apache.org/jira/browse/HIVE-24978
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Even for drop partition with batches, there is presently one event for every 
> partition; optimise by merging them to reduce the number of calls to HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24978) Optimise number of DROP_PARTITION events created.

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24978?focusedWorklogId=586350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586350
 ]

ASF GitHub Bot logged work on HIVE-24978:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 06:13
Start Date: 21/Apr/21 06:13
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #2154:
URL: https://github.com/apache/hive/pull/2154


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586350)
Time Spent: 1h 10m  (was: 1h)

> Optimise number of DROP_PARTITION events created.
> -
>
> Key: HIVE-24978
> URL: https://issues.apache.org/jira/browse/HIVE-24978
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Even for drop partition with batches, there is presently one event for every 
> partition; optimise by merging them to reduce the number of calls to HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)