[jira] [Work logged] (HIVE-25039) Disable discovery.partitions config for external tables by default

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25039?focusedWorklogId=586340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586340
 ]

ASF GitHub Bot logged work on HIVE-25039:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:45
Start Date: 21/Apr/21 05:45
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2201:
URL: https://github.com/apache/hive/pull/2201


   
   
   ### What changes were proposed in this pull request?
   Removed the discover.partitions config for external tables with partitions.
   
   
   
   ### Why are the changes needed?
   HMS CPU usage can hit its maximum because of this config, since every partition-discovery call against external storage (for example S3) is costly.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. If users want to enable this config, they need to alter the table 
property using the command: ALTER TABLE exttbl SET TBLPROPERTIES 
('discover.partitions' = 'true');
   
   
   
   ### How was this patch tested?
   Local Machine, Remote cluster
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586340)
Remaining Estimate: 0h
Time Spent: 10m

> Disable discovery.partitions config for external tables by default
> --
>
> Key: HIVE-25039
> URL: https://issues.apache.org/jira/browse/HIVE-25039
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to disable the discovery.partitions config for external tables 
> with partitions by default, because every HMS API call to the external 
> partition location (for example, on S3) is costly. We can selectively enable 
> this config per table via ALTER TABLE ... SET TBLPROPERTIES.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25039) Disable discovery.partitions config for external tables by default

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25039:
--
Labels: pull-request-available  (was: )

> Disable discovery.partitions config for external tables by default
> --
>
> Key: HIVE-25039
> URL: https://issues.apache.org/jira/browse/HIVE-25039
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to disable the discovery.partitions config for external tables 
> with partitions by default, because every HMS API call to the external 
> partition location (for example, on S3) is costly. We can selectively enable 
> this config per table via ALTER TABLE ... SET TBLPROPERTIES.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally

2021-04-20 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326274#comment-17326274
 ] 

Zhihua Deng commented on HIVE-24944:


+1. [~zhangqidong] Could you please also create a GitHub pull request for 
this? Thanks!

> When the default engine of the hiveserver is MR and the tez engine is set by 
> the client, the client TEZ progress log cannot be printed normally
> ---
>
> Key: HIVE-24944
> URL: https://issues.apache.org/jira/browse/HIVE-24944
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: ZhangQiDong
>Assignee: ZhangQiDong
>Priority: Major
> Attachments: HIVE-24944.001.patch
>
>
> The HiveServer default execution engine is MR. When the client sets 
> hive.execution.engine=tez, the client cannot print the Tez progress log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask

2021-04-20 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326271#comment-17326271
 ] 

Aasha Medhi commented on HIVE-25002:


+1 Committed to master. Thank you for the patch [~haymant]

> modify condition for target of replication in statsUpdaterThread and 
> PartitionManagementTask
> 
>
> Key: HIVE-25002
> URL: https://issues.apache.org/jira/browse/HIVE-25002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask

2021-04-20 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi resolved HIVE-25002.

Resolution: Fixed

> modify condition for target of replication in statsUpdaterThread and 
> PartitionManagementTask
> 
>
> Key: HIVE-25002
> URL: https://issues.apache.org/jira/browse/HIVE-25002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=586339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586339
 ]

ASF GitHub Bot logged work on HIVE-25002:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:41
Start Date: 21/Apr/21 05:41
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #2167:
URL: https://github.com/apache/hive/pull/2167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586339)
Time Spent: 1.5h  (was: 1h 20m)

> modify condition for target of replication in statsUpdaterThread and 
> PartitionManagementTask
> 
>
> Key: HIVE-25002
> URL: https://issues.apache.org/jira/browse/HIVE-25002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23820?focusedWorklogId=586338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586338
 ]

ASF GitHub Bot logged work on HIVE-23820:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:29
Start Date: 21/Apr/21 05:29
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #2153:
URL: https://github.com/apache/hive/pull/2153#discussion_r617206197



##
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -2381,82 +2381,141 @@ public Partition getPartitionWithAuthInfo(String catName, String dbName, String
     return deepCopy(FilterUtils.filterPartitionIfEnabled(isClientFilterEnabled, filterHook, p));
   }
 
+  /**
+   * @deprecated use getTable(GetTableRequest getTableRequest)
+   * @param dbname
+   * @param name
+   * @return
+   * @throws TException
+   */
   @Override
+  @Deprecated
   public Table getTable(String dbname, String name) throws TException {
-    return getTable(getDefaultCatalog(conf), dbname, name);
+    GetTableRequest req = new GetTableRequest(dbname, name);
+    req.setCatName(getDefaultCatalog(conf));
+    return getTable(req);
   }
 
+  /**
+   * @deprecated use getTable(GetTableRequest getTableRequest)
+   * @param dbname
+   * @param name
+   * @param getColumnStats
+   *          get the column stats, if available, when true
+   * @param engine engine sending the request
+   * @return
+   * @throws TException
+   */
   @Override
+  @Deprecated
   public Table getTable(String dbname, String name, boolean getColumnStats, String engine) throws TException {
-    return getTable(getDefaultCatalog(conf), dbname, name, getColumnStats, engine);
+    GetTableRequest req = new GetTableRequest(dbname, name);
+    req.setCatName(getDefaultCatalog(conf));
+    req.setGetColumnStats(getColumnStats);
+    if (getColumnStats) {

Review comment:
   There is no equivalent condition in older code. Why is this done here?
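   For context, the migration pattern introduced by this hunk looks roughly like the sketch below. Only `GetTableRequest`, `setCatName`, and `setGetColumnStats` are taken from the diff; the `client` variable and the database/table names are illustrative assumptions.
   ```java
   // Sketch only: migrating from the deprecated getTable(String, String) to the
   // request-based API shown in the diff above. "client", "mydb" and "mytab" are
   // hypothetical placeholders, not part of the patch.
   GetTableRequest req = new GetTableRequest("mydb", "mytab");
   req.setCatName("hive");             // the diff fills this via getDefaultCatalog(conf)
   req.setGetColumnStats(true);        // also fetch column statistics, if available
   Table table = client.getTable(req); // client: an assumed HiveMetaStoreClient instance
   ```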

##
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
##
@@ -3304,14 +3304,20 @@ public Table getTable(String catName, String dbName, String tableName,
       String validWriteIdList) throws TException {
     throw new UnsupportedOperationException();
   }
-
+  

Review comment:
   nit: remove indent




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586338)
Time Spent: 1h 20m  (was: 1h 10m)

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586337
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:27
Start Date: 21/Apr/21 05:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617209417



##
File path: data/conf/iceberg/hive-site.xml
##
@@ -0,0 +1,321 @@
+

Review comment:
   I would rather get rid of the `hive.in.iceberg.test` altogether




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586337)
Time Spent: 1h 40m  (was: 1.5h)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586336
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:25
Start Date: 21/Apr/21 05:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617208933



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -770,6 +770,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "If not set, defaults to the codec extension for text files (e.g. \".gz\"), or no extension otherwise."),
 
     HIVE_IN_TEST("hive.in.test", false, "internal usage only, true in test mode", true),
+    HIVE_IN_TEST_ICEBERG("hive.in.iceberg.test", false, "internal usage only, true when " +

Review comment:
   How much I hate test code in the production codepath, I cannot tell you...
   @deniskuzZ: How hard would it be to get rid of this? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586336)
Time Spent: 1.5h  (was: 1h 20m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586334
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:23
Start Date: 21/Apr/21 05:23
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r617208189



##
File path: itests/pom.xml
##
@@ -248,6 +248,17 @@
 ${project.version}
 tests
   
+  

Review comment:
   The Travis will be removed soon, so that should not be a problem. My fear 
is that when developers start to run the tests on their local machines 
without the `iceberg` flag, they will have a problem (maybe only if they are 
offline) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586334)
Time Spent: 1h 20m  (was: 1h 10m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586332
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 05:17
Start Date: 21/Apr/21 05:17
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617206293



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
          .executeWith(tableExecutor)
          .run(output -> {
            Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-           HiveIcebergRecordWriter writer = writers.get(output);
-           DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-           String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-               attemptID.getJobID(), attemptID.getTaskID().getId());
-
-           // Creating the file containing the data files generated by this task for this table
-           createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+           if (table != null) {

Review comment:
   I think this is the same situation that we have been in with some other Tez 
patches. It does not hurt to be there, and it keeps the two code bases from 
diverging. But it's up to you to decide. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586332)
Time Spent: 2h 40m  (was: 2.5h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally

2021-04-20 Thread ZhangQiDong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326231#comment-17326231
 ] 

ZhangQiDong commented on HIVE-24944:


[~gopalv] [~chinnalalam] , can you please help me review this?

> When the default engine of the hiveserver is MR and the tez engine is set by 
> the client, the client TEZ progress log cannot be printed normally
> ---
>
> Key: HIVE-24944
> URL: https://issues.apache.org/jira/browse/HIVE-24944
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: ZhangQiDong
>Assignee: ZhangQiDong
>Priority: Major
> Attachments: HIVE-24944.001.patch
>
>
> The HiveServer default execution engine is MR. When the client sets 
> hive.execution.engine=tez, the client cannot print the Tez progress log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=586276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586276
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 21/Apr/21 02:19
Start Date: 21/Apr/21 02:19
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2071:
URL: https://github.com/apache/hive/pull/2071#discussion_r617153970



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveMapComparator.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+
+import java.util.Iterator;
+import java.util.Map;
+
+final class HiveMapComparator extends HiveWritableComparator {
+    private WritableComparator comparatorValue = null;
+    private WritableComparator comparatorKey = null;
+
+    HiveMapComparator(boolean nullSafe, NullOrdering nullOrdering) {
+        super(nullSafe, nullOrdering);
+    }
+
+    @Override
+    public int compare(Object key1, Object key2) {
+        int result = checkNull(key1, key2);
+        if (result != not_null) {
+            return result;
+        }
+
+        Map map1 = (Map) key1;
+        Map map2 = (Map) key2;
+        if (comparatorKey == null) {
+            comparatorKey =
+                    WritableComparatorFactory.get(map1.keySet().iterator().next(), nullSafe, nullOrdering);
+            comparatorValue =
+                    WritableComparatorFactory.get(map1.values().iterator().next(), nullSafe, nullOrdering);
+        }
+
+        Iterator map1KeyIterator = map1.keySet().iterator();
+        Iterator map2KeyIterator = map2.keySet().iterator();
+        Iterator map1ValueIterator = map1.values().iterator();
+        Iterator map2ValueIterator = map2.values().iterator();
+
+        // For map of size greater than 1, the ordering is based on the key value. If key values are same till the
+        // size of smaller map, then the size is checked for ordering.
+        int size = map1.size() > map2.size() ? map2.size() : map1.size();
+        for (int i = 0; i < size; i++) {
+            result = comparatorKey.compare(map1KeyIterator.next(), map2KeyIterator.next());
+            if (result != 0) {
+                return result;
+            }
+            result = comparatorValue.compare(map1ValueIterator.next(), map2ValueIterator.next());
+            if (result != 0) {
+                return result;
+            }
+        }
+        return map1.size() == map2.size() ? 0 : map1.size() > map2.size() ? 1 : -1;
+    }

Review comment:
   For hash join, just equality is sufficient, but for merge join we also need 
to judge the direction, so the comparison has to match the way the records are 
sorted by the mapper. As per the sorter used by the mapper task, hash maps with 
the same key-value pairs in a different order are not equal, and the merge join 
behaves the same way, while map join treats them as equal. We would have to 
modify the pipelined sorter code to fix this inconsistency. I will remove the 
support for the map type from this patch and create a separate ticket with the 
above info.
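   To make the ordering subtlety concrete, here is a small self-contained sketch (illustrative only, not Hive code) of an iteration-order-sensitive comparison like the comparator above performs: two maps holding identical entries inserted in different orders compare as unequal, which is exactly the merge-join versus map-join discrepancy described.
   ```java
   import java.util.Iterator;
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class MapOrderDemo {
       // Compare two maps entry-by-entry in iteration order, as the comparator above does.
       static int compareInIterationOrder(Map<String, String> m1, Map<String, String> m2) {
           Iterator<Map.Entry<String, String>> i1 = m1.entrySet().iterator();
           Iterator<Map.Entry<String, String>> i2 = m2.entrySet().iterator();
           while (i1.hasNext() && i2.hasNext()) {
               Map.Entry<String, String> e1 = i1.next();
               Map.Entry<String, String> e2 = i2.next();
               int c = e1.getKey().compareTo(e2.getKey());
               if (c != 0) {
                   return c;
               }
               c = e1.getValue().compareTo(e2.getValue());
               if (c != 0) {
                   return c;
               }
           }
           // Entries equal up to the smaller size: fall back to comparing sizes.
           return Integer.compare(m1.size(), m2.size());
       }

       public static void main(String[] args) {
           Map<String, String> a = new LinkedHashMap<>();
           a.put("k1", "v1");
           a.put("k2", "v2");
           Map<String, String> b = new LinkedHashMap<>();
           b.put("k2", "v2");
           b.put("k1", "v1");
           // Same entries, different insertion order: non-zero result,
           // so a merge join sorted this way would not match them.
           System.out.println(compareInIterationOrder(a, b)); // negative ("k1" < "k2")
       }
   }
   ```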




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586276)
Time Spent: 2h 50m  (was: 2h 40m)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: 

[jira] [Updated] (HIVE-25041) During "schematool --verbose -dbType derby -initSchema" I'm getting "utocommit on" (with a missing "a").

2021-04-20 Thread NOELLE MILTON VEGA (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

NOELLE MILTON VEGA updated HIVE-25041:
--
Description: 
Hello Friends:

I'm issuing the below command, but am getting the exception shown. This is a 
*pseudo-distributed* mode setup of *HIVE* and *HADOOP* (simple), so I've edited 
a tiny few files (just following the vanilla instructions – nothing fancy).

Yet somewhere it looks like there's a typo, perhaps in this file:

 
{noformat}
hive-schema-3.1.0.derby.sql{noformat}
 

From the below, {color:#0747a6}*utocommit on*{color} looks like it should be 
{color:#0747a6}*autocommit on*{color}.
{code:java}
jdoe@fedora-33$ ${HIVE_HOME}/bin/schematool --verbose -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hadoop/hadoop.d/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop/hive.d/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2021-04-20 21:09:57,605 INFO [main] conf.HiveConf 
(HiveConf.java:findConfigFile(187)) - Found configuration file 
file:/opt/hadoop/hive.d/conf/hive-site.xml
2021-04-20 21:09:58,013 INFO [main] tools.HiveSchemaHelper 
(HiveSchemaHelper.java:logAndPrintToStdout(117)) - Metastore connection URL: 
jdbc:derby:;databaseName=metastore_db;create=true
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
2021-04-20 21:09:58,014 INFO [main] tools.HiveSchemaHelper 
(HiveSchemaHelper.java:logAndPrintToStdout(117)) - Metastore Connection Driver 
: org.apache.derby.jdbc.EmbeddedDriver
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
2021-04-20 21:09:58,014 INFO [main] tools.HiveSchemaHelper 
(HiveSchemaHelper.java:logAndPrintToStdout(117)) - Metastore connection User: 
APP
Metastore connection User: APP
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.derby.sql
Connecting to jdbc:derby:;databaseName=metastore_db;create=true
Connected to: Apache Derby (version 10.14.1.0 - (1808820))
Driver: Apache Derby Embedded JDBC Driver (version 10.14.1.0 - (1808820))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:derby:> utocommit on
Error: Syntax error: Encountered "utocommit" at line 1, column 1. 
(state=42X01,code=3)
Closing: 0: jdbc:derby:;databaseName=metastore_db;create=true
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization 
FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization 
FAILED! Metastore state would be inconsistent !!
 at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:594)
 at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:567)
 at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1517)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Schema script failed, errorcode 2
 at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1226)
 at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1204)
 at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:590)
 ... 8 more
*** schemaTool failed ***{code}
 

Versions are:
{code:java}
Hive..: v3.1.2
Hadoop: v3.3.0{code}
Any ideas? Thank you.

  was:
Hello Friends:

I'm issuing the below command, but am getting the exception shown. This is a 
*pseudo-distributed* mode setup of *HIVE* and *HADOOP* (simple), so I've edited 
a tiny few files (just following the vanilla instructions -- nothing fancy).

Yet somewhere it looks like there's a typo, perhaps in this file*:*

 
{noformat}
hive-schema-3.1.0.derby.sql{noformat}
 

From the below, {color:#0747a6}*utocommit on*{color} looks like it should be 
{color:#0747a6}*autocommit on*{color}.
{code:java}
jdoe@fedora-33$ ${HIVE_HOME}/bin/schematool --verbose -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hadoop/hadoop.d/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 

[jira] [Updated] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman updated HIVE-25040:

Status: Patch Available  (was: In Progress)

> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_282]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  ~[hadoop-common-3.1.1.7.2.10.0-36.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_282]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_282]
>   at 
> 

[jira] [Work started] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25040 started by Mustafa İman.
---
> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_282]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  ~[hadoop-common-3.1.1.7.2.10.0-36.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_282]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_282]
>   at 
> 

[jira] [Assigned] (HIVE-25040) Drop database cascade cannot remove persistent functions

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-25040:
---


> Drop database cascade cannot remove persistent functions
> 
>
> Key: HIVE-25040
> URL: https://issues.apache.org/jira/browse/HIVE-25040
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> Add a persistent custom function to a database using a Jar file: CREATE 
> FUNCTION myfunction USING JAR 'x.jar';
> Restart the session and immediately issue DROP DATABASE mydb CASCADE. It 
> throws ClassNotFoundException:
> {code:java}
> java.lang.ClassNotFoundException: DummyUDF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_282]
>   at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.getPermanentUdfClass(Registry.java:549)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:586)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:577) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunctions(Registry.java:607)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunctions(FunctionRegistry.java:1731)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:62)
>  ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:748) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_282]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  ~[hadoop-common-3.1.1.7.2.10.0-36.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.10.0-36.jar:3.1.3000.7.2.10.0-36]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_282]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_282]
>   at 
> 

[jira] [Assigned] (HIVE-25039) Disable discovery.partitions config for external tables by default

2021-04-20 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25039:



> Disable discovery.partitions config for external tables by default
> --
>
> Key: HIVE-25039
> URL: https://issues.apache.org/jira/browse/HIVE-25039
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> We need to disable the discovery.partitions config for external tables 
> with partitions by default, because every HMS API call to the external 
> partition location (for example, on S3) is costly. We can selectively enable 
> this config per table via ALTER TABLE ... SET TBLPROPERTIES.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=586208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586208
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 21:24
Start Date: 20/Apr/21 21:24
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r617040976



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
          .executeWith(tableExecutor)
          .run(output -> {
            Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-           HiveIcebergRecordWriter writer = writers.get(output);
-           DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-           String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-               attemptID.getJobID(), attemptID.getTaskID().getId());
-
-           // Creating the file containing the data files generated by this task for this table
-           createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+           if (table != null) {

Review comment:
   Actually, this issue does not occur with MR, only with Tez, therefore it 
currently does not come up in upstream Iceberg. Will fix this once Tez writes 
have been enabled there too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586208)
Time Spent: 2.5h  (was: 2h 20m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586035
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 17:37
Start Date: 20/Apr/21 17:37
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r616901691



##
File path: data/conf/iceberg/hive-site.xml
##
@@ -0,0 +1,321 @@
+

Review comment:
   I needed a new conf, because of the `hive.in.iceberg.test=true` param. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586035)
Time Spent: 1h 10m  (was: 1h)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586033
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 17:35
Start Date: 20/Apr/21 17:35
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r616899607



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -770,6 +770,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "If not set, defaults to the codec extension for text files (e.g. \".gz\"), or no extension otherwise."),
 
     HIVE_IN_TEST("hive.in.test", false, "internal usage only, true in test mode", true),
+    HIVE_IN_TEST_ICEBERG("hive.in.iceberg.test", false, "internal usage only, true when " +

Review comment:
   During the creation of the Iceberg table an exclusive lock is requested 
on the HMS table. This logic is guarded by the values of `HIVE_IN_TEST` and 
`HIVE_IN_TEZ_TEST`. If either of these is true and the operation type is 
unset, we get an exception, which is the case in q tests. 
   ```java
   if (lc.isSetOperationType() && lc.getOperationType() == DataOperationType.UNSET &&
       (MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEST) || MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEZ_TEST))) {
     throw new IllegalStateException("Bug: operationType=" + lc.getOperationType() + " for component "
         + lc + " agentInfo=" + rqst.getAgentInfo());
   }
   ```
   Changing the `HIVE_IN_TEST` to false is not the right solution, because it 
is evaluated in many other places as well, activating unwanted code paths. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586033)
Time Spent: 1h  (was: 50m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=586015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586015
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 17:16
Start Date: 20/Apr/21 17:16
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r616886326



##
File path: itests/pom.xml
##
@@ -248,6 +248,17 @@
 ${project.version}
 tests
   
+  

Review comment:
   I added the `iceberg` profile to the `.travis.yml`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 586015)
Time Spent: 50m  (was: 40m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=585991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585991
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 16:36
Start Date: 20/Apr/21 16:36
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r616843875



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -254,6 +255,10 @@ public int execute() {
     try {
       Set<StatusGetOpts> statusGetOpts = EnumSet.of(StatusGetOpts.GET_COUNTERS);
       TezCounters dagCounters = dagClient.getDAGStatus(statusGetOpts).getDAGCounters();
+
+      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED)) {

Review comment:
   Maybe a feature flag for these specific metrics would be a good idea. 
What do you think?
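   As an illustration of such a flag, a hypothetical entry following the 
HiveConf pattern in the hunks below might look like this; the property name, 
default, and description are assumptions, not part of the patch.
   ```java
   // Hypothetical feature flag (not in the patch): gates only the delta-file
   // metrics instead of reusing the general HIVE_SERVER2_METRICS_ENABLED switch.
   HIVE_TXN_ACID_METRICS_ENABLED("hive.txn.acid.metrics.enabled", false,
       "Whether to collect and report the number of active/obsolete delta files in ACID tables."),
   ```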

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,26 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
       "Enables read-only transaction classification and related optimizations"),
 
+    // Configs having to do with DeltaFilesMetricReporter, which collects lists of most recently active tables
+    // with the most number of active/obsolete deltas.
+    HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 100,
+        "Size of the ACID metrics cache. Only topN metrics would remain in the cache if exceeded."),
+    HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", "7200s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+    HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval", "30s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Reporting period for ACID metrics in seconds."),
+    HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold", 100,
+        "The minimum number of active delta files a table/partition must be included in the ACID metrics report."),
+    HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.obsolete.delta.num.threshold", 100,
+        "The minimum number of obsolete delta files a table/partition must be included in the ACID metrics report."),
+    HIVE_TXN_ACID_METRICS_DELTA_CHECK_THRESHOLD("hive.txn.acid.metrics.delta.check.threshold", "300s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Deltas not older than this value will not be included in the ACID metrics report."),

Review comment:
   Should be: Deltas older than this value will not be included in the ACID 
metrics report

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,26 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
       "Enables read-only transaction classification and related optimizations"),
 
+    // Configs having to do with DeltaFilesMetricReporter, which collects lists of most recently active tables
+    // with the most number of active/obsolete deltas.
+    HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 100,
+        "Size of the ACID metrics cache. Only topN metrics would remain in the cache if exceeded."),
+    HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", "7200s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+    HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval", "30s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Reporting period for ACID metrics in seconds."),
+    HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold", 100,
+        "The minimum number of active delta files a table/partition must be included in the ACID metrics report."),
+    HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.obsolete.delta.num.threshold", 100,
+        "The minimum number of obsolete delta files a table/partition must be included in the ACID metrics report."),

Review comment:
   ...a table/partition must be ...
   should be: ...a table/partition must have to be ...

##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed 

[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=585990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585990
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 16:36
Start Date: 20/Apr/21 16:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r616855155



##
File path: data/conf/iceberg/hive-site.xml
##
@@ -0,0 +1,321 @@
+

Review comment:
   Do we need our own hive-site at this stage? Can we not get away with a general one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585990)
Time Spent: 40m  (was: 0.5h)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=585985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585985
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 16:33
Start Date: 20/Apr/21 16:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r616852978



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -770,6 +770,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "If not set, defaults to the codec extension for text files (e.g. \".gz\"), or no extension otherwise."),
 
     HIVE_IN_TEST("hive.in.test", false, "internal usage only, true in test mode", true),
+    HIVE_IN_TEST_ICEBERG("hive.in.iceberg.test", false, "internal usage only, true when " +

Review comment:
   why is this needed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585985)
Time Spent: 0.5h  (was: 20m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=585984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585984
 ]

ASF GitHub Bot logged work on HIVE-25010:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 16:32
Start Date: 20/Apr/21 16:32
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2193:
URL: https://github.com/apache/hive/pull/2193#discussion_r616852051



##
File path: itests/pom.xml
##
@@ -248,6 +248,17 @@
 ${project.version}
 tests
   
+  

Review comment:
   Wouldn't this dependency fail if `hive-iceberg-handler` was never compiled on this machine before?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585984)
Time Spent: 20m  (was: 10m)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25037:
--
Labels: pull-request-available  (was: )

> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Create metric about number of tables with > x aborts.
> x should be settable and default to 1500.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?focusedWorklogId=585983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585983
 ]

ASF GitHub Bot logged work on HIVE-25037:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 16:31
Start Date: 20/Apr/21 16:31
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #2199:
URL: https://github.com/apache/hive/pull/2199


   
   
   ### What changes were proposed in this pull request?
   
   Introduce a new metric for tables with a greater number of aborted transactions than x, where x comes from the metastore conf.
   
   ### Why are the changes needed?
   
   Subtask is part of the compaction observability initiative.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit test
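A minimal sketch of the kind of gauge the PR describes, on the Codahale metrics API that the metastore metrics facade wraps; the metric name and the counting helper are illustrative assumptions, not the actual patch:

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class AbortedTxnTablesGaugeSketch {
  public static void register(MetricRegistry registry, int abortedThreshold) {
    // The gauge recomputes the count on every scrape.
    registry.register("num_tables_with_x_aborted_transactions",
        (Gauge<Integer>) () -> countTablesWithAbortsAbove(abortedThreshold));
  }

  // Placeholder for the metastore-side query counting distinct tables with
  // more than `threshold` aborted transactions (hypothetical helper).
  private static int countTablesWithAbortsAbove(int threshold) {
    return 0;
  }
}
```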


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585983)
Remaining Estimate: 0h
Time Spent: 10m

> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Create metric about number of tables with > x aborts.
> x should be settable and default to 1500.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér updated HIVE-25010:
-
Description: We should create iceberg specific drivers to run iceberg 
qtests.  (was: We should create a qtest-iceberg module under itests. )

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér updated HIVE-25010:
-
Summary: Create TestIcebergCliDriver and TestIcebergNegativeCliDriver  
(was: Create qtest-iceberg module)

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should create a qtest-iceberg module under itests. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25038:
--
Labels: pull-request-available  (was: )

> Increase Iceberg test timeout and remove mr tests
> -
>
> Key: HIVE-25038
> URL: https://issues.apache.org/jira/browse/HIVE-25038
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25038?focusedWorklogId=585862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585862
 ]

ASF GitHub Bot logged work on HIVE-25038:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 14:32
Start Date: 20/Apr/21 14:32
Worklog Time Spent: 10m 
  Work Description: marton-bod opened a new pull request #2198:
URL: https://github.com/apache/hive/pull/2198


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585862)
Remaining Estimate: 0h
Time Spent: 10m

> Increase Iceberg test timeout and remove mr tests
> -
>
> Key: HIVE-25038
> URL: https://issues.apache.org/jira/browse/HIVE-25038
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-20 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25038:
-


> Increase Iceberg test timeout and remove mr tests
> -
>
> Key: HIVE-25038
> URL: https://issues.apache.org/jira/browse/HIVE-25038
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface

2021-04-20 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-24665:
-

Assignee: László Pintér  (was: Marton Bod)

> Add commitAlterTable method to the HiveMetaHook interface
> -
>
> Key: HIVE-24665
> URL: https://issues.apache.org/jira/browse/HIVE-24665
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: László Pintér
>Priority: Major
>
> Currently we have pre and post hooks for create table and drop table 
> commands, but only a pre hook for alter table commands. We should add a post 
> hook as well (with a default implementation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585823
 ]

ASF GitHub Bot logged work on HIVE-25026:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 14:08
Start Date: 20/Apr/21 14:08
Worklog Time Spent: 10m 
  Work Description: zhangheihei commented on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-823304712


   > > hi @kgyrtkirk @pvary
   > > Please help check why the split18-postprocess failed
   > 
   > @zhangheihei: You can follow the `Details` link next to the failed CI run, and that will take you to a page where you can see the test details. On the top right of the page you have a `Tests` link where you can see the test results.
   > 
   > In this specific case, you have a failure with the 
`TestNegativeCliDriver`, likely a flaky test. I would push a minimal commit 
without a real change, to retrigger the CI again.
   > 
   > Thanks,
   > Peter
   Thanks for your review. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585823)
Time Spent: 50m  (was: 40m)

> hive sql result is duplicate data cause of same task resubmission
> -
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: hezhang
>Assignee: hezhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-25026.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This issue is the same with hive-24577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25035:
--
Labels: pull-request-available  (was: )

> Allow creating single copy tasks for configured paths during external table 
> replication
> ---
>
> Key: HIVE-25035
> URL: https://issues.apache.org/jira/browse/HIVE-25035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As of now, one task per table is created for external table replication. In
> case there are multiple tables under one common directory, provide a way to
> create a single task for all those tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=585802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585802
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 13:39
Start Date: 20/Apr/21 13:39
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2197:
URL: https://github.com/apache/hive/pull/2197


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585802)
Remaining Estimate: 0h
Time Spent: 10m

> Allow creating single copy tasks for configured paths during external table 
> replication
> ---
>
> Key: HIVE-25035
> URL: https://issues.apache.org/jira/browse/HIVE-25035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As of now, one task per table is created for external table replication. In
> case there are multiple tables under one common directory, provide a way to
> create a single task for all those tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-04-20 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25037 started by Antal Sinkovits.
--
> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> Create metric about number of tables with > x aborts.
> x should be settable and default to 1500.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-04-20 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25037:
--


> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> Create metric about number of tables with > x aborts.
> x should be settable and default to 1500.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24986) Support aggregates on columns present in rollups

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24986?focusedWorklogId=585785=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585785
 ]

ASF GitHub Bot logged work on HIVE-24986:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 13:07
Start Date: 20/Apr/21 13:07
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2159:
URL: https://github.com/apache/hive/pull/2159


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585785)
Time Spent: 20m  (was: 10m)

> Support aggregates on columns present in rollups
> 
>
> Key: HIVE-24986
> URL: https://issues.apache.org/jira/browse/HIVE-24986
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> SELECT key, value, count(key) FROM src GROUP BY key, value with rollup;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585781
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 12:59
Start Date: 20/Apr/21 12:59
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616658639



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
           .executeWith(tableExecutor)
           .run(output -> {
             Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                attemptID.getJobID(), attemptID.getTaskID().getId());
-
-            // Creating the file containing the data files generated by this task for this table
-            createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+            if (table != null) {

Review comment:
   Sure, will do!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585781)
Time Spent: 2h 20m  (was: 2h 10m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25035:
---


> Allow creating single copy tasks for configured paths during external table 
> replication
> ---
>
> Key: HIVE-25035
> URL: https://issues.apache.org/jira/browse/HIVE-25035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> As of now, one task per table is created for external table replication. In
> case there are multiple tables under one common directory, provide a way to
> create a single task for all those tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585777
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 12:48
Start Date: 20/Apr/21 12:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616649742



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
           .executeWith(tableExecutor)
           .run(output -> {
             Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                attemptID.getJobID(), attemptID.getTaskID().getId());
-
-            // Creating the file containing the data files generated by this task for this table
-            createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+            if (table != null) {

Review comment:
   Got it, then we should push this change to the Iceberg code as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585777)
Time Spent: 2h 10m  (was: 2h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24992) Incremental rebuild of MV having aggregate in presence of delete operation

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24992:
--
Labels: pull-request-available  (was: )

> Incremental rebuild of MV having aggregate in presence of delete operation
> --
>
> Key: HIVE-24992
> URL: https://issues.apache.org/jira/browse/HIVE-24992
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extension of HIVE-24854: handle cases when the Materialized view definition
> has an aggregation, like
> {code}
> CREATE MATERIALIZED VIEW cmv_mat_view_n5 DISABLE REWRITE TBLPROPERTIES 
> ('transactional'='true') AS
>   SELECT cmv_basetable_n5.a, cmv_basetable_2_n2.c, sum(cmv_basetable_2_n2.d)
>   FROM cmv_basetable_n5 JOIN cmv_basetable_2_n2 ON (cmv_basetable_n5.a = 
> cmv_basetable_2_n2.a)
>   WHERE cmv_basetable_2_n2.c > 10.0
>   GROUP BY cmv_basetable_n5.a, cmv_basetable_2_n2.c;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24992) Incremental rebuild of MV having aggregate in presence of delete operation

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24992?focusedWorklogId=585774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585774
 ]

ASF GitHub Bot logged work on HIVE-24992:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 12:39
Start Date: 20/Apr/21 12:39
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2196:
URL: https://github.com/apache/hive/pull/2196


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585774)
Remaining Estimate: 0h
Time Spent: 10m

> Incremental rebuild of MV having aggregate in presence of delete operation
> --
>
> Key: HIVE-24992
> URL: https://issues.apache.org/jira/browse/HIVE-24992
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extension of HIVE-24854: handle cases when the Materialized view definition
> has an aggregation, like
> {code}
> CREATE MATERIALIZED VIEW cmv_mat_view_n5 DISABLE REWRITE TBLPROPERTIES 
> ('transactional'='true') AS
>   SELECT cmv_basetable_n5.a, cmv_basetable_2_n2.c, sum(cmv_basetable_2_n2.d)
>   FROM cmv_basetable_n5 JOIN cmv_basetable_2_n2 ON (cmv_basetable_n5.a = 
> cmv_basetable_2_n2.a)
>   WHERE cmv_basetable_2_n2.c > 10.0
>   GROUP BY cmv_basetable_n5.a, cmv_basetable_2_n2.c;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23763) Query based minor compaction produces wrong files when rows with different buckets Ids are processed by the same FileSinkOperator

2021-04-20 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325760#comment-17325760
 ] 

Karen Coppage commented on HIVE-23763:
--

A similar-looking situation can occur in MR-based minor compaction; in that case, HIVE-17231 can help.

> Query based minor compaction produces wrong files when rows with different 
> buckets Ids are processed by the same FileSinkOperator
> -
>
> Key: HIVE-23763
> URL: https://issues.apache.org/jira/browse/HIVE-23763
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> How to reproduce:
> - Create an unbucketed ACID table
> - Insert a bigger amount of data into this table so there would be multiple 
> bucket files in the table
> The files in the table should look like this:
> /warehouse/tablespace/managed/hive/bubu_acid/delta_001_001_/bucket_0_0
> /warehouse/tablespace/managed/hive/bubu_acid/delta_001_001_/bucket_1_0
> /warehouse/tablespace/managed/hive/bubu_acid/delta_001_001_/bucket_2_0
> /warehouse/tablespace/managed/hive/bubu_acid/delta_001_001_/bucket_3_0
> /warehouse/tablespace/managed/hive/bubu_acid/delta_001_001_/bucket_4_0
> /warehouse/tablespace/managed/hive/bubu_acid/delta_001_001_/bucket_5_0
> - Do some delete on rows with different bucket Ids
> The files in a delete delta should look like this:
> /warehouse/tablespace/managed/hive/bubu_acid/delete_delta_002_002_/bucket_0
> /warehouse/tablespace/managed/hive/bubu_acid/delete_delta_006_006_/bucket_3
> /warehouse/tablespace/managed/hive/bubu_acid/delete_delta_006_006_/bucket_1
> - Run the query-based minor compaction
> - After the compaction, the newly created delete delta contains only 1 bucket
> file. This file contains rows from all buckets and the table becomes unusable
> /warehouse/tablespace/managed/hive/bubu_acid/delete_delta_001_007_v066/bucket_0
> The issue happens only if rows with different bucket Ids are processed by the 
> same FileSinkOperator. 
> In the FileSinkOperator.process method, the files for the compaction table 
> are created like this:
> {noformat}
> if (!bDynParts && !filesCreated) {
>   if (lbDirName != null) {
> if (valToPaths.get(lbDirName) == null) {
>   createNewPaths(null, lbDirName);
> }
>   } else {
> if (conf.isCompactionTable()) {
>   int bucketProperty = getBucketProperty(row);
>   bucketId = BucketCodec.determineVersion(bucketProperty).decodeWriterId(bucketProperty);
> }
> createBucketFiles(fsp);
>   }
> }
> {noformat}
> When the first row is processed, the file is created and then the 
> filesCreated variable is set to true. Then when the other rows are processed, 
> the first if statement will be false, so no new file gets created, but the 
> row will be written into the file created for the first row.
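A standalone toy sketch of the fix idea, not the actual FileSinkOperator code: create a sink lazily the first time a bucket id is seen, instead of creating a single file on the first row as the filesCreated flag does:

```java
import java.util.HashMap;
import java.util.Map;

public class PerBucketSinkDemo {
  // One sink per bucket id; stands in for the per-bucket files of the table.
  static final Map<Integer, StringBuilder> SINKS = new HashMap<>();

  static void process(int bucketId, String row) {
    // computeIfAbsent plays the role of createBucketFiles() for an unseen bucket.
    SINKS.computeIfAbsent(bucketId, id -> new StringBuilder()).append(row).append('\n');
  }

  public static void main(String[] args) {
    process(0, "r1"); process(3, "r2"); process(1, "r3"); process(3, "r4");
    // Rows land in three distinct "bucket files" instead of one.
    SINKS.forEach((id, rows) -> System.out.println("bucket_" + id + ":\n" + rows));
  }
}
```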



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25018) Create new metrics about Initiator / Cleaner failures

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25018?focusedWorklogId=585754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585754
 ]

ASF GitHub Bot logged work on HIVE-25018:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 12:12
Start Date: 20/Apr/21 12:12
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2182:
URL: https://github.com/apache/hive/pull/2182#discussion_r616623561



##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
##
@@ -23,7 +23,9 @@
   public static final String COMPACTION_STATUS_PREFIX = "compaction_num_";
   public static final String COMPACTION_OLDEST_ENQUEUE_AGE = "compaction_oldest_enqueue_age_in_sec";
   public static final String COMPACTION_INITIATOR_CYCLE = "compaction_initiator_cycle";
+  public static final String COMPACTION_FAILED_INITIATOR_CYCLE = "compaction_failed_initiator_cycle";

Review comment:
   Renamed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585754)
Time Spent: 0.5h  (was: 20m)

> Create new metrics about Initiator / Cleaner failures
> -
>
> Key: HIVE-25018
> URL: https://issues.apache.org/jira/browse/HIVE-25018
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 2 new metrics should be defined:
> Failed Initiator cycles
> Failed Cleaner cycles
> They should be measured as part of the error handling in the services; the
> lock timeout on the AUX lock should be ignored.
> These should be RatioGauges (fail / success).
> A RatioGauge implementation is available in the metrics package in common; a
> similar one should be created in the metastore. The common one is built on top
> of the MetricsVariable interface, where someone provides the metric from
> outside; in the metastore it should be done like the Gauge implementation,
> where the metrics class handles the AtomicIntegers.
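A minimal sketch of such a metastore-side ratio gauge, with the metrics class owning the AtomicIntegers as the description suggests; the class and metric names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.RatioGauge;

public class FailedCycleRatioSketch extends RatioGauge {
  private final AtomicInteger failed = new AtomicInteger();
  private final AtomicInteger succeeded = new AtomicInteger();

  public void incrementFailed() { failed.incrementAndGet(); }
  public void incrementSucceeded() { succeeded.incrementAndGet(); }

  @Override
  protected Ratio getRatio() {
    // fail / success, per the description above.
    return Ratio.of(failed.get(), succeeded.get());
  }

  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    FailedCycleRatioSketch gauge =
        registry.register("compaction_failed_initiator_ratio", new FailedCycleRatioSketch());
    gauge.incrementSucceeded();
    gauge.incrementFailed();
    System.out.println(gauge.getValue()); // 1.0
  }
}
```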



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585745
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:57
Start Date: 20/Apr/21 11:57
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616613187



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -256,4 +265,74 @@ private static PartitionSpec spec(Schema schema, Properties properties,
       return PartitionSpec.unpartitioned();
     }
   }
+
+  @Override
+  public void commitInsertTable(org.apache.hadoop.hive.metastore.api.Table table, boolean overwrite)
+      throws MetaException {
+    String tableName = TableIdentifier.of(table.getDbName(), table.getTableName()).toString();
+
+    // check status to determine whether we need to commit or to abort
+    JobConf jobConf = new JobConf(conf);
+    String queryIdKey = jobConf.get("hive.query.id") + "." + tableName + ".result";
+    boolean success = jobConf.getBoolean(queryIdKey, false);
+
+    // construct the job context
+    JobID jobID = JobID.forName(jobConf.get(TezTask.HIVE_TEZ_COMMIT_JOB_ID + "." + tableName));
+    int numTasks = conf.getInt(TezTask.HIVE_TEZ_COMMIT_TASK_COUNT + "." + tableName, -1);
+    jobConf.setNumReduceTasks(numTasks);
+    JobContext jobContext = new JobContextImpl(jobConf, jobID, null);
+
+    // we should only commit this current table because
+    // for multi-table inserts, this hook method will be called sequentially for each target table
+    jobConf.set(InputFormatConfig.OUTPUT_TABLES, tableName);
+
+    OutputCommitter committer = new HiveIcebergOutputCommitter();
+    try {
+      if (success) {
+        try {
+          committer.commitJob(jobContext);
+        } catch (Exception commitExc) {
+          LOG.error("Error while trying to commit job (table: {}, jobID: {}). Will abort it now.",
+              tableName, jobID, commitExc);
+          abortJob(jobContext, committer, true);
+          throw new MetaException("Unable to commit job: " + commitExc.getMessage());
+        }
+      } else {
+        abortJob(jobContext, committer, false);
+      }
+    } finally {
+      // avoid config pollution with prefixed/suffixed keys
+      cleanCommitConfig(queryIdKey, tableName);
+    }
+  }
+
+  private void abortJob(JobContext jobContext, OutputCommitter committer, boolean suppressExc) throws MetaException {
+    try {
+      committer.abortJob(jobContext, JobStatus.State.FAILED);
+    } catch (IOException abortExc) {
+      LOG.error("Error while trying to abort failed job. There might be uncleaned data files.", abortExc);
+      if (!suppressExc) {
+        throw new MetaException("Unable to abort job: " + abortExc.getMessage());
+      }
+    }
+  }
+
+  private void cleanCommitConfig(String queryIdKey, String tableName) {
+    conf.unset(TezTask.HIVE_TEZ_COMMIT_JOB_ID + "." + tableName);
+    conf.unset(TezTask.HIVE_TEZ_COMMIT_TASK_COUNT + "." + tableName);
+    conf.unset(InputFormatConfig.SERIALIZED_TABLE_PREFIX + tableName);
+    conf.unset(queryIdKey);
+  }
+
+  @Override
+  public void preInsertTable(org.apache.hadoop.hive.metastore.api.Table table, boolean overwrite)
+      throws MetaException {
+    // do nothing
+  }
+
+  @Override
+  public void rollbackInsertTable(org.apache.hadoop.hive.metastore.api.Table table, boolean overwrite)
+      throws MetaException {
+    // do nothing

Review comment:
   I didn't put it there because if there is an execution error, we should get a non-zero return code from the Tez AM instead of an exception. But now that I'm thinking about it, I suppose Hive will throw an exception at the end if the return code was non-zero; I will test it out to make sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585745)
Time Spent: 2h  (was: 1h 50m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to 

[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585741
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:52
Start Date: 20/Apr/21 11:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616608877



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
           .executeWith(tableExecutor)
           .run(output -> {
             Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                attemptID.getJobID(), attemptID.getTaskID().getId());
-
-            // Creating the file containing the data files generated by this task for this table
-            createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+            if (table != null) {

Review comment:
   This happens during task commit, so before the commitInsert hook is 
called. 
   
   The essential problem here is that `OUTPUT_TABLES` contains all the tables; however, only the tables relevant for the given task are serialized into the job config. So it tries to iterate over tables 1...N (based on `OUTPUT_TABLES`), but only has access to, say, serialized Table 1 (hence the if). The whole parallel commit logic for multi-table inserts, on both the task commit and the job commit side, is broken, I think, if there is more than one vertex writing to target tables. Currently the tests pass because a single writer vertex is created, which has both tables serialized into its config.
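A hedged sketch of the filtering this implies: iterate only over tables whose serialized form is actually present in the task's config. The key strings are defined locally as assumptions rather than taken from InputFormatConfig:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.mapred.JobConf;

public class SerializedTableFilterSketch {
  // Assumed stand-ins for InputFormatConfig.OUTPUT_TABLES / SERIALIZED_TABLE_PREFIX.
  static final String OUTPUT_TABLES = "iceberg.mr.output.tables";
  static final String SERIALIZED_TABLE_PREFIX = "iceberg.mr.serialized.table.";

  static List<String> tablesForThisTask(JobConf jobConf) {
    return Arrays.stream(jobConf.get(OUTPUT_TABLES, "").split(","))
        // keep only tables this task can actually deserialize
        .filter(t -> jobConf.get(SERIALIZED_TABLE_PREFIX + t) != null)
        .collect(Collectors.toList());
  }
}
```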




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585741)
Time Spent: 1h 50m  (was: 1h 40m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585739
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:50
Start Date: 20/Apr/21 11:50
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616608877



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
           .executeWith(tableExecutor)
           .run(output -> {
             Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                attemptID.getJobID(), attemptID.getTaskID().getId());
-
-            // Creating the file containing the data files generated by this task for this table
-            createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+            if (table != null) {

Review comment:
   This happens during task commit, so before the commitInsert hook is 
called. 
   
   The essential problem here is that `OUTPUT_TABLES` contains all the tables; however, only the tables relevant for the given task are serialized into the job config. So it tries to iterate over tables 1...N, but only has access to, say, serialized Table 1 (hence the if).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585739)
Time Spent: 1h 40m  (was: 1.5h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585732=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585732
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:42
Start Date: 20/Apr/21 11:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616603785



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
           .executeWith(tableExecutor)
           .run(output -> {
             Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                attemptID.getJobID(), attemptID.getTaskID().getId());
-
-            // Creating the file containing the data files generated by this task for this table
-            createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+            if (table != null) {

Review comment:
   Could we do this check in `HiveIcebergMetaHook.commitInsertTable`?
   Then we would not need any change here...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585732)
Time Spent: 1.5h  (was: 1h 20m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585731
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:41
Start Date: 20/Apr/21 11:41
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616603042



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -256,4 +265,74 @@ private static PartitionSpec spec(Schema schema, Properties properties,
       return PartitionSpec.unpartitioned();
     }
   }
+
+  @Override
+  public void commitInsertTable(org.apache.hadoop.hive.metastore.api.Table table, boolean overwrite)
+      throws MetaException {
+    String tableName = TableIdentifier.of(table.getDbName(), table.getTableName()).toString();
+
+    // check status to determine whether we need to commit or to abort
+    JobConf jobConf = new JobConf(conf);
+    String queryIdKey = jobConf.get("hive.query.id") + "." + tableName + ".result";
+    boolean success = jobConf.getBoolean(queryIdKey, false);
+
+    // construct the job context
+    JobID jobID = JobID.forName(jobConf.get(TezTask.HIVE_TEZ_COMMIT_JOB_ID + "." + tableName));
+    int numTasks = conf.getInt(TezTask.HIVE_TEZ_COMMIT_TASK_COUNT + "." + tableName, -1);
+    jobConf.setNumReduceTasks(numTasks);
+    JobContext jobContext = new JobContextImpl(jobConf, jobID, null);
+
+    // we should only commit this current table because
+    // for multi-table inserts, this hook method will be called sequentially for each target table
+    jobConf.set(InputFormatConfig.OUTPUT_TABLES, tableName);
+
+    OutputCommitter committer = new HiveIcebergOutputCommitter();
+    try {
+      if (success) {
+        try {
+          committer.commitJob(jobContext);
+        } catch (Exception commitExc) {
+          LOG.error("Error while trying to commit job (table: {}, jobID: {}). Will abort it now.",
+              tableName, jobID, commitExc);
+          abortJob(jobContext, committer, true);
+          throw new MetaException("Unable to commit job: " + commitExc.getMessage());
+        }
+      } else {
+        abortJob(jobContext, committer, false);
+      }
+    } finally {
+      // avoid config pollution with prefixed/suffixed keys
+      cleanCommitConfig(queryIdKey, tableName);
+    }
+  }
+
+  private void abortJob(JobContext jobContext, OutputCommitter committer, boolean suppressExc) throws MetaException {
+    try {
+      committer.abortJob(jobContext, JobStatus.State.FAILED);
+    } catch (IOException abortExc) {
+      LOG.error("Error while trying to abort failed job. There might be uncleaned data files.", abortExc);
+      if (!suppressExc) {
+        throw new MetaException("Unable to abort job: " + abortExc.getMessage());
+      }
+    }
+  }
+
+  private void cleanCommitConfig(String queryIdKey, String tableName) {
+    conf.unset(TezTask.HIVE_TEZ_COMMIT_JOB_ID + "." + tableName);
+    conf.unset(TezTask.HIVE_TEZ_COMMIT_TASK_COUNT + "." + tableName);
+    conf.unset(InputFormatConfig.SERIALIZED_TABLE_PREFIX + tableName);
+    conf.unset(queryIdKey);
+  }
+
+  @Override
+  public void preInsertTable(org.apache.hadoop.hive.metastore.api.Table table, boolean overwrite)
+      throws MetaException {
+    // do nothing
+  }
+
+  @Override
+  public void rollbackInsertTable(org.apache.hadoop.hive.metastore.api.Table table, boolean overwrite)
+      throws MetaException {
+    // do nothing

Review comment:
   Shouldn't we call abortJob here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585731)
Time Spent: 1h 20m  (was: 1h 10m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585728
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:37
Start Date: 20/Apr/21 11:37
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616599645



##
File path: ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class <ClassName> extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   I agree that the current solution is not really clean, with only the first column put into VectorExpression.
   A couple of notes here which need to be discussed before proceeding with this huge refactor (which I'm happy to do once we're 100% certain about the "perfect" solution):
   
   1. Unary/binary is not enough, unfortunately; we have expressions involving even more cols. This is not a problem, we have the language support for that :) tertiary, quaternary...
   
   2. What's confusing is how to show with simple class names that unary/binary/... is only a story about the input columns. An expression can have constants too, e.g. in IfExprScalarScalar.txt:
   ```
this.arg1Column = arg1Column;
this.arg2Scalar = arg2Scalar;
this.arg3Scalar = arg3Scalar;
   ```
   In our terminology here, this is a unary expression because of arg1Column + scalars, but in reality it's obviously not a unary function...
   
   3. With subclasses, we'll have to implement a general VectorExpression.setInputColumnNum(int i, int j, int k, ... vararg); otherwise, we won't be able to change the input column numbers (which is important; that was the intention of this huge vector expression refactor). I think this will work by simply overriding the vararg method in subclasses.
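A hedged sketch of note 3: a general vararg setter on the base class that subclasses override to remap their named column fields. Class names are illustrative, not Hive's actual VectorExpression:

```java
abstract class VectorExpressionBase {
  protected int[] inputColumnNum = new int[0];

  // General remapping entry point; subclasses with named fields override it.
  public void setInputColumnNum(int... cols) {
    this.inputColumnNum = cols.clone();
  }
}

class BinaryColumnExprSketch extends VectorExpressionBase {
  private int colNum1;
  private int colNum2;

  @Override
  public void setInputColumnNum(int... cols) {
    super.setInputColumnNum(cols);
    this.colNum1 = cols[0];  // remap both inputs in one call
    this.colNum2 = cols[1];
  }
}
```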




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585728)
Time Spent: 1h 50m  (was: 1h 40m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we
> simply remove the compile-time check:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
>     setOperatorIssue(functionName + " only UNBOUNDED start frame is supported");
> return false;
>   }
> {code}
> We get incorrect results; that's because the vectorized codepath completely
> ignores boundaries and simply iterates through all the input batches in
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585726
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:36
Start Date: 20/Apr/21 11:36
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616599645



##
File path: ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class <ClassName> extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   I agree that the current solution is not really clean by having only the 
first column put into VectorExpression
   a couple of notes here, which need to be discussed before proceeding with 
this huge refactor (which I'm happy to do once we're 100% certain about the 
"perfect" solution):
   
   1. unary, binary is not enough, unfortunately, we have even expressions 
involving even more cols, this is not a problem, we have the language support 
for that :) ternary, quaternary...
   
   2. what's confusing is, how to show with simple class names that 
unary/binary/... is only a story about the input columns? an expression can 
have constants too, e.g. in IfExprScalarScalar.txt:
   ```
this.arg1Column = arg1Column;
this.arg2Scalar = arg2Scalar;
this.arg3Scalar = arg3Scalar;
   ```
   in our terminology here, this is a unary expression because of arg1Column + 
scalars, but in reality, it's obviously not a unary function...
   
   3. with subclasses, we'll have to implement a general 
VectorExpression.setInputColumnNum(int i, int j, int k, ...vararg), otherwise, 
we won't be able to change the input column numbers (which is important, this 
was the intention of this huge vector expression refactor). I think this will 
work by simply overriding the vararg method in subclasses




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585726)
Time Spent: 1h 40m  (was: 1.5h)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the check compile-time:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results, that's because vectorized codepath completely 
> ignores boundaries, and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585725=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585725
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:35
Start Date: 20/Apr/21 11:35
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616599645



##
File path: 
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class  extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   I agree that the current solution is not really clean by having only the 
first column put into VectorExpression
   a couple of notes here, which need to be discussed before proceeding with 
this huge refactor (which I'm happy to do once we're 100% certain about the 
"perfect" solution):
   
   1. unary, binary is not enough, unfortunately, we have even expressions 
involving even more cols, this is not a problem, we have the language support 
for that :) ternary, quaternary...
   
   2. what's confusing is, how to show with simple class names that 
unary/binary/... is only a story about the input columns? an expression can 
have constants too, e.g. in IfExprScalarScalar.txt:
   ```
this.arg1Column = arg1Column;
this.arg2Scalar = arg2Scalar;
this.arg3Scalar = arg3Scalar;
   ```
   in our terminology here, this is a unary expression because of arg1Column + 
scalars, but in reality, it's obviously not a unary function...
   
   3. with subclasses, we'll have to implement a general 
VectorExpression.setInputColumnNum(int i, int j, int k, ...vararg), otherwise, 
we won't be able to change the input column numbers (which is important, this 
was the intention of this huge vector expression refactor). I think this will 
work by simply overriding the vararg method in subclasses




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585725)
Time Spent: 1.5h  (was: 1h 20m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the check compile-time:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results, that's because vectorized codepath completely 
> ignores boundaries, and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25022) Metric about incomplete compactions

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25022?focusedWorklogId=585708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585708
 ]

ASF GitHub Bot logged work on HIVE-25022:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:05
Start Date: 20/Apr/21 11:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #2184:
URL: https://github.com/apache/hive/pull/2184#issuecomment-823185836


   > > LGTM, however, I don't see real benefit from this metric.
   > > ```
   > > major succeeded
   > > major failed
   > > minor failed
   > > minor succeeded
   > > ```
   > > 
   > > 
   > > This would be reported as incomplete compaction. What action do you 
expect from the end-users in this case?
   > 
   > In this case end users should re-run major compaction. Major compaction 
should have run (at 2. major failed) but hasn't since that failure.
   
   How would they know which tables/partitions to re-run major compaction on? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585708)
Time Spent: 40m  (was: 0.5h)

> Metric about incomplete compactions
> ---
>
> Key: HIVE-25022
> URL: https://issues.apache.org/jira/browse/HIVE-25022
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> "Compactions in a state" metrics (for example compaction_num_working) count 
> the sum of tables/partitions where the last compaction is in that state.
> I propose introducing a new metric about incomplete compactions: i.e. the 
> number of tables/partitions where the last finished compaction* is 
> unsuccessful (failed or "did not initiate"), or where major compaction was 
> unsuccessful then minor compaction succeeded (compaction is not "complete" 
> since major compaction has not succeeded in the time since it should have 
> run).
> Example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major working
> major failed
> major initiated
> major working
> major failed
> major initiated
> major working
> The "compactions in a state" metrics will consider the state of this table: 
> working.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there have been failed compactions since the last succeeded compaction.
> {code}
> Another example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major failed
> minor failed
> minor succeeded
> The "compactions in a state" metrics will consider the state of this table: 
> succeeded.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there hasn't been a major succeeded since major failed.{code}
> Last example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> minor did not initiate
> The "compactions in a state" metrics will consider the state of this table: 
> did not initiate.
> The "incomplete compactions" metric will consider this: incomplete, since the 
> last compaction was "did not initiate"{code}
> *finished compaction: state in (succeeded, failed, attempted/did not initiate)
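A sketch of the rule described above (illustrative only, with hypothetical
types; the real code works on ShowCompactResponseElement and TxnStore state
constants):

```
import java.util.List;

enum CState { SUCCEEDED, FAILED, DID_NOT_INITIATE, WORKING, INITIATED }
enum CType { MAJOR, MINOR }
record Compaction(CType type, CState state) {}

final class IncompleteCheck {
  // Walk the history oldest-to-newest, remembering whether an unsuccessful
  // major compaction has been cleared by a later succeeded major, and what
  // the last *finished* compaction state was.
  static boolean isIncomplete(List<Compaction> history) {
    boolean unclearedFailedMajor = false;
    CState lastFinished = null;
    for (Compaction c : history) {
      switch (c.state()) {
        case FAILED, DID_NOT_INITIATE -> {
          if (c.type() == CType.MAJOR) { unclearedFailedMajor = true; }
          lastFinished = c.state();
        }
        case SUCCEEDED -> {
          if (c.type() == CType.MAJOR) { unclearedFailedMajor = false; }
          lastFinished = c.state();
        }
        default -> { } // WORKING / INITIATED are not "finished" states
      }
    }
    return unclearedFailedMajor
        || lastFinished == CState.FAILED
        || lastFinished == CState.DID_NOT_INITIATE;
  }
}
```

Running this against the three examples above yields "incomplete" in all three
cases, matching the proposed metric.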



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25018) Create new metrics about Initiator / Cleaner failures

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25018?focusedWorklogId=585707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585707
 ]

ASF GitHub Bot logged work on HIVE-25018:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:00
Start Date: 20/Apr/21 11:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2182:
URL: https://github.com/apache/hive/pull/2182#discussion_r616575762



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
##
@@ -23,7 +23,9 @@
   public static final String COMPACTION_STATUS_PREFIX = "compaction_num_";
   public static final String COMPACTION_OLDEST_ENQUEUE_AGE = 
"compaction_oldest_enqueue_age_in_sec";
   public static final String COMPACTION_INITIATOR_CYCLE = 
"compaction_initiator_cycle";
+  public static final String COMPACTION_FAILED_INITIATOR_CYCLE = 
"compaction_failed_initiator_cycle";

Review comment:
   I would rename both metrics to 
COMPACTION_FAILED_INITIATOR_RATIO(compaction_failed_initiator_ratio), 
COMPACTION_FAILED_CLEANER_RATIO(compaction_failed_cleaner_ratio)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585707)
Time Spent: 20m  (was: 10m)

> Create new metrics about Initiator / Cleaner failures
> -
>
> Key: HIVE-25018
> URL: https://issues.apache.org/jira/browse/HIVE-25018
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 2 new metrics should be defined:
> Failed Initiator cycles
> Failed Cleaner cycles
> They should be measured as part of the error handling in the services; the 
> lock timeout on the AUX lock should be ignored.
> These should be RatioGauges (fail / success)
> A RatioGauge implementation is available in the metrics package in common; a 
> similar one should be created in the metastore. The common one is built on top 
> of the MetricsVariable interface, where someone provides the metric from 
> outside; in the metastore it should be done like the Gauge implementation, 
> where the metrics class handles the AtomicIntegers
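As an illustration, a minimal sketch of such a gauge using Dropwizard Metrics'
RatioGauge, with the metrics class owning the AtomicIntegers (the class and
registration name here are assumptions, not the actual Hive code):

```
import java.util.concurrent.atomic.AtomicInteger;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.RatioGauge;

// Sketch only: reports failed vs. succeeded Initiator cycles as a ratio.
public class InitiatorCycleRatio {
  private final AtomicInteger failed = new AtomicInteger();
  private final AtomicInteger succeeded = new AtomicInteger();

  public InitiatorCycleRatio(MetricRegistry registry) {
    registry.register("compaction_failed_initiator_ratio", new RatioGauge() {
      @Override
      protected Ratio getRatio() {
        // fail / success, as described above; Ratio.of yields NaN while
        // the denominator is still 0.
        return Ratio.of(failed.get(), succeeded.get());
      }
    });
  }

  public void failedCycle() { failed.incrementAndGet(); }
  public void succeededCycle() { succeeded.incrementAndGet(); }
}
```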



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585701=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585701
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 10:46
Start Date: 20/Apr/21 10:46
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616566788



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##
@@ -3005,6 +3003,19 @@ private boolean validatePTFOperator(PTFOperator op, 
VectorizationContext vContex
   }
 }
   }
+  if (vectorPTFDesc.getOrderExprNodeDescs().length > 1) {
+/*
+ * Currently, we need to rule out here all cases where a range 
boundary scanner can run,
+ * basically: 1. bounded start 2. bounded end which is not current row
+ */
+if (windowFrameDef.getWindowType() == WindowType.RANGE
+&& (!windowFrameDef.isStartUnbounded() || 
!windowFrameDef.getEnd().isCurrentRow())) {

Review comment:
   good catch, I'm going to fix the condition to enable unbounded end
   FYI, the connection between this part and the boundary scanner is the 
unimplemented method:
   {code}
 @Override
 public boolean isDistanceGreater(Object v1, Object v2, int amt) {
   throw new UnsupportedOperationException("Only unbounded ranges 
supported");
 }
   {code}

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##
@@ -3005,6 +3003,19 @@ private boolean validatePTFOperator(PTFOperator op, 
VectorizationContext vContex
   }
 }
   }
+  if (vectorPTFDesc.getOrderExprNodeDescs().length > 1) {
+/*
+ * Currently, we need to rule out here all cases where a range 
boundary scanner can run,
+ * basically: 1. bounded start 2. bounded end which is not current row
+ */
+if (windowFrameDef.getWindowType() == WindowType.RANGE
+&& (!windowFrameDef.isStartUnbounded() || 
!windowFrameDef.getEnd().isCurrentRow())) {

Review comment:
   good catch! I'm going to fix the condition to enable unbounded end
   FYI, the connection between this part and the boundary scanner is the 
unimplemented method:
   ```
 @Override
 public boolean isDistanceGreater(Object v1, Object v2, int amt) {
   throw new UnsupportedOperationException("Only unbounded ranges 
supported");
 }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585701)
Time Spent: 1h 20m  (was: 1h 10m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the check compile-time:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results, that's because vectorized codepath completely 
> ignores boundaries, and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25022) Metric about incomplete compactions

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25022?focusedWorklogId=585692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585692
 ]

ASF GitHub Bot logged work on HIVE-25022:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 10:30
Start Date: 20/Apr/21 10:30
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #2184:
URL: https://github.com/apache/hive/pull/2184#issuecomment-823166632


   > LGTM, however, I don't see real benefit from this metric.
   > 
   > ```
   > major succeeded
   > major failed
   > minor failed
   > minor succeeded
   > ```
   > 
   > This would be reported as incomplete compaction. What action do you expect 
from the end-users in this case?
   
   In this case end users should re-run major compaction. Major compaction 
should have run (at 2. major failed) but hasn't since that failure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585692)
Time Spent: 0.5h  (was: 20m)

> Metric about incomplete compactions
> ---
>
> Key: HIVE-25022
> URL: https://issues.apache.org/jira/browse/HIVE-25022
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> "Compactions in a state" metrics (for example compaction_num_working) count 
> the sum of tables/partitions where the last compaction is in that state.
> I propose introducing a new metric about incomplete compactions: i.e. the 
> number of tables/partitions where the last finished compaction* is 
> unsuccessful (failed or "did not initiate"), or where major compaction was 
> unsuccessful then minor compaction succeeded (compaction is not "complete" 
> since major compaction has not succeeded in the time since it should have 
> run).
> Example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major working
> major failed
> major initiated
> major working
> major failed
> major initiated
> major working
> The "compactions in a state" metrics will consider the state of this table: 
> working.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there have been failed compactions since the last succeeded compaction.
> {code}
> Another example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major failed
> minor failed
> minor succeeded
> The "compactions in a state" metrics will consider the state of this table: 
> succeeded.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there hasn't been a major succeeded since major failed.{code}
> Last example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> minor did not initiate
> The "compactions in a state" metrics will consider the state of this table: 
> did not initiate.
> The "incomplete compactions" metric will consider this: incomplete, since the 
> last compaction was "did not initiate"{code}
> *finished compaction: state in (succeeded, failed, attempted/did not initiate)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25022) Metric about incomplete compactions

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25022?focusedWorklogId=585689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585689
 ]

ASF GitHub Bot logged work on HIVE-25022:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 10:26
Start Date: 20/Apr/21 10:26
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2184:
URL: https://github.com/apache/hive/pull/2184#discussion_r616544766



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/AcidMetricService.java
##
@@ -108,19 +112,56 @@ private void updateDBMetrics() throws MetaException {
   @VisibleForTesting
   public static void updateMetricsFromShowCompact(ShowCompactResponse 
showCompactResponse) {
 Map lastElements = new HashMap<>();
+Map lastUnsuccessfulMajor = new HashMap<>();
+Map lastUnsuccessfulMinor = new HashMap<>();
 long oldestEnqueueTime = Long.MAX_VALUE;
 
-// Get the last compaction for each db/table/partition
-for(ShowCompactResponseElement element : 
showCompactResponse.getCompacts()) {
+// sort compactions by ID. This is not done in TxnHandler.
+List compactions = 
showCompactResponse.getCompacts().stream()
+.sorted((o1, o2) -> Long.compare(o1.getId(), 
o2.getId())).collect(Collectors.toList());
+for (ShowCompactResponseElement element : compactions) {
   String key = element.getDbname() + "/" + element.getTablename() +
   (element.getPartitionname() != null ? "/" + 
element.getPartitionname() : "");
+
+  // Get the last compaction for each db/table/partition
   // If new key, add the element, if there is an existing one, change to 
the element if the element.id is greater than old.id
   lastElements.compute(key, (k, old) -> (old == null) ? element : 
(element.getId() > old.getId() ? element : old));
   if (TxnStore.INITIATED_RESPONSE.equals(element.getState()) && 
oldestEnqueueTime > element.getEnqueueTime()) {
 oldestEnqueueTime = element.getEnqueueTime();
   }
+
+  // Count incomplete compactions
+  CompactionType type = element.getType();
+  lastUnsuccessfulMajor.compute(key, (k, old) -> {
+// Add newest unsuccessful compaction to the map
+if (wasUnsuccessful(element) && type == MAJOR) {

Review comment:
   Could it be simplified ?
   ```
   if (type == MAJOR){
   if (wasUnsuccessful(element)) return element.getId();
   if (wasSuccessful(element)) return null;
   }
   ```
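   Plugged into the `compute` call from the diff above, that early-return shape 
would look roughly like this (sketch only; `wasSuccessful` is a hypothetical 
helper mirroring `wasUnsuccessful` from the patch):
   ```
   lastUnsuccessfulMajor.compute(key, (k, old) -> {
     if (type == MAJOR) {
       if (wasUnsuccessful(element)) {
         return element.getId();  // remember the newest unsuccessful major
       }
       if (wasSuccessful(element)) {
         return null;             // a succeeded major clears the entry
       }
     }
     return old;                  // otherwise keep the existing value
   });
   ```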




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585689)
Time Spent: 20m  (was: 10m)

> Metric about incomplete compactions
> ---
>
> Key: HIVE-25022
> URL: https://issues.apache.org/jira/browse/HIVE-25022
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> "Compactions in a state" metrics (for example compaction_num_working) count 
> the sum of tables/partitions where the last compaction is in that state.
> I propose introducing a new metric about incomplete compactions: i.e. the 
> number of tables/partitions where the last finished compaction* is 
> unsuccessful (failed or "did not initiate"), or where major compaction was 
> unsuccessful then minor compaction succeeded (compaction is not "complete" 
> since major compaction has not succeeded in the time since it should have 
> run).
> Example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major working
> major failed
> major initiated
> major working
> major failed
> major initiated
> major working
> The "compactions in a state" metrics will consider the state of this table: 
> working.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there have been failed compactions since the last succeeded compaction.
> {code}
> Another example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> major failed
> minor failed
> minor succeeded
> The "compactions in a state" metrics will consider the state of this table: 
> succeeded.
> The "incomplete compactions" metric will consider this: incomplete, since 
> there hasn't been a major succeeded since major failed.{code}
> Last example:
> {code:java}
> These compactions ran on a partition:
> major succeeded
> minor did not initiate
> The "compactions in a state" metrics will consider the state of 

[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=585686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585686
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 10:15
Start Date: 20/Apr/21 10:15
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2071:
URL: https://github.com/apache/hive/pull/2071#discussion_r616536470



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveMapComparator.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.util.NullOrdering;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+
+import java.util.Iterator;
+import java.util.Map;
+
+final class HiveMapComparator extends HiveWritableComparator {
+private WritableComparator comparatorValue = null;
+private WritableComparator comparatorKey = null;
+
+HiveMapComparator(boolean nullSafe, NullOrdering nullOrdering) {
+super(nullSafe, nullOrdering);
+}
+
+@Override
+public int compare(Object key1, Object key2) {
+int result = checkNull(key1, key2);
+if (result != not_null) {
+return result;
+}
+
+Map map1 = (Map) key1;
+Map map2 = (Map) key2;
+if (comparatorKey == null) {
+comparatorKey =
+
WritableComparatorFactory.get(map1.keySet().iterator().next(), nullSafe, 
nullOrdering);
+comparatorValue =
+
WritableComparatorFactory.get(map1.values().iterator().next(), nullSafe, 
nullOrdering);
+}
+
+Iterator map1KeyIterator = map1.keySet().iterator();
+Iterator map2KeyIterator = map2.keySet().iterator();
+Iterator map1ValueIterator = map1.values().iterator();
+Iterator map2ValueIterator = map2.values().iterator();
+
+// For a map of size greater than 1, the ordering is based on the key 
value. If the key values are the same up to the
+// size of the smaller map, then the sizes are compared for ordering.
+int size = map1.size() > map2.size() ? map2.size() : map1.size();
+for (int i = 0; i < size; i++) {
+result = comparatorKey.compare(map1KeyIterator.next(), 
map2KeyIterator.next());
+if (result != 0) {
+return result;
+}
+result = comparatorValue.compare(map1ValueIterator.next(), 
map2ValueIterator.next());
+if (result != 0) {
+return result;
+}
+}
+return map1.size() == map2.size() ? 0 : map1.size() > map2.size() ? 1 
: -1;
+}

Review comment:
   If I understand correctly, this code will not consider two maps equal when 
they have the same key/value pairs in a different order. In other words, 
`map1.equals(map2)` may return `true` while this code could return `-1` or `1`, 
for instance. I am not sure how we treat MAP equalities in other places of the 
code (other joins, etc.), so I just want to make sure that we are consistent. 
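   To make the concern concrete, a small standalone illustration (not from the 
patch) of two maps that `Map.equals` considers equal even though they iterate 
in different orders:
   ```
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class MapOrderDemo {
     public static void main(String[] args) {
       Map<String, Integer> m1 = new LinkedHashMap<>();
       m1.put("a", 1);
       m1.put("b", 2);

       Map<String, Integer> m2 = new LinkedHashMap<>();
       m2.put("b", 2);
       m2.put("a", 1);

       // Map.equals ignores iteration order...
       System.out.println(m1.equals(m2));  // true
       // ...but comparing first keys in iteration order, as the comparator
       // above does, ranks the maps as unequal ("a" vs "b").
       System.out.println(m1.keySet().iterator().next()
           .compareTo(m2.keySet().iterator().next()));  // negative
     }
   }
   ```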

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveMapComparator.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec;
+

[jira] [Assigned] (HIVE-25034) Implement CTAS for Iceberg

2021-04-20 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25034:
-


> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25021) Divide oldest_open_txn into oldest replication and non-replication transactions

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25021:
--
Labels: pull-request-available  (was: )

> Divide oldest_open_txn into oldest replication and non-replication 
> transactions
> ---
>
> Key: HIVE-25021
> URL: https://issues.apache.org/jira/browse/HIVE-25021
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should have different metrics (age and txn id) for 
> oldest replication txn (TXN_TYPE==1)
> oldest non-replication txn (TXN_TYPE!=1)
> so recommendations can be tailored to the different cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25021) Divide oldest_open_txn into oldest replication and non-replication transactions

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25021?focusedWorklogId=585678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585678
 ]

ASF GitHub Bot logged work on HIVE-25021:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:56
Start Date: 20/Apr/21 09:56
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #2195:
URL: https://github.com/apache/hive/pull/2195


   
   
   ### What changes were proposed in this pull request?
   
   Divide the open-transaction-related metrics (number of open transactions, 
oldest open transaction id, age of the oldest open transaction) based on 
whether the transaction is replication-related or not.
   This will give us the ability to tailor recommendations to the different 
cases.
   
   ### Why are the changes needed?
   
   Subtask is part of the compaction observability initiative.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests were added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585678)
Remaining Estimate: 0h
Time Spent: 10m

> Divide oldest_open_txn into oldest replication and non-replication 
> transactions
> ---
>
> Key: HIVE-25021
> URL: https://issues.apache.org/jira/browse/HIVE-25021
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should have different metrics (age and txn id) for 
> oldest replication txn (TXN_TYPE==1)
> oldest non-replication txn (TXN_TYPE!=1)
> so recommendations can be tailored to the different cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585671
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:44
Start Date: 20/Apr/21 09:44
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616521343



##
File path: ql/src/test/results/clientpositive/llap/windowing_udaf.q.out
##
@@ -503,7 +503,7 @@ alice brown 25.2587496
 alice brown 25.5293748
 alice brown 25.63012987012987
 alice brown 26.472439024390237
-alice brown27.100638297872322
+alice brown27.27881720430106

Review comment:
   sure, I confirmed this manually, and I found that the new, vectorized 
average is correct (27.27881720430106)
   here is how I checked:
   1. table and original query
   ```
   create table over10k_n4(
   t tinyint,
   si smallint,
   i int,
   b bigint,
   f float,
   d double,
   bo boolean,
   s string,
   ts timestamp, 
   `dec` decimal,  
   bin binary)
   row format delimited
   fields terminated by '|';
   
   load data local inpath '../../data/files/over10k' into table over10k_n4;
   
   select t, f, d, avg(d) over (partition by t order by f) a from over10k_n4 
order by s, a limit 100;
   ```
   
   2.  the original query is not ideal for showing the problem, since it doesn't 
contain all the important rows due to limit 100, so here is a cleaner scenario
   ```
   select t, f, d, avg(d) over (partition by t order by f) a from over10k_n4 
where t = 114;
   ```
   result:
   ```
   
   | 114  | 95.01  | 13.77  | 27.31472527472526   |
   | 114  | 95.09  | 45.37  | 27.510978260869546  |
   | 114  | 97.94  | 5.92   | 27.27881720430106   | <- this is the 
changed value
   | 114  | 97.94  | 10.53  | 27.100638297872322  |
   +--+++-+
   ```
   so we can see the avg in the row before the last row is 27.2788, how can we 
check this?
   let's calculate the sum and count for this row to get the average:
   1. sum (sum of all rows, then subtract the last one)
   ```
   select sum(d) - 10.53 from over10k_n4 where t = 114;
   2536.92994
   ```
   
   2. count (count all, we can subtract 1 while calculating the average in the 
next step)
   ```
   select count(d) from over10k_n4 where t = 114;
   94
   ```
   
   3. average:
   ```
 2536.92994 / 93 = 27.2788172043
   ```
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585671)
Time Spent: 1h 10m  (was: 1h)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the check compile-time:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results, that's because vectorized codepath completely 
> ignores boundaries, and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585670
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:43
Start Date: 20/Apr/21 09:43
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616521343



##
File path: ql/src/test/results/clientpositive/llap/windowing_udaf.q.out
##
@@ -503,7 +503,7 @@ alice brown 25.2587496
 alice brown 25.5293748
 alice brown 25.63012987012987
 alice brown 26.472439024390237
-alice brown27.100638297872322
+alice brown27.27881720430106

Review comment:
   sure, I confirmed this manually, and I found that the new, vectorized 
average is correct (27.27881720430106)
   here is how I checked:
   1. table and original query
   ```
   create table over10k_n4(
   t tinyint,
   si smallint,
   i int,
   b bigint,
   f float,
   d double,
   bo boolean,
   s string,
   ts timestamp, 
   `dec` decimal,  
   bin binary)
   row format delimited
   fields terminated by '|';
   
   load data local inpath '../../data/files/over10k' into table over10k_n4;
   
   select t, f, d, avg(d) over (partition by t order by f) a from over10k_n4 
order by s, a limit 100;
   ```
   
   2.  the original query is not ideal for showing the problem, since it doesn't 
contain all the important rows due to limit 100, so here is a cleaner scenario
   ```
   select t, f, d, avg(d) over (partition by t order by f) a from over10k_n4 
where t = 114;
   ```
   result:
   ```
   
   | 114  | 95.01  | 13.77  | 27.31472527472526   |
   | 114  | 95.09  | 45.37  | 27.510978260869546  |
   | 114  | 97.94  | 5.92   | 27.27881720430106   | <- this is the 
changed value
   | 114  | 97.94  | 10.53  | 27.100638297872322  |
   +--+++-+
   ```
   so we can see the avg in the row before the last row is 27.2788, how can we 
check this?
   let's calculate the sum and count for this row to get the average:
   1. sum (sum of all rows, then subtract the last one)
   ```
   select sum(d) - 10.53 from over10k_n4 where t = 114;
   2536.92994
   ```
   
   2. count (count all, we can subtract 1 while calculating the average in the 
next step)
   ```
   select count(d) from over10k_n4 where t = 114;
   94
   ```
   
   3. average:
   ```
 2536.92994 / 93 = 27.2788172043
   ```
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585670)
Time Spent: 1h  (was: 50m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the check compile-time:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results, that's because vectorized codepath completely 
> ignores boundaries, and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585661
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:29
Start Date: 20/Apr/21 09:29
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616510710



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -256,4 +265,74 @@ private static PartitionSpec spec(Schema schema, 
Properties properties,
   return PartitionSpec.unpartitioned();
 }
   }
+
+  @Override
+  public void commitInsertTable(org.apache.hadoop.hive.metastore.api.Table 
table, boolean overwrite)
+  throws MetaException {
+String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
+
+// check status to determine whether we need to commit or to abort
+JobConf jobConf = new JobConf(conf);
+String queryIdKey = jobConf.get("hive.query.id") + "." + tableName + 
".result";

Review comment:
   Makes sense. I was thinking of replacing the query id with a constant 
like `HIVE_TEZ_COMMIT_JOB_RESULT` so it would be `HIVE_TEZ_COMMIT_JOB_RESULT + 
"." + tableName`, like for job id and task num to make things consisent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585661)
Time Spent: 1h 10m  (was: 1h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585660=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585660
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:29
Start Date: 20/Apr/21 09:29
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616510710



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -256,4 +265,74 @@ private static PartitionSpec spec(Schema schema, 
Properties properties,
   return PartitionSpec.unpartitioned();
 }
   }
+
+  @Override
+  public void commitInsertTable(org.apache.hadoop.hive.metastore.api.Table 
table, boolean overwrite)
+  throws MetaException {
+String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
+
+// check status to determine whether we need to commit or to abort
+JobConf jobConf = new JobConf(conf);
+String queryIdKey = jobConf.get("hive.query.id") + "." + tableName + 
".result";

Review comment:
   Makes sense. I was thinking of replacing the query id with a constant 
like `HIVE_TEZ_COMMIT_JOB_RESULT` so it would be `HIVE_TEZ_COMMIT_JOB_ID + "." 
+ tableName`, like for job id and task num to make things consistent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585660)
Time Spent: 1h  (was: 50m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585652
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:11
Start Date: 20/Apr/21 09:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616497039



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -256,4 +265,74 @@ private static PartitionSpec spec(Schema schema, 
Properties properties,
   return PartitionSpec.unpartitioned();
 }
   }
+
+  @Override
+  public void commitInsertTable(org.apache.hadoop.hive.metastore.api.Table 
table, boolean overwrite)
+  throws MetaException {
+String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
+
+// check status to determine whether we need to commit or to abort
+JobConf jobConf = new JobConf(conf);
+String queryIdKey = jobConf.get("hive.query.id") + "." + tableName + 
".result";

Review comment:
   Maybe even "hive.query.id.%.result"? Or 
"hive.query.id.result."+tableName?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585652)
Time Spent: 50m  (was: 40m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585651
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:10
Start Date: 20/Apr/21 09:10
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616496048



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -256,4 +265,74 @@ private static PartitionSpec spec(Schema schema, 
Properties properties,
   return PartitionSpec.unpartitioned();
 }
   }
+
+  @Override
+  public void commitInsertTable(org.apache.hadoop.hive.metastore.api.Table 
table, boolean overwrite)
+  throws MetaException {
+String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
+
+// check status to determine whether we need to commit or to abort
+JobConf jobConf = new JobConf(conf);
+String queryIdKey = jobConf.get("hive.query.id") + "." + tableName + 
".result";

Review comment:
   Could we use constants here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585651)
Time Spent: 40m  (was: 0.5h)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=585649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585649
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:04
Start Date: 20/Apr/21 09:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r616491881



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicPartitionPruner.java
##
@@ -514,4 +569,29 @@ private boolean checkForSourceCompletion(String name) {
 }
 return false;
   }
+
+  /**
+   * Recursively replaces the ExprNodeDynamicListDesc with the list of the 
actual values. As a result of this call the
+   * original expression is modified so it can be used for pushing down to the 
TableScan for filtering the data at the
+   * source.
+   * 
+   * Please make sure to clone the predicate if needed since the original node 
will be modified.
+   * @param node The node we are traversing
+   * @param dynArgs The constant values we are substituting
+   */
+  private void replaceDynamicLists(ExprNodeDesc node, 
Collection dynArgs) {
+List children = node.getChildren();
+if (children != null && !children.isEmpty()) {
+  ListIterator iterator = node.getChildren().listIterator();
+  while (iterator.hasNext()) {
+ExprNodeDesc child = iterator.next();
+if (child instanceof ExprNodeDynamicListDesc) {
+  iterator.remove();
+  dynArgs.forEach(iterator::add);

Review comment:
   Added a check for `addDynamicSplitPruningEdge` which returns `false` if 
there are multiple columns in the expression to filter
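   Roughly, that guard could look like the following sketch (the surrounding 
logic and the column-collector call are assumptions based on the comments in 
this thread, not the actual patch):
   ```
   // Sketch only: skip dynamic split pruning when the filter expression
   // references more than one column.
   private boolean addDynamicSplitPruningEdge(ExprNodeDesc pruningExpr) {
     Set<String> referencedColumns = new HashSet<>();
     columns(pruningExpr, referencedColumns);  // recursive column collector
     if (referencedColumns.size() != 1) {
       return false;  // multi-column expressions are not supported
     }
     // ... otherwise wire up the pruning edge ...
     return true;
   }
   ```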




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585649)
Time Spent: 4h  (was: 3h 50m)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We should enable partition pruning above iceberg tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=585647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585647
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:03
Start Date: 20/Apr/21 09:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r616490821



##
File path: 
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -250,4 +305,74 @@ static void overlayTableProperties(Configuration 
configuration, TableDesc tableD
 // this is an exception to the interface documentation, but it's a safe 
operation to add this property
 props.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
   }
+
+  /**
+   * Recursively collects the column names from the predicate.
+   * @param node The node we are traversing
+   * @param columns The already collected column names
+   */
+  private void columns(ExprNodeDesc node, Collection columns) {

Review comment:
   In the end moved to `Set` since it was needed to make sure that only a 
single column is there in the pruning expression




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585647)
Time Spent: 3h 40m  (was: 3.5h)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We should enable partition pruning above iceberg tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24962) Enable partition pruning for Iceberg tables

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=585648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585648
 ]

ASF GitHub Bot logged work on HIVE-24962:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:03
Start Date: 20/Apr/21 09:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r616491306



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##########
@@ -171,7 +171,7 @@ private void prepare(InputInitializerContext initializerContext) throws IOExcept
     // perform dynamic partition pruning
     if (pruner != null) {
       pruner.initialize(getContext(), work, jobConf);
-      pruner.prune();
+      pruner.prune(jobConf);

Review comment:
   Removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585648)
Time Spent: 3h 50m  (was: 3h 40m)

> Enable partition pruning for Iceberg tables
> ---
>
> Key: HIVE-24962
> URL: https://issues.apache.org/jira/browse/HIVE-24962
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We should enable partition pruning for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?focusedWorklogId=585646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585646
 ]

ASF GitHub Bot logged work on HIVE-25033:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 09:00
Start Date: 20/Apr/21 09:00
Worklog Time Spent: 10m 
  Work Description: zeroflag opened a new pull request #2194:
URL: https://github.com/apache/hive/pull/2194


   Thrift doesn't support returning nulls from functions.
   
   This patch also puts back the changes that were accidentally removed by
   https://github.com/apache/hive/commit/6de2d51ac812a9393e8ff5e6de7bd911cb83f237#diff-90c669c961ab250c01087a409b31126cce7ac0672c40f465e7ee262ed7bcdadd
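
   For context on the workaround space: since a Thrift method must hand back a concrete value, a common pattern (not necessarily the one this patch uses) is to wrap the nullable result in a small carrier with an explicit null flag, e.g.:

{code:java}
// Hypothetical carrier type illustrating the usual "no nulls over Thrift"
// workaround; name and shape are assumptions, not the patch's actual code.
public final class NullableResult {
  private final boolean isNull;
  private final String value; // meaningful only when isNull is false

  private NullableResult(boolean isNull, String value) {
    this.isNull = isNull;
    this.value = value;
  }

  public static NullableResult of(String v) {
    // Fold a local null into the flag so the RPC never returns null itself.
    return v == null ? new NullableResult(true, "") : new NullableResult(false, v);
  }

  public boolean isNull() {
    return isNull;
  }

  public String get() {
    if (isNull) {
      throw new IllegalStateException("value is null");
    }
    return value;
  }
}
{code}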


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585646)
Remaining Estimate: 0h
Time Spent: 10m

> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25033:
--
Labels: pull-request-available  (was: )

> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24980) Add timeout for failed and did not initiate compaction cleanup

2021-04-20 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-24980:
-
Summary: Add timeout for failed and did not initiate compaction cleanup  
(was: Add timeout for failed and "not initiated" compaction cleanup)

> Add timeout for failed and did not initiate compaction cleanup
> --
>
> Key: HIVE-24980
> URL: https://issues.apache.org/jira/browse/HIVE-24980
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Clear failed and "not initiated" compactions from COMPLETED_COMPACTIONS that
> are older than a week (configurable) if there is already a newer successful
> compaction on the table/partition, and either (1) the succeeded compaction is
> major, or (2) it is minor and the not-initiated or failed compaction is also
> minor. This way a minor succeeded compaction will not cause the deletion of a
> major not-initiated or failed compaction from history.
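
Stated as a predicate, the rule above amounts to roughly the following sketch; the types and field names are illustrative, not Hive's actual compaction classes:

{code:java}
// Illustrative model of the cleanup rule; not Hive's real implementation.
enum CompactionType { MAJOR, MINOR }

final class CompactionRecord {
  final CompactionType type;
  final long endedAtMs;

  CompactionRecord(CompactionType type, long endedAtMs) {
    this.type = type;
    this.endedAtMs = endedAtMs;
  }
}

final class CompactionHistoryRule {
  /**
   * A failed or "not initiated" entry may be purged when it is past the
   * retention window AND a newer successful compaction exists that is either
   * (1) major, or (2) minor while the old entry is minor as well.
   */
  static boolean purgeable(CompactionRecord old, CompactionRecord newerSucceeded,
      long retentionMs, long nowMs) {
    boolean pastRetention = nowMs - old.endedAtMs > retentionMs;
    boolean majorSucceeded = newerSucceeded.type == CompactionType.MAJOR;
    boolean bothMinor = newerSucceeded.type == CompactionType.MINOR
        && old.type == CompactionType.MINOR;
    return pastRetention && (majorSucceeded || bothMinor);
  }
}
{code}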



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24980) Add timeout for failed and did not initiate compaction cleanup

2021-04-20 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-24980.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master branch. Thanks for reviewing [~dkuzmenko]!

> Add timeout for failed and did not initiate compaction cleanup
> --
>
> Key: HIVE-24980
> URL: https://issues.apache.org/jira/browse/HIVE-24980
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Clear failed and "not initiated" compactions from COMPLETED_COMPACTIONS that
> are older than a week (configurable) if there is already a newer successful
> compaction on the table/partition, and either (1) the succeeded compaction is
> major, or (2) it is minor and the not-initiated or failed compaction is also
> minor. This way a minor succeeded compaction will not cause the deletion of a
> major not-initiated or failed compaction from history.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24980) Add timeout for failed and "not initiated" compaction cleanup

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24980?focusedWorklogId=585643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585643
 ]

ASF GitHub Bot logged work on HIVE-24980:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 08:42
Start Date: 20/Apr/21 08:42
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2156:
URL: https://github.com/apache/hive/pull/2156


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585643)
Time Spent: 50m  (was: 40m)

> Add timeout for failed and "not initiated" compaction cleanup
> -
>
> Key: HIVE-24980
> URL: https://issues.apache.org/jira/browse/HIVE-24980
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Clear failed and "not initiated" compactions from COMPLETED_COMPACTIONS that
> are older than a week (configurable) if there is already a newer successful
> compaction on the table/partition, and either (1) the succeeded compaction is
> major, or (2) it is minor and the not-initiated or failed compaction is also
> minor. This way a minor succeeded compaction will not cause the deletion of a
> major not-initiated or failed compaction from history.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-20 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-25033:



> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25027) Hide Iceberg module behind a profile

2021-04-20 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25027.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the review [~mbod] and [~lpinter]!

> Hide Iceberg module behind a profile
> 
>
> Key: HIVE-25027
> URL: https://issues.apache.org/jira/browse/HIVE-25027
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After creating the {{patched-iceberg-core}} and {{patched-iceberg-api}} modules
> the Maven build works fine, but IntelliJ needs a manual classpath setup for the
> build to succeed in the IDE.
> Most of the community does not use Iceberg, and eventually the "patched"
> modules will be removed as the Hive-Iceberg integration stabilizes and the
> Iceberg project releases the changes we need. In the meantime we just hide
> the whole {{Iceberg}} module behind a profile that is only enabled on CI
> and when a developer specifically sets it.
> It can be used like:
> {code:java}
>  mvn clean install -DskipTests -Piceberg{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25027) Hide Iceberg module behind a profile

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25027?focusedWorklogId=585629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585629
 ]

ASF GitHub Bot logged work on HIVE-25027:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 07:49
Start Date: 20/Apr/21 07:49
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2188:
URL: https://github.com/apache/hive/pull/2188


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585629)
Time Spent: 20m  (was: 10m)

> Hide Iceberg module behind a profile
> 
>
> Key: HIVE-25027
> URL: https://issues.apache.org/jira/browse/HIVE-25027
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After creating the {{patched-iceberg-core}} and {{patched-iceberg-api}} modules
> the Maven build works fine, but IntelliJ needs a manual classpath setup for the
> build to succeed in the IDE.
> Most of the community does not use Iceberg, and eventually the "patched"
> modules will be removed as the Hive-Iceberg integration stabilizes and the
> Iceberg project releases the changes we need. In the meantime we just hide
> the whole {{Iceberg}} module behind a profile that is only enabled on CI
> and when a developer specifically sets it.
> It can be used like:
> {code:java}
>  mvn clean install -DskipTests -Piceberg{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)