[jira] [Work logged] (HIVE-22224) Support Parquet-Avro Timestamp Type

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22224?focusedWorklogId=754371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754371
 ]

ASF GitHub Bot logged work on HIVE-22224:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 00:19
Start Date: 08/Apr/22 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3002:
URL: https://github.com/apache/hive/pull/3002#issuecomment-1092321630

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 754371)
Time Spent: 50m  (was: 40m)

> Support Parquet-Avro Timestamp Type
> ---
>
> Key: HIVE-22224
> URL: https://issues.apache.org/jira/browse/HIVE-22224
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 2.3.5, 2.3.6
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: parquet, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a user creates an external table and imports parquet-avro data written 
> with Parquet 1.8.2 (which supports logical types) in Hive 2.3 or an earlier 
> version, Hive cannot read timestamp-type column data correctly.
> Hive reads it as LongWritable, since it is actually stored as 
> long (logical_type=timestamp-millis). So we may add some code in 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.java
>  to let Hive cast the long type to the timestamp type.
> Some code like the below:
>
> {code:java}
> public Timestamp getPrimitiveJavaObject(Object o) {
>   if (o instanceof LongWritable) {
>     return new Timestamp(((LongWritable) o).get());
>   }
>   return o == null ? null : ((TimestampWritable) o).getTimestamp();
> }
> {code}
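The proposed cast amounts to interpreting the stored long as epoch milliseconds. A minimal, self-contained sketch of that conversion (illustrative only, not the actual Hive patch; the class and helper names are hypothetical):

```java
// Illustrative sketch only (not the actual Hive patch): the proposed change
// interprets the Avro timestamp-millis long as epoch milliseconds.
import java.sql.Timestamp;

public class TimestampMillisSketch {

    // Mirrors the cast in the snippet above: timestamp-millis long -> Timestamp.
    static Timestamp fromEpochMillis(long millis) {
        return new Timestamp(millis);
    }

    public static void main(String[] args) {
        // Epoch-millis round-trip: getTime() returns the original long.
        System.out.println(fromEpochMillis(1554681600000L).getTime()); // prints 1554681600000
    }
}
```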



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754304
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 19:38
Start Date: 07/Apr/22 19:38
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r845501219


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HmsThriftHttpServlet.java:
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.Enumeration;
+
+import javax.servlet.ServletException;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TProcessor;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.server.TServlet;
+
+public class HmsThriftHttpServlet extends TServlet {
+
+  private static final Logger LOG = LoggerFactory
+  .getLogger(HmsThriftHttpServlet.class);
+
+  private static final String X_USER = MetaStoreUtils.USER_NAME_HTTP_HEADER;
+
+  private final boolean isSecurityEnabled;
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory inProtocolFactory, TProtocolFactory outProtocolFactory) 
{
+super(processor, inProtocolFactory, outProtocolFactory);
+// This should ideally be receiving an instance of the Configuration which is used for the check
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory protocolFactory) {
+super(processor, protocolFactory);
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  @Override
+  protected void doPost(HttpServletRequest request,
+  HttpServletResponse response) throws ServletException, IOException {
+
+Enumeration<String> headerNames = request.getHeaderNames();
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Logging headers in request");
+  while (headerNames.hasMoreElements()) {
+String headerName = headerNames.nextElement();
+LOG.debug("Header: [{}], Value: [{}]", headerName,
+request.getHeader(headerName));
+  }
+}
+String userFromHeader = request.getHeader(X_USER);
+if (userFromHeader == null || userFromHeader.isEmpty()) {
+  LOG.error("No user header: {} found", X_USER);
+  response.sendError(HttpServletResponse.SC_FORBIDDEN,
+  "User Header missing");
+  return;
+}
+
+// TODO: These should ideally be in some kind of a Cache with Weak references.
+// If HMS were to set up some kind of a session, this would go into the 
session by having
+// this filter work with a custom Processor / or set the username into the 
session
+// as is done for HS2.
+// In case of HMS, it looks like each request is independent, and there is 
no session
+// information, so the UGI needs to be set up in the Connection layer 
itself.
+UserGroupInformation clientUgi;
+// Temporary, and useless for now. Here only to allow this to work on an 
otherwise kerberized
+// server.
+if (isSecurityEnabled) {
+  LOG.info("Creating proxy user for: {}", userFromHeader);
+  clientUgi = UserGroupInformation.createProxyUser(userFromHeader, 
UserGroupInformation.getLoginUser());
+} else {
+  LOG.info("Creating remote user for: {}", userFromHeader);
+  clientUgi = UserGroupInformation.createRemoteUser(userFromHeader);
+}
+
+
+PrivilegedExceptionAction<Void> action = new PrivilegedExceptionAction<Void>() {
+  @Override
+  public Void run() throws Exception {
+HmsThriftHttpServlet.super.doPost(request, response);
+return null;
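The doPost above rejects requests that lack the user header with SC_FORBIDDEN (403). A dependency-free sketch of just that validation step (the header name and method are hypothetical stand-ins; the real constant comes from MetaStoreUtils.USER_NAME_HTTP_HEADER):

```java
// Dependency-free sketch of the user-header validation in doPost above.
// Class, method, and header names are illustrative assumptions, not Hive code.
public class UserHeaderCheckSketch {

    // The servlet answers SC_FORBIDDEN (403) when this returns false.
    static boolean hasValidUserHeader(String headerValue) {
        return headerValue != null && !headerValue.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasValidUserHeader("alice")); // prints true
        System.out.println(hasValidUserHeader(""));      // prints false
    }
}
```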

[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26123:
--
Labels: pull-request-available  (was: )

> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries. 
> Existing tests run only against Derby, meaning that any change 
> to the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for 
> the different supported metastores for sysdb.





[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=754234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754234
 ]

ASF GitHub Bot logged work on HIVE-26123:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 17:03
Start Date: 07/Apr/22 17:03
Worklog Time Spent: 10m 
  Work Description: asolimando opened a new pull request, #3196:
URL: https://github.com/apache/hive/pull/3196

   …tores
   
   See the JIRA ticket for details.
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   `mvn test -Dtest=TestMssqlMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
   `mvn test -Dtest=TestOracleMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
   `mvn test -Dtest=TestMariadbMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
   `mvn test -Dtest=TestMysqlMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
   `mvn test -Dtest=TestPostgresMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`




Issue Time Tracking
---

Worklog Id: (was: 754234)
Remaining Estimate: 0h
Time Spent: 10m

> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries. 
> Existing tests run only against Derby, meaning that any change 
> to the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for 
> the different supported metastores for sysdb.





[jira] [Updated] (HIVE-26125) sysdb fails with mysql as metastore db

2022-04-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26125:

Description: 
_sysdb.q_ and _strict_managed_tables_sysdb.q_ fail when using MySQL as the 
standalone metastore db.

The issue can be reproduced with the following command:

{code:java}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile="sysdb.q,strict_managed_tables_sysdb.q" -Dtest.metastore.db=mysql -pl itests/qtest -Pitests
{code}

The errors are as follows:

{noformat}
---
Test set: org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver
---
Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 282.638 s <<< 
FAILURE! - in org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver
org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver.testCliDriver[strict_managed_tables_sysdb]
  Time elapsed: 41.104 s  <<< FAILURE!
java.lang.AssertionError: 
Client execution failed with error code = 2 
running 

select tbl_name, tbl_type from tbls where tbl_name like 'smt_sysdb%' order by 
tbl_name 
fname=strict_managed_tables_sysdb.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
vertexName=Map 1, vertexId=vertex_1649344918728_0001_33_00, diagnostics=[Task 
failed, taskId=task_1649344918728_0001_33_00_00, diagnostics=[TaskAttempt 0 
failed, info=[Error: Error while running task ( failure ) : 
attempt_1649344918728_0001_33_00_00_0:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught 
exception while trying to execute query:You have an error in your SQL syntax; 
check the manual that corresponds to your MySQL server version for the right 
syntax to use near '"TBLS"' at line 14
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught 
exception while trying to execute query:You have an error in your SQL syntax; 
check the manual that corresponds to your MySQL server version for the right 
syntax to use near '"TBLS"' at line 14
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 15 more
Caused by: java.io.IOException: java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught 
exception while trying to execute query:You have an error in your SQL syntax; 
check the manual that corresponds to your MySQL server version for the right 
syntax to use near '"TBLS"' at line 14
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
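The syntax error near '"TBLS"' suggests the generated query quotes identifiers with ANSI double quotes, which stock MySQL rejects unless the ANSI_QUOTES sql_mode is enabled (MySQL defaults to backticks). A hedged, illustrative sketch of the dialect difference (not Hive code; class and method names are invented for the example):

```java
// Illustrative only: the same identifier rendered with MySQL's default
// backtick quoting vs. the ANSI double quoting that the failing query uses.
public class IdentifierQuoting {

    static String quote(String identifier, boolean mysqlBackticks) {
        return mysqlBackticks ? "`" + identifier + "`"
                              : "\"" + identifier + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("TBLS", true));  // `TBLS`  (accepted by stock MySQL)
        System.out.println(quote("TBLS", false)); // "TBLS"  (rejected unless ANSI_QUOTES)
    }
}
```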

[jira] [Commented] (HIVE-20205) Upgrade HBase dependencies off alpha4 release

2022-04-07 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518984#comment-17518984
 ] 

Naveen Gangam commented on HIVE-20205:
--

Based on the analysis in HIVE-26124, HBase 2 is incompatible with Hadoop 3.

> Upgrade HBase dependencies off alpha4 release
> -
>
> Key: HIVE-20205
> URL: https://issues.apache.org/jira/browse/HIVE-20205
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-20205.1.patch, HIVE-20205.1.patch, 
> HIVE-20205.2.patch, HIVE-20205.2.patch, HIVE-20205.3.patch, HIVE-20205.patch, 
> HIVE-20205.patch
>
>
> It appears Hive has dependencies on the HBase 2.0.0-alpha4 release. HBase 2.0.0 and 
> 2.0.1 have been released. The HBase team recommends 2.0.1 and says there shouldn't 
> be any API surprises (but we never know).





[jira] [Updated] (HIVE-20205) Upgrade HBase dependencies off alpha4 release

2022-04-07 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-20205:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Upgrade HBase dependencies off alpha4 release
> -
>
> Key: HIVE-20205
> URL: https://issues.apache.org/jira/browse/HIVE-20205
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-20205.1.patch, HIVE-20205.1.patch, 
> HIVE-20205.2.patch, HIVE-20205.2.patch, HIVE-20205.3.patch, HIVE-20205.patch, 
> HIVE-20205.patch
>
>
> It appears Hive has dependencies on the HBase 2.0.0-alpha4 release. HBase 2.0.0 and 
> 2.0.1 have been released. The HBase team recommends 2.0.1 and says there shouldn't 
> be any API surprises (but we never know).





[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518980#comment-17518980
 ] 

Naveen Gangam commented on HIVE-26124:
--

Thanks Peter. I will close the other jira as well.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one





[jira] [Resolved] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26124.
---
Resolution: Won't Fix

HBase 2 and Hadoop 3 are incompatible.
We may have to move to HBase 3 when it becomes available.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one





[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518978#comment-17518978
 ] 

Peter Vary commented on HIVE-26124:
---

Talked to [~stoty], and he pointed out that he already did this exercise in 
HIVE-24473.

The short story is that HBase 2.x is compiled against Hadoop 2, and it cannot 
be used for testing with any Hadoop 3 artifacts. The root cause is 
HBASE-22394, BTW.

Thanks [~stoty] for the pointers!

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one





[jira] [Work logged] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26124?focusedWorklogId=754203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754203
 ]

ASF GitHub Bot logged work on HIVE-26124:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 15:55
Start Date: 07/Apr/22 15:55
Worklog Time Spent: 10m 
  Work Description: pvary closed pull request #3186: HIVE-26124: Upgrade 
HBase from 2.0.0-alpha4 to 2.0.0
URL: https://github.com/apache/hive/pull/3186




Issue Time Tracking
---

Worklog Id: (was: 754203)
Time Spent: 20m  (was: 10m)

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one





[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=754195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754195
 ]

ASF GitHub Bot logged work on HIVE-26117:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 15:47
Start Date: 07/Apr/22 15:47
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3179:
URL: https://github.com/apache/hive/pull/3179#discussion_r845281998


##
ql/src/test/results/clientnegative/joinneg.q.out:
##
@@ -1 +1 @@
-FAILED: SemanticException [Error 10004]: Line 6:12 Invalid table alias or 
column reference 'b': (possible column names are: x.key, x.value, y.key, 
y.value)
+FAILED: SemanticException [Error 10009]: Line 6:12 Invalid table alias 'b'

Review Comment:
   The original error message was more informative.



##
ql/src/test/results/clientpositive/llap/views_explain_ddl.q.out:
##
@@ -305,7 +305,7 @@ TBLPROPERTIES (
 ALTER TABLE db1.table2_n13 UPDATE STATISTICS 
SET('numRows'='0','rawDataSize'='0' );
 ALTER TABLE db1.table1_n19 UPDATE STATISTICS 
SET('numRows'='0','rawDataSize'='0' );
 
-CREATE VIEW `db1`.`v3_n3` AS SELECT `t1`.`key`, `t1`.`value`, `t2`.`key` `k` 
FROM `db1`.`table1_n19` `t1` JOIN `db1`.`table2_n13` `t2` ON `t1`.`key` = 
`t2`.`key`;
+CREATE VIEW `db1`.`v3_n3` AS SELECT `t1`.`key`, `t1`.`value`, `t2`.`key` `k` 
FROM `db1`.`table1_n19` `t1` JOIN `db1`.`table2_n13` `t2` ON t1.key = t2.key;

Review Comment:
   The view's expanded text changed: quoting was removed from the table and column names in 
the join condition: 
   ```
   t1.key = t2.key
   ```
   should remain 
   ```
   `t1`.`key` = `t2`.`key`
   ```





Issue Time Tracking
---

Worklog Id: (was: 754195)
Time Spent: 20m  (was: 10m)

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.





[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=754173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754173
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 15:13
Start Date: 07/Apr/22 15:13
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r845256354


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -422,21 +413,46 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, byte[] filterExp,
   }
   allPartDirs = partDirs;
 }
-// don't want the table dir
-allPartDirs.remove(tablePath);
-
-// remove the partition paths we know about
-allPartDirs.removeAll(partPaths);
-
 Set<String> partColNames = Sets.newHashSet();
 for(FieldSchema fSchema : getPartCols(table)) {
   partColNames.add(fSchema.getName());
 }
 
 Map<String, String> partitionColToTypeMap = getPartitionColtoTypeMap(table.getPartitionKeys());
+
+Set<Path> correctPartPathsInMS = new HashSet<>(partPathsInMS);
+// remove the partition paths also present on the FS; the remainder feeds getPartitionsNotOnFs
+partPathsInMS.removeAll(allPartDirs);
+FileSystem fs = tablePath.getFileSystem(conf);
+// There can be an edge case where the user defines a partition directory outside of the table directory.
+// To avoid evicting such partitions, we check whether each partition path exists and only add
+// the missing ones to the result for getPartitionsNotOnFs.
+for (Path partPath : partPathsInMS) {
+  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
+  pr.setTableName(table.getTableName());
+  pr.setPartitionName(getPartitionName(fs.makeQualified(tablePath),
+  partPath, partColNames, partitionColToTypeMap));
+  if (!fs.exists(partPath)) {
+result.getPartitionsNotOnFs().add(pr);
+correctPartPathsInMS.remove(partPath);
+  }
+}
+for (Path partPath : correctPartPathsInMS) {
+  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
+  pr.setTableName(table.getTableName());
+  pr.setPartitionName(getPartitionName(fs.makeQualified(tablePath),
+  partPath, partColNames, partitionColToTypeMap));
+  result.getCorrectPartitions().add(pr);
+}
+
+// don't want the table dir
+allPartDirs.remove(tablePath);
+
+// remove the partition paths we know about
+allPartDirs.removeAll(partPaths);

Review Comment:
   Does allPartDirs contain non-full path objects? Do we need them there?
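The set arithmetic discussed above can be sketched without any Hadoop dependencies: the "metastore-only" candidates are the metastore partition paths minus the listed FS directories (Strings stand in for Hadoop Path objects; this is an illustrative sketch, not the patch itself, and the real code still probes each candidate with fs.exists()):

```java
// Dependency-free sketch of the partition set difference in the diff above.
// Strings stand in for Hadoop Path objects; names are illustrative.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PartitionDiffSketch {

    // Partition paths recorded in the metastore but absent from the FS
    // listing; the real code still calls fs.exists() on each, since
    // partitions may legitimately live outside the table directory.
    static Set<String> metastoreOnly(Set<String> partPathsInMS,
                                     Set<String> allPartDirs) {
        Set<String> msOnly = new HashSet<>(partPathsInMS);
        msOnly.removeAll(allPartDirs);
        return msOnly;
    }

    public static void main(String[] args) {
        Set<String> ms = new HashSet<>(Arrays.asList("/t/p=1", "/t/p=2", "/ext/p=3"));
        Set<String> fs = new HashSet<>(Arrays.asList("/t/p=1", "/t/p=2"));
        System.out.println(metastoreOnly(ms, fs)); // prints [/ext/p=3]
    }
}
```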





Issue Time Tracking
---

Worklog Id: (was: 754173)
Time Spent: 5h 10m  (was: 5h)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on cloud storage 
> such as S3; one case where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754159
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 15:01
Start Date: 07/Apr/22 15:01
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845243489


##
iceberg/iceberg-handler/src/test/queries/positive/delete_iceberg_partitioned_avro.q:
##
@@ -0,0 +1,26 @@
+set hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+
+drop table if exists tbl_ice;
+create external table tbl_ice(a int, b string, c int) partitioned by spec 
(bucket(16, a), truncate(3, b)) stored by iceberg stored as avro tblproperties 
('format-version'='2');
+
+

Issue Time Tracking
---

Worklog Id: (was: 754159)
Time Spent: 12h 50m  (was: 12h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754152
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 14:59
Start Date: 07/Apr/22 14:59
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845240720


##
iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q:
##
@@ -0,0 +1,10 @@
+set hive.vectorized.execution.enabled=true;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

Review Comment:
   I've tried to address this with this commit: 
[a7fb7f9](https://github.com/apache/hive/pull/3131/commits/a7fb7f90a2fcc3c69b9e533de35b16eda99e3719)





Issue Time Tracking
---

Worklog Id: (was: 754152)
Time Spent: 12h 40m  (was: 12.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26123:

Description: 
_sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries. 

Existing tests run only against Derby, meaning that any change to the 
sysdb query mapping is not covered by CI.

The present ticket aims at bridging this gap by introducing test coverage for 
the different supported metastores for sysdb.

  was:
_sysdb_ provides a view over (some) metastore tables from Hive via JDBC queries. 

Existing tests run only against Derby, meaning that any change to the 
sysdb query mapping is not covered by CI.

The present ticket aims at bridging this gap by introducing test coverage for 
the different supported metastores for sysdb.


> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
> Fix For: 4.0.0-alpha-2
>
>
> _sydb_ exposes (some of) the metastore tables from Hive via JDBC queries. 
> Existing tests are running only against Derby, meaning that any change 
> against sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for 
> the different supported metastores for sysdb.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils

2022-04-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26119.

Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/71b62c68ef76e90ee53281102870d570c8f50834. 
Thanks for the PR [~soumyakanti.das]!

> Remove unnecessary Exceptions from DDLPlanUtils
> ---
>
> Key: HIVE-26119
> URL: https://issues.apache.org/jira/browse/HIVE-26119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are a few {{HiveExceptions}} which were added to a few methods like 
> {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can be 
> removed. Some methods in {{ExplainTask}} can also be cleaned up which are 
> related.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26119:
--
Labels: pull-request-available  (was: )

> Remove unnecessary Exceptions from DDLPlanUtils
> ---
>
> Key: HIVE-26119
> URL: https://issues.apache.org/jira/browse/HIVE-26119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are a few {{HiveExceptions}} which were added to a few methods like 
> {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can be 
> removed. Some methods in {{ExplainTask}} can also be cleaned up which are 
> related.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26019?focusedWorklogId=754134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754134
 ]

ASF GitHub Bot logged work on HIVE-26019:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 14:41
Start Date: 07/Apr/22 14:41
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3075: HIVE-26019 
HIVE-26020: Improvements around transitive dependencies from calcite-core
URL: https://github.com/apache/hive/pull/3075




Issue Time Tracking
---

Worklog Id: (was: 754134)
Time Spent: 0.5h  (was: 20m)

> Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
> ---
>
> Key: HIVE-26019
> URL: https://issues.apache.org/jira/browse/HIVE-26019
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26119?focusedWorklogId=754135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754135
 ]

ASF GitHub Bot logged work on HIVE-26119:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 14:41
Start Date: 07/Apr/22 14:41
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3184: HIVE-26119: Remove 
unnecessary Exceptions from DDLPlanUtils
URL: https://github.com/apache/hive/pull/3184




Issue Time Tracking
---

Worklog Id: (was: 754135)
Remaining Estimate: 0h
Time Spent: 10m

> Remove unnecessary Exceptions from DDLPlanUtils
> ---
>
> Key: HIVE-26119
> URL: https://issues.apache.org/jira/browse/HIVE-26119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are a few {{HiveExceptions}} which were added to a few methods like 
> {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can be 
> removed. Some methods in {{ExplainTask}} can also be cleaned up which are 
> related.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26020) Set dependency scope for json-path, commons-compiler and janino to runtime

2022-04-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26020.

Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b. 
Thanks for the reviews [~asolimando], [~kkasa]!

> Set dependency scope for json-path, commons-compiler and janino to runtime
> --
>
> Key: HIVE-26020
> URL: https://issues.apache.org/jira/browse/HIVE-26020
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0-alpha-2
>
>
> The dependencies are necessary only when running Hive. They are not required 
> during compilation since Hive does not depend on them directly but 
> transitively through Calcite.
> 
> Changing the scope to runtime makes the intention clear and guards against 
> accidental usages in Hive.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0

2022-04-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26019.

Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/73cbab65eafd58c07f5658a163a331dcdac8046d. 
Thanks for the reviews [~asolimando] [~kkasa]!

> Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
> ---
>
> Key: HIVE-26019
> URL: https://issues.apache.org/jira/browse/HIVE-26019
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0

2022-04-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26019:
---
Fix Version/s: 4.0.0-alpha-2

> Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
> ---
>
> Key: HIVE-26019
> URL: https://issues.apache.org/jira/browse/HIVE-26019
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518896#comment-17518896
 ] 

Peter Vary commented on HIVE-26124:
---

Now I am back at the first step:
{code}
[ERROR] Please refer to 
/Users/pvary/dev/upstream/hive/hbase-handler/target/surefire-reports for the 
individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, 
[date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/pvary/dev/upstream/hive/hbase-handler 
&& 
/usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/bin/java
 -Xmx2048m -jar 
/Users/pvary/dev/upstream/hive/hbase-handler/target/surefire/surefirebooter1320893522602873596.jar
 /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire 
2022-04-07T15-55-06_090-jvmRun1 surefire4212888302150641194tmp 
surefire_04095119596947982877tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.hbase.TestHBaseQueries
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/pvary/dev/upstream/hive/hbase-handler 
&& 
/usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/bin/java
 -Xmx2048m -jar 
/Users/pvary/dev/upstream/hive/hbase-handler/target/surefire/surefirebooter1320893522602873596.jar
 /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire 
2022-04-07T15-55-06_090-jvmRun1 surefire4212888302150641194tmp 
surefire_04095119596947982877tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.hbase.TestHBaseQueries
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:513)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:460)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:301)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:249)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1217)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1063)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:889)
[ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:972)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:293)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
[ERROR] 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754095
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 13:56
Start Date: 07/Apr/22 13:56
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845167903


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##
@@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector 
objectInspector) {
 Deserializer deserializer = deserializers.get(objectInspector);
 if (deserializer == null) {
   deserializer = new Deserializer.Builder()
-  .schema(tableSchema)
+  .schema(isDelete ? deleteSchema : tableSchema)

Review Comment:
   Yes, I think that's a good idea





Issue Time Tracking
---

Worklog Id: (was: 754095)
Time Spent: 12.5h  (was: 12h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518870#comment-17518870
 ] 

Peter Vary commented on HIVE-26124:
---

That would be nice.
There are some config changes in the test utils.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518869#comment-17518869
 ] 

Naveen Gangam commented on HIVE-26124:
--

Got you. Shocking that the alpha4 release has no issues but the GA release does. Do 
we need some HBase help on this, then?

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518866#comment-17518866
 ] 

Peter Vary commented on HIVE-26124:
---

I think I am struggling with the same test failures on the PR.
{code}
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1217)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1184)
at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:561)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:147)
{code}

I was expecting some issues, so I was trying to be conservative. If we can fix 
the issues, I would be happy to move to as high a version as possible for the dependency.
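For context on the failure above: the `port out of range:-1` message comes from the JDK itself, which rejects any port outside 0..65535 when an `InetSocketAddress` is constructed. A minimal sketch (plain JDK, no HBase involved) reproducing the same exception when a test config leaves a port unresolved at -1:

```java
import java.net.InetSocketAddress;

public class PortRangeDemo {
    public static void main(String[] args) {
        // Valid ports are 0..65535; 0 asks the OS for an ephemeral port.
        InetSocketAddress ok = new InetSocketAddress("localhost", 0);
        System.out.println("port 0 accepted: " + ok.getPort());

        // A port of -1 (e.g. an unresolved port in a mini-cluster config)
        // fails fast in InetSocketAddress.checkPort(), producing the same
        // IllegalArgumentException seen in the stack trace above.
        try {
            new InetSocketAddress("localhost", -1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // port out of range:-1
        }
    }
}
```

This suggests the fix belongs in the test utils' port configuration (making sure a concrete or ephemeral port is set) rather than in HBase itself.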

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518865#comment-17518865
 ] 

Naveen Gangam commented on HIVE-26124:
--

[~pvary] https://issues.apache.org/jira/browse/HIVE-20205 never got committed 
due to not having a clean test run. I can close it as a duplicate of this.
But is there a reason we are using 2.0.0? (It looks like my 3-year-old patch was 
using 2.1.0.) Thanks

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754020
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 12:34
Start Date: 07/Apr/22 12:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845080500


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##
@@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector 
objectInspector) {
 Deserializer deserializer = deserializers.get(objectInspector);
 if (deserializer == null) {
   deserializer = new Deserializer.Builder()
-  .schema(tableSchema)
+  .schema(isDelete ? deleteSchema : tableSchema)

Review Comment:
   would it make sense to keep the `projectedSchema` attribute and remove the 
`isDelete` and the `deleteSchema`?





Issue Time Tracking
---

Worklog Id: (was: 754020)
Time Spent: 12h 20m  (was: 12h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754003
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 12:09
Start Date: 07/Apr/22 12:09
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845056909


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> deleteWriter;

Review Comment:
   Yes, we can use `PartitioningWriter` as the common ancestor, which has the 
`write()` and `close()` methods conveniently. I've moved the writer object into 
the parent class, and now the children don't need to override the `close()` 
method anymore. However, in `files()` we need to cast to `DataWriteResult` and 
`DeleteWriteResult`
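The refactor described in the reply above can be sketched in a self-contained way. Note that `PartitioningWriter`, `DeleteWriteResult`, and the writer class names below are simplified stand-ins for the Iceberg types under discussion, not the real API; only the shape of the design is shown: the parent owns the common writer and implements `close()`, while each child casts the generic `result()` in `files()`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Stand-in for Iceberg's PartitioningWriter: the common ancestor exposing
// write(), close(), and a generically typed result().
interface PartitioningWriter<T, R> extends Closeable {
    void write(T row);
    R result();
}

// Stand-in for Iceberg's DeleteWriteResult.
class DeleteWriteResult {
    List<String> deleteFiles() { return Arrays.asList("pos-delete-00001.parquet"); }
}

// Parent owns the writer, so subclasses no longer need to override close().
abstract class HiveIcebergWriterSketch<T> implements Closeable {
    protected final PartitioningWriter<T, ?> writer;

    HiveIcebergWriterSketch(PartitioningWriter<T, ?> writer) { this.writer = writer; }

    @Override
    public void close() throws IOException { writer.close(); }

    // Each child casts the generic result to its concrete result type.
    public abstract List<String> files();
}

// Delete-side child: only files() differs, casting to DeleteWriteResult.
class HiveIcebergDeleteWriterSketch extends HiveIcebergWriterSketch<long[]> {
    HiveIcebergDeleteWriterSketch(PartitioningWriter<long[], DeleteWriteResult> writer) {
        super(writer);
    }

    @Override
    public List<String> files() {
        return ((DeleteWriteResult) writer.result()).deleteFiles();
    }
}
```

The cast in `files()` is the trade-off mentioned above: hoisting the writer into the parent erases its result type, so each subclass re-narrows it.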





Issue Time Tracking
---

Worklog Id: (was: 754003)
Time Spent: 12h 10m  (was: 12h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754002
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 12:09
Start Date: 07/Apr/22 12:09
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845056909


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> deleteWriter;

Review Comment:
   Yes, we can use `PartitioningWriter` as the common ancestor, which has the 
`write()` and `close()` methods conveniently. I've moved the writer object into 
the parent class, and now the children don't need to override the `close()` 
method anymore





Issue Time Tracking
---

Worklog Id: (was: 754002)
Time Spent: 12h  (was: 11h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753995
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:52
Start Date: 07/Apr/22 11:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845042863


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.ContentFile;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+
+public class FilesForCommit implements Serializable {
+
+  private final List<DataFile> dataFiles;

Review Comment:
I would expect the difference is not very significant; `DataFile[]` is probably 
a bit more performant, but I'm not sure.





Issue Time Tracking
---

Worklog Id: (was: 753995)
Time Spent: 11h 50m  (was: 11h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753993
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:37
Start Date: 07/Apr/22 11:37
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845029968


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.ContentFile;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+
+public class FilesForCommit implements Serializable {
+
+  private final List<DataFile> dataFiles;
+  private final List<DeleteFile> deleteFiles;
+
+  public FilesForCommit(List<DataFile> dataFiles, List<DeleteFile> deleteFiles) {
+    this.dataFiles = dataFiles;
+    this.deleteFiles = deleteFiles;
+  }
+
+  public static FilesForCommit onlyDelete(List<DeleteFile> deleteFiles) {
+    return new FilesForCommit(Collections.emptyList(), deleteFiles);
+  }
+
+  public static FilesForCommit onlyData(List<DataFile> dataFiles) {
+    return new FilesForCommit(dataFiles, Collections.emptyList());
+  }
+
+  public static FilesForCommit empty() {
+    return new FilesForCommit(Collections.emptyList(), Collections.emptyList());
+  }
+
+  public List<DataFile> getDataFiles() {

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 753993)
Time Spent: 11h 40m  (was: 11.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753992
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:37
Start Date: 07/Apr/22 11:37
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845029785


##
ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java:
##
@@ -97,12 +100,22 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws SemanticException {
 Table mTable = getTargetTable(tabName);
 validateTargetTable(mTable);
 
+// save the operation type into the query state
+SessionStateUtil.addResource(conf, Context.Operation.class.getSimpleName(), operation.name());
+
 StringBuilder rewrittenQueryStr = new StringBuilder();
 rewrittenQueryStr.append("insert into table ");
 rewrittenQueryStr.append(getFullTableNameForSQL(tabName));
 addPartitionColsToInsert(mTable.getPartCols(), rewrittenQueryStr);
 
-rewrittenQueryStr.append(" select ROW__ID");
+boolean nonNativeAcid = mTable.getStorageHandler() != null && mTable.getStorageHandler().supportsAcidOperations();

Review Comment:
   Sure, makes sense! I've added a util method to AcidUtils





Issue Time Tracking
---

Worklog Id: (was: 753992)
Time Spent: 11.5h  (was: 11h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753989
 ]

ASF GitHub Bot logged work on HIVE-26121:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:30
Start Date: 07/Apr/22 11:30
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3181:
URL: https://github.com/apache/hive/pull/3181#issuecomment-1091622189

   I have missed this before, but do we really need to synchronize 
`DriverTxnHandler.endTransactionAndCleanup` and `DbTxnManager.stopHeartbeat` 
too?
   
   Otherwise LGTM




Issue Time Tracking
---

Worklog Id: (was: 753989)
Time Spent: 40m  (was: 0.5h)

> Hive transaction rollback should be thread-safe
> ---
>
> Key: HIVE-26121
> URL: https://issues.apache.org/jira/browse/HIVE-26121
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When Hive query is being interrupted via cancel request, both the background 
> pool thread (HiveServer2-Background) executing the query and the HttpHandler 
> thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic 
> will eventually trigger the below method:
> {code}
> DriverTxnHandler.endTransactionAndCleanup(boolean commit)
> {code}
> Since this method could be invoked concurrently we need to synchronize access 
> to it, so that only 1 thread would attempt to abort the transaction and stop 
> the heartbeat.
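> The approach described above can be sketched as follows. This is a hypothetical, minimal recreation (not Hive's actual `DriverTxnHandler`): a `synchronized` method plus a "done" flag lets the background thread and the handler thread race into cleanup while only one of them performs the rollback.
> {code}
> import java.util.concurrent.atomic.AtomicInteger;
>
> public class TxnCleanupSketch {
>     private boolean cleanedUp = false;
>     private final AtomicInteger rollbacks = new AtomicInteger();
>
>     // synchronized: only one thread enters at a time; the flag makes
>     // the second caller a no-op instead of a double abort
>     synchronized void endTransactionAndCleanup(boolean commit) {
>         if (cleanedUp) {
>             return; // another thread already aborted the txn and stopped the heartbeat
>         }
>         cleanedUp = true;
>         if (!commit) {
>             rollbacks.incrementAndGet(); // stand-in for abortTxn + stopHeartbeat
>         }
>     }
>
>     public static void main(String[] args) throws InterruptedException {
>         TxnCleanupSketch h = new TxnCleanupSketch();
>         Thread background = new Thread(() -> h.endTransactionAndCleanup(false));
>         Thread handler = new Thread(() -> h.endTransactionAndCleanup(false));
>         background.start();
>         handler.start();
>         background.join();
>         handler.join();
>         System.out.println(h.rollbacks.get());
>     }
> }
> {code}
> Despite both threads invoking the method, the rollback stand-in runs exactly once.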



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753986
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:24
Start Date: 07/Apr/22 11:24
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845020372


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7822,9 +7824,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
 
 List vecCol = new ArrayList();
 
-if (updating(dest) || deleting(dest)) {
+boolean nonNativeAcid = Optional.ofNullable(destinationTable)
+.map(Table::getStorageHandler)
+.map(HiveStorageHandler::supportsAcidOperations)
+.orElse(false);
+boolean isUpdateDelete = updating(dest) || deleting(dest);
+if (!nonNativeAcid && isUpdateDelete) {

Review Comment:
   I agree, that's more readable
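   The null-safe `Optional` chain from the diff above can be demonstrated in isolation. This is a sketch with hypothetical stand-ins for Hive's `Table` and `HiveStorageHandler`, not the real classes: any of the table, its storage handler, or the ACID flag may be absent, and the chain collapses to `false` in every such case.
   
   ```java
   import java.util.Optional;
   
   public class OptionalChainSketch {
       // Hypothetical stand-ins for Hive's Table / HiveStorageHandler
       interface StorageHandler { boolean supportsAcidOperations(); }
       static class Table {
           StorageHandler handler;
           StorageHandler getStorageHandler() { return handler; }
       }
   
       // Same null-safe pattern as the patch: each map() short-circuits on null
       static boolean nonNativeAcid(Table table) {
           return Optional.ofNullable(table)
               .map(Table::getStorageHandler)
               .map(StorageHandler::supportsAcidOperations)
               .orElse(false);
       }
   
       public static void main(String[] args) {
           System.out.println(nonNativeAcid(null));  // no table at all
           Table t = new Table();
           System.out.println(nonNativeAcid(t));     // table without a handler
           t.handler = () -> true;
           System.out.println(nonNativeAcid(t));     // handler that supports ACID
       }
   }
   ```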





Issue Time Tracking
---

Worklog Id: (was: 753986)
Time Spent: 11h 20m  (was: 11h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753982
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:19
Start Date: 07/Apr/22 11:19
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845016196


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##
@@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector objectInspector) {
 Deserializer deserializer = deserializers.get(objectInspector);
 if (deserializer == null) {
   deserializer = new Deserializer.Builder()
-  .schema(tableSchema)
+  .schema(isDelete ? deleteSchema : tableSchema)

Review Comment:
   `projectedSchema` is only a local variable inside `initialize()` and is not 
available here





Issue Time Tracking
---

Worklog Id: (was: 753982)
Time Spent: 11h 10m  (was: 11h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753981
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:17
Start Date: 07/Apr/22 11:17
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845014886


##
itests/qtest-iceberg/pom.xml:
##
@@ -122,6 +122,12 @@
   jersey-servlet
   test
 
+
+  org.roaringbitmap

Review Comment:
   The q test fails with a ClassNotFoundException if this is not here. It's the 
same dependency included in the handler module: 
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/pom.xml#L103-L108





Issue Time Tracking
---

Worklog Id: (was: 753981)
Time Spent: 11h  (was: 10h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753977
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:00
Start Date: 07/Apr/22 11:00
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845001375


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -484,10 +500,35 @@ private static Schema readSchema(Configuration conf, 
Schema tableSchema, boolean
 
   String[] selectedColumns = InputFormatConfig.selectedColumns(conf);
   if (selectedColumns == null) {
-return tableSchema;
+return table.schema();
+  }
+
+  readSchema = caseSensitive ? table.schema().select(selectedColumns) :
+  table.schema().caseInsensitiveSelect(selectedColumns);
+
+  // for DELETE queries, add additional metadata columns into the read 
schema
+  if (HiveIcebergStorageHandler.isDelete(conf, conf.get(Catalogs.NAME))) {
+readSchema = 
IcebergAcidUtil.createFileReadSchemaForDelete(readSchema.columns(), table);
   }
 
-  return caseSensitive ? tableSchema.select(selectedColumns) : 
tableSchema.caseInsensitiveSelect(selectedColumns);
+  return readSchema;
+}
+
+private Schema schemaWithoutConstantsAndMeta(Schema readSchema, 
Map idToConstant) {

Review Comment:
   Yes!





Issue Time Tracking
---

Worklog Id: (was: 753977)
Time Spent: 10h 40m  (was: 10.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753978
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:00
Start Date: 07/Apr/22 11:00
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845001571


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.hadoop.hive.ql.io.PositionDeleteInfo;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap();

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 753978)
Time Spent: 10h 50m  (was: 10h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26074:
--
Labels: pull-request-available  (was: )

> PTF Vectorization: BoundaryScanner for varchar
> --
>
> Key: HIVE-26074
> URL: https://issues.apache.org/jira/browse/HIVE-26074
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on varchar type
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> attempt to setup a Window for typeString: 'varchar(170)'
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.<init>(ValueBoundaryScanner.java:1257)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   ... 16 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=753976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753976
 ]

ASF GitHub Bot logged work on HIVE-26074:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 11:00
Start Date: 07/Apr/22 11:00
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request, #3187:
URL: https://github.com/apache/hive/pull/3187

   HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.




Issue Time Tracking
---

Worklog Id: (was: 753976)
Remaining Estimate: 0h
Time Spent: 10m

> PTF Vectorization: BoundaryScanner for varchar
> --
>
> Key: HIVE-26074
> URL: https://issues.apache.org/jira/browse/HIVE-26074
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on varchar type
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> attempt to setup a Window for typeString: 'varchar(170)'
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.<init>(ValueBoundaryScanner.java:1257)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   ... 16 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753973
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 10:57
Start Date: 07/Apr/22 10:57
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844995858


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) 
{
 private CloseableIterator currentIterator;
 private FileIO io;
 private EncryptionManager encryptionManager;
+private Table table;

Review Comment:
   We need the whole table object for this call:
   ```
   MetadataColumns#metadataColumn(Table table, String name)
   ```
   (which is called inside IcebergAcidUtil#createFileReadSchemaForDelete)
   -> this gives us the _partition metadata column during file read





Issue Time Tracking
---

Worklog Id: (was: 753973)
Time Spent: 10.5h  (was: 10h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753971
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 10:56
Start Date: 07/Apr/22 10:56
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844997990


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) 
{
 private CloseableIterator currentIterator;
 private FileIO io;
 private EncryptionManager encryptionManager;
+private Table table;

Review Comment:
   I'll remove those fields which are easily derivable from table, such as io 
and encryption





Issue Time Tracking
---

Worklog Id: (was: 753971)
Time Spent: 10h 20m  (was: 10h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753967
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 10:53
Start Date: 07/Apr/22 10:53
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844995858


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) 
{
 private CloseableIterator currentIterator;
 private FileIO io;
 private EncryptionManager encryptionManager;
+private Table table;

Review Comment:
   We need the whole table object for this call:
   ```
   public static NestedField metadataColumn(Table table, String name)
   ```
   (which is called inside IcebergAcidUtil#createFileReadSchemaForDelete)
   -> this gives us the _partition metadata column during file read





Issue Time Tracking
---

Worklog Id: (was: 753967)
Time Spent: 10h 10m  (was: 10h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753966
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 10:51
Start Date: 07/Apr/22 10:51
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844994035


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.hadoop.hive.ql.io.PositionDeleteInfo;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0);

Review Comment:
   I chose a linked hashmap so that the iteration order is always deterministic 
when I extend the schema here:
   ```
   DELETE_FILE_READ_META_COLS.forEach((col, index) -> ...
   ```
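   The deterministic-ordering point can be shown with a small standalone example. This is a sketch with hypothetical column names, not the Hive code itself: a `LinkedHashMap` always iterates in insertion order, so a schema built by iterating it comes out identical on every run, whereas a plain `HashMap` gives no such guarantee.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   public class MetaColOrderSketch {
       public static void main(String[] args) {
           // Insertion order mirrors the static initializer in the diff above
           Map<String, Integer> cols = new LinkedHashMap<>();
           cols.put("spec_id", 0);
           cols.put("partition_struct", 1);
           cols.put("file_path", 2);
           cols.put("row_position", 3);
   
           // forEach visits entries in insertion order, every time
           StringBuilder order = new StringBuilder();
           cols.forEach((name, index) -> order.append(name).append(','));
           System.out.println(order);
       }
   }
   ```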



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.hadoop.hive.ql.io.PositionDeleteInfo;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_FILEREAD_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+DELETE_FILEREAD_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_FILEREAD_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = 
Types.NestedField.required(
+  MetadataColumns.PARTITION_COLUMN_ID, 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753965
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 10:49
Start Date: 07/Apr/22 10:49
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844992530


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();

Review Comment:
   Sure





Issue Time Tracking
---

Worklog Id: (was: 753965)
Time Spent: 9h 50m  (was: 9h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753942
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:59
Start Date: 07/Apr/22 09:59
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844950764


##
itests/qtest-iceberg/pom.xml:
##
@@ -122,6 +122,12 @@
   jersey-servlet
   test
 
+
+  org.roaringbitmap

Review Comment:
   Where is this used?





Issue Time Tracking
---

Worklog Id: (was: 753942)
Time Spent: 9h 40m  (was: 9.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753941
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:57
Start Date: 07/Apr/22 09:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844948586


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -484,10 +500,35 @@ private static Schema readSchema(Configuration conf, 
Schema tableSchema, boolean
 
   String[] selectedColumns = InputFormatConfig.selectedColumns(conf);
   if (selectedColumns == null) {
-return tableSchema;
+return table.schema();
+  }
+
+  readSchema = caseSensitive ? table.schema().select(selectedColumns) :
+  table.schema().caseInsensitiveSelect(selectedColumns);
+
+  // for DELETE queries, add additional metadata columns into the read 
schema
+  if (HiveIcebergStorageHandler.isDelete(conf, conf.get(Catalogs.NAME))) {
+readSchema = 
IcebergAcidUtil.createFileReadSchemaForDelete(readSchema.columns(), table);
   }
 
-  return caseSensitive ? tableSchema.select(selectedColumns) : 
tableSchema.caseInsensitiveSelect(selectedColumns);
+  return readSchema;
+}
+
+private Schema schemaWithoutConstantsAndMeta(Schema readSchema, 
Map idToConstant) {

Review Comment:
   Could this be a static method?





Issue Time Tracking
---

Worklog Id: (was: 753941)
Time Spent: 9.5h  (was: 9h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753940
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:56
Start Date: 07/Apr/22 09:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844947260


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) {
     private CloseableIterator<T> currentIterator;
     private FileIO io;
     private EncryptionManager encryptionManager;
+    private Table table;

Review Comment:
   Do we need the whole table here? Or do we just need the partition objects and schema and...
   
   Either we remove the table and set the specific values in `initialize`, or we keep the table and remove the fields which are easily accessible and do not need calculation. 





Issue Time Tracking
---

Worklog Id: (was: 753940)
Time Spent: 9h 20m  (was: 9h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753938
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:52
Start Date: 07/Apr/22 09:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844943512


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
       while (true) {
         if (currentIterator.hasNext()) {
           current = currentIterator.next();
+          Configuration conf = context.getConfiguration();

Review Comment:
   We can set it as an object attribute instead of getting it again and again
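   A minimal sketch of the caching pattern the reviewer suggests: fetch the configuration value once in `initialize()` and store it as an object attribute, instead of re-reading it on every `nextKeyValue()` call. A plain `Map` stands in for Hadoop's `Configuration` here, and all names are illustrative.

   ```java
   import java.util.Map;

   class RecordReaderSketch {
     private boolean isDeleteQuery;  // cached once in initialize()
     private int configReads = 0;    // counts the (potentially expensive) lookups

     void initialize(Map<String, String> conf) {
       // one lookup, stored as an object attribute
       configReads++;
       this.isDeleteQuery = Boolean.parseBoolean(conf.getOrDefault("delete.query", "false"));
     }

     boolean nextKeyValue() {
       // the per-record hot path only touches the cached field
       return isDeleteQuery;
     }

     int configReads() { return configReads; }
   }
   ```

   The payoff is that the per-record loop stays free of repeated configuration lookups, which matters when `nextKeyValue()` is called millions of times per split.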





Issue Time Tracking
---

Worklog Id: (was: 753938)
Time Spent: 9h 10m  (was: 9h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753934
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:48
Start Date: 07/Apr/22 09:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844939837


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.hadoop.hive.ql.io.PositionDeleteInfo;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_FILEREAD_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+    DELETE_FILEREAD_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_FILEREAD_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required(
+      MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);

Review Comment:
   Maybe use ImmutableMap?
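   A sketch of the reviewer's suggestion using stdlib stand-ins (the real code would use Guava's `ImmutableMap`, which likewise preserves insertion order). One caveat worth noting: `ImmutableMap` rejects null keys, so the null `PARTITION_STRUCT_META_COL` placeholder would need a real sentinel object. Column names below are illustrative strings, not the Iceberg `Types.NestedField` type.

   ```java
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.Map;

   class MetaCols {
     static final Map<String, Integer> DELETE_FILE_READ_META_COLS;

     static {
       // LinkedHashMap keeps insertion order (the value is the column position);
       // Map.of would not guarantee iteration order.
       Map<String, Integer> m = new LinkedHashMap<>();
       m.put("spec_id", 0);
       m.put("partition_struct", 1);
       m.put("file_path", 2);
       m.put("row_position", 3);
       DELETE_FILE_READ_META_COLS = Collections.unmodifiableMap(m);
     }
   }
   ```

   The benefit over a mutable static `LinkedHashMap` is that any accidental `put` after construction fails fast with `UnsupportedOperationException` instead of silently corrupting the shared column layout.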





Issue Time Tracking
---

Worklog Id: (was: 753934)
Time Spent: 9h  (was: 8h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753933
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:47
Start Date: 07/Apr/22 09:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844939281


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.hadoop.hive.ql.io.PositionDeleteInfo;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0);

Review Comment:
   Maybe use `ImmutableMap`?





Issue Time Tracking
---

Worklog Id: (was: 753933)
Time Spent: 8h 50m  (was: 8h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753932
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:47
Start Date: 07/Apr/22 09:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844938722


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.hadoop.hive.ql.io.PositionDeleteInfo;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap();

Review Comment:
   nit: FILE_READ?
   When you are using camelCase you write FileRead :D





Issue Time Tracking
---

Worklog Id: (was: 753932)
Time Spent: 8h 40m  (was: 8.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753929
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:37
Start Date: 07/Apr/22 09:37
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844929235


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##
@@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector objectInspector) {
     Deserializer deserializer = deserializers.get(objectInspector);
     if (deserializer == null) {
       deserializer = new Deserializer.Builder()
-          .schema(tableSchema)
+          .schema(isDelete ? deleteSchema : tableSchema)

Review Comment:
   Why not use the `projectedSchema` here?





Issue Time Tracking
---

Worklog Id: (was: 753929)
Time Spent: 8.5h  (was: 8h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753928
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:34
Start Date: 07/Apr/22 09:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844925536


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputFormat.java:
##
@@ -83,9 +83,20 @@ private static HiveIcebergRecordWriter writer(JobConf jc) {
         .operationId(operationId)
         .build();
     String tableName = jc.get(Catalogs.NAME);
-    HiveFileWriterFactory hfwf = new HiveFileWriterFactory(table, fileFormat, schema,
-        null, fileFormat, null, null, null, null);
-    return new HiveIcebergRecordWriter(schema, spec, fileFormat,
-        hfwf, outputFileFactory, io, targetFileSize, taskAttemptID, tableName);
+    HiveFileWriterFactory writerFactory = new HiveFileWriterFactory(table, fileFormat, schema, null, fileFormat,
+        null, null, null, getPositionDeleteRowSchema(schema, fileFormat));
+    if (HiveIcebergStorageHandler.isDelete(jc, tableName)) {
+      return new HiveIcebergDeleteWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize,
+          taskAttemptID, tableName);
+    } else {
+      return new HiveIcebergRecordWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize,
+          taskAttemptID, tableName);
+    }
+  }
+
+  private static Schema getPositionDeleteRowSchema(Schema schema, FileFormat fileFormat) {
+    // TODO: remove this Avro-specific logic once we have Avro writer function ready

Review Comment:
   Is it implemented in the Iceberg project? Is there an existing PR or issue 
for it?





Issue Time Tracking
---

Worklog Id: (was: 753928)
Time Spent: 8h 20m  (was: 8h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result

2022-04-07 Thread Youjun Yuan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Youjun Yuan updated HIVE-26111:
---
Description: 
We hit a query which FULL JOINs two tables; Hive produces incorrect results. For 
a single value of the join key, it produces two records, each of which has a 
valid value for one table and NULL for the other table.

The query is:
{code:java}
SET mapreduce.job.reduces=2;
SELECT d.id, u.id
FROM (
       SELECT id
       FROM   airflow.tableA rud
       WHERE  rud.dt = '2022-04-02-1row'
) d
FULL JOIN (
       SELECT id
       FROM   default.tableB
       WHERE  dt = '2022-04-01' and device_token='blabla'
 ) u
ON u.id = d.id
; {code}
According to the job log, the two reducers each get an input record, and output 
a record.

And produces two records for id=350570497
{code:java}
350570497    NULL
NULL    350570497
Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
I am sure tableB has only one row where device_token='blabla'

And we tried:

1, SET mapreduce.job.reduces=1; then it produces right result;

-2, SET hive.execution.engine=mr; then it produces right result;- mr also has 
the issue.

3, JOIN (instead of FULL JOIN) worked as expected

4, in sub query u, change filter device_token='blabla' to id=350570497, it 
worked ok

5, flatten the sub queries, then it works ok, like below:
{code:java}
SELECT  d.id, u.id 
from airflow.rds_users_delta d full join default.users u
on (u.id = d.id)
where d.dt = '2022-04-02-1row' and u.dt = '2022-04-01' and 
u.device_token='blabla' {code}
Below is the explain output of the query:
{code:java}
Plan optimized by CBO.Vertex dependency in root stage
Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_13] (rows=2 width=8)
          
Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_6]
            PartitionCols:_col0
            Select Operator [SEL_2] (rows=1 width=4)
              Output:["_col0"]
              TableScan [TS_0] (rows=1 width=4)
                
airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
        <-Map 2 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=4)
              Output:["_col0"]
              Filter Operator [FIL_12] (rows=1 width=110)
                predicate:(device_token = 'blabla')
                TableScan [TS_3] (rows=215192362 width=109)
                  
default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"]  
{code}
I can't generate a small enough data set to reproduce the issue: I have 
minimized tableA to only 1 row, and tableB has ~200m rows, but if I further 
reduce the size of tableB, the issue can't be reproduced.

Any suggestion would be highly appreciated, regarding the root cause of the 
issue, how to work around it, or how to reproduce it with small enough dataset. 

 

below is the log found in hive.log
{code:java}
220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES:
  Stage-1 is a root stage [MAPRED]
  Stage-0 depends on stages: Stage-1 [FETCH]STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
      Edges:
        Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
      DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: rud
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                  GatherStats: false
                  Select Operator
                    expressions: id (type: int)
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      null sort order: a
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      tag: 0
                      auto parallelism: true
            Path -> Alias:
              s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00 [rud]
            Path -> Partition:
              s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00
                Partition
                  base file name: hh=00
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: 

[jira] [Commented] (HIVE-26111) FULL JOIN returns incorrect result

2022-04-07 Thread Youjun Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518731#comment-17518731
 ] 

Youjun Yuan commented on HIVE-26111:


This should be a duplicate of https://issues.apache.org/jira/browse/HIVE-22098, the 
bucketing_version issue.

> FULL JOIN returns incorrect result
> --
>
> Key: HIVE-26111
> URL: https://issues.apache.org/jira/browse/HIVE-26111
> Project: Hive
>  Issue Type: Bug
> Environment: aws EMR (hive 3.1.2 + Tez 0.10.1)
>Reporter: Youjun Yuan
>Priority: Blocker
>
> we hit a query which FULL JOINs two tables, hive produces incorrect results, 
> for a single value of join key, it produces two records, each record has a 
> valid value for one table and NULL for the other table.
> The query is:
> {code:java}
> SET mapreduce.job.reduces=2;
> SELECT d.id, u.id
> FROM (
>        SELECT id
>        FROM   airflow.tableA rud
>        WHERE  rud.dt = '2022-04-02-1row'
> ) d
> FULL JOIN (
>        SELECT id
>        FROM   default.tableB
>        WHERE  dt = '2022-04-01' and device_token='blabla'
>  ) u
> ON u.id = d.id
> ; {code}
> According to the job log, the two reducers each get an input record, and 
> output a record.
> And produces two records for id=350570497
> {code:java}
> 350570497    NULL
> NULL    350570497
> Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
> I am sure tableB has only one row where device_token='blabla'
> And we tried:
> 1, SET mapreduce.job.reduces=1; then it produces right result;
> -2, SET hive.execution.engine=mr; then it produces right result;- mr also has 
> the issue.
> 3, JOIN (instead of FULL JOIN) worked as expected
> 4, in sub query u, change filter device_token='blabla' to id=350570497, it 
> worked ok
> 5, flatten the sub queries, then it works ok, like below:
> {code:java}
> SELECT  d.id, u.id 
> from airflow.rds_users_delta d full join default.users u
> on (u.id = d.id)
> where d.dt = '2022-04-02-1row' and u.dt = '2022-04-01' and 
> u.device_token='blabla' {code}
> Below is the explain output of the query:
> {code:java}
> Plan optimized by CBO.Vertex dependency in root stage
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 3
>       File Output Operator [FS_10]
>         Map Join Operator [MAPJOIN_13] (rows=2 width=8)
>           
> Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
>         <-Map 1 [CUSTOM_SIMPLE_EDGE]
>           PARTITION_ONLY_SHUFFLE [RS_6]
>             PartitionCols:_col0
>             Select Operator [SEL_2] (rows=1 width=4)
>               Output:["_col0"]
>               TableScan [TS_0] (rows=1 width=4)
>                 
> airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
>         <-Map 2 [CUSTOM_SIMPLE_EDGE]
>           PARTITION_ONLY_SHUFFLE [RS_7]
>             PartitionCols:_col0
>             Select Operator [SEL_5] (rows=1 width=4)
>               Output:["_col0"]
>               Filter Operator [FIL_12] (rows=1 width=110)
>                 predicate:(device_token = 'blabla')
>                 TableScan [TS_3] (rows=215192362 width=109)
>                   
> default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"]  
> {code}
> I can't generate a small enough result set to reproduce the issue, I have 
> minimized the tableA to only 1 row, tableB has ~200m rows, but if I further 
> reduce the size of tableB, then the issue can't be reproduced.
> Any suggestion would be highly appreciated, regarding the root cause of the 
> issue, how to work around it, or how to reproduce it with small enough 
> dataset. 
>  
> below is the log found in hive.log
> {code:java}
> 220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES:
>   Stage-1 is a root stage [MAPRED]
>   Stage-0 depends on stages: Stage-1 [FETCH]STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
>       Edges:
>         Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
>       DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: rud
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   GatherStats: false
>                   Select Operator
>                     expressions: id (type: int)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: int)
>             

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753926
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:28
Start Date: 07/Apr/22 09:28
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844919325


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java:
##
@@ -228,6 +230,104 @@ public void testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied() throws I
     Assert.assertArrayEquals(new Object[] {2L, "Trudy", "Pink"}, objects.get(3));
   }
 
+  @Test
+  public void testDeleteStatementUnpartitioned() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+
+    // create and insert an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    // insert one more batch so that we have multiple data files within the same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.add(1L, "Sharon", "Taylor")
+.add(2L, "Jake", "Donnel")
+.add(2L, "Susan", "Morrison")
+.add(2L, "Bob", "Silver")
+.add(4L, "Laci", "Zold")
+.add(5L, "Peti", "Rozsaszin")
+.build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementPartitioned() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+    PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .identity("last_name").bucket("customer_id", 16).build();
+
+    // create and insert an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    // insert one more batch so that we have multiple data files within the same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.add(1L, "Sharon", "Taylor")
+.add(2L, "Jake", "Donnel")
+.add(2L, "Susan", "Morrison")
+.add(2L, "Bob", "Silver")
+.add(4L, "Laci", "Zold")
+.add(5L, "Peti", "Rozsaszin")
+.build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementWithOtherTable() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+    PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .identity("last_name").bucket("customer_id", 16).build();
+
+    // create a couple of tables, with an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    testTables.createTable(shell, "other", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, 2);
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id in (select t1.customer_id from customers t1 join " +
+        "other t2 on t1.customer_id = t2.customer_id) or " +
+        "first_name in (select first_name from 

[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753923
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:19
Start Date: 07/Apr/22 09:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r844910675


##
standalone-metastore/metastore-server/pom.xml:
##
@@ -474,23 +474,6 @@
   
 
   
-          <execution>
-            <id>generate-version-annotation</id>
-            <phase>generate-sources</phase>
-            <configuration>
-              <target>
-                ...
-              </target>
-            </configuration>
-            <goals>
-              <goal>run</goal>
-            </goals>
-          </execution>

Review Comment:
   Also removed the script, as it was duplicated as well





Issue Time Tracking
---

Worklog Id: (was: 753923)
Time Spent: 1h 20m  (was: 1h 10m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at 

[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753922
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:18
Start Date: 07/Apr/22 09:18
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r844910181


##
standalone-metastore/pom.xml:
##
@@ -531,6 +531,30 @@
 
   
 
+
+  javadoc
+  
+
+  
+org.apache.maven.plugins
+maven-javadoc-plugin
+
+  none
+  -Xdoclint:none

Review Comment:
   Removed the unnecessary line





Issue Time Tracking
---

Worklog Id: (was: 753922)
Time Spent: 1h 10m  (was: 1h)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753921
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:17
Start Date: 07/Apr/22 09:17
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r844909281


##
pom.xml:
##
@@ -1810,6 +1810,7 @@
 org.apache.maven.plugins
 maven-javadoc-plugin
 
+  none
   -Xdoclint:none

Review Comment:
   Removed the unnecessary line





Issue Time Tracking
---

Worklog Id: (was: 753921)
Time Spent: 1h  (was: 50m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753918
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:07
Start Date: 07/Apr/22 09:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r844899419


##
standalone-metastore/metastore-server/pom.xml:
##
@@ -474,23 +474,6 @@
   
 
   
-  
-generate-version-annotation
-generate-sources
-
-  
-
-  
-  
-  
-  
-
-  
-
-
-  run
-
-  

Review Comment:
   I think HIVE-20188 made the mistake of duplicating the code instead of moving it





Issue Time Tracking
---

Worklog Id: (was: 753918)
Time Spent: 50m  (was: 40m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753915
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 09:03
Start Date: 07/Apr/22 09:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r844895657


##
pom.xml:
##
@@ -1810,6 +1810,7 @@
 org.apache.maven.plugins
 maven-javadoc-plugin
 
+  none
   -Xdoclint:none

Review Comment:
   I thought this depended on the Maven version, but found that it depends on the maven-javadoc-plugin version
   
   https://blog.joda.org/2014/02/turning-off-doclint-in-jdk-8-javadoc.html
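   For reference, the two mechanisms the blog post above describes for turning off doclint can be sketched roughly as follows (a hedged illustration based on the maven-javadoc-plugin documentation, not a copy of Hive's actual pom files):

```xml
<!-- maven-javadoc-plugin >= 3.0.0: dedicated configuration parameter -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <doclint>none</doclint>
  </configuration>
</plugin>

<!-- older plugin versions: pass the flag through to the javadoc tool -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <additionalparam>-Xdoclint:none</additionalparam>
  </configuration>
</plugin>
```

   Which form applies depends on the plugin version in use, which is why a build supporting both can end up carrying both settings.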
   





Issue Time Tracking
---

Worklog Id: (was: 753915)
Time Spent: 40m  (was: 0.5h)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753913
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 08:57
Start Date: 07/Apr/22 08:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844890096


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.ContentFile;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+
+public class FilesForCommit implements Serializable {
+
+  private final List<DataFile> dataFiles;

Review Comment:
   Which one is easier/smaller to serialize, `List<DataFile>` or `DataFile[]`?
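
   As a rough way to explore the question above, one can compare the Java-serialized size of an array against an `ArrayList`. This sketch uses plain `String`s as stand-ins for `DataFile` instances, so the absolute numbers say nothing about Iceberg itself; it only shows the per-container overhead of the two shapes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializedSizeDemo {

  // Serialize any object with plain Java serialization and return the bytes.
  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(o);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    String[] array = {"data-1.parquet", "data-2.parquet", "data-3.parquet"};
    List<String> list = new ArrayList<>(Arrays.asList(array));

    // The list form carries extra class metadata (the ArrayList class
    // descriptor plus its size and capacity fields) on top of the same
    // element payload the array carries.
    System.out.println("array: " + serialize(array).length + " bytes");
    System.out.println("list:  " + serialize(list).length + " bytes");
  }
}
```

   In practice the difference is a small constant per container, so for a handful of commit files either representation should be fine.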





Issue Time Tracking
---

Worklog Id: (was: 753913)
Time Spent: 8h  (was: 7h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753912
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 08:57
Start Date: 07/Apr/22 08:57
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r844887792


##
standalone-metastore/metastore-server/pom.xml:
##
@@ -474,23 +474,6 @@
   
 
   
-  
-generate-version-annotation
-generate-sources
-
-  
-
-  
-  
-  
-  
-
-  
-
-
-  run
-
-  

Review Comment:
   Do we know why this was introduced in the first place, and whether it is safe to remove?
   If I understood correctly, this is the main point of the fix; can you confirm?



##
standalone-metastore/pom.xml:
##
@@ -531,6 +531,30 @@
 
   
 
+
+  javadoc
+  
+
+  
+org.apache.maven.plugins
+maven-javadoc-plugin
+
+  none
+  -Xdoclint:none

Review Comment:
   Do we need both?



##
pom.xml:
##
@@ -1810,6 +1810,7 @@
 org.apache.maven.plugins
 maven-javadoc-plugin
 
+  none
   -Xdoclint:none

Review Comment:
   Is this change mandatory for building javadocs? 
   Aren't these two lines somewhat equivalent? Why do we need both?





Issue Time Tracking
---

Worklog Id: (was: 753912)
Time Spent: 0.5h  (was: 20m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753910
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 08:55
Start Date: 07/Apr/22 08:55
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844888405


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> deleteWriter;

Review Comment:
   Do we have a common ancestor for ClusteredPositionDeleteWriter and 
ClusteredDataWriter, which we could use?
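
   If no suitable common ancestor exists in the Iceberg writer API, one option raised by the question above would be a small interface on the Hive side. This is a hypothetical sketch with `String` file names standing in for `DataFile`/`DeleteFile`; none of these types come from the PR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common abstraction over data and delete writers, so the
// surrounding code can treat both kinds of writer uniformly.
interface FileProducingWriter<R> {
  void write(R row);
  List<String> resultFiles();  // stand-in for DataFile/DeleteFile results
}

class DataWriterStub implements FileProducingWriter<String> {
  private final List<String> files = new ArrayList<>();
  public void write(String row) { files.add("data-" + row + ".parquet"); }
  public List<String> resultFiles() { return files; }
}

class DeleteWriterStub implements FileProducingWriter<String> {
  private final List<String> files = new ArrayList<>();
  public void write(String row) { files.add("delete-" + row + ".parquet"); }
  public List<String> resultFiles() { return files; }
}

public class WriterAncestorDemo {
  public static void main(String[] args) {
    List<FileProducingWriter<String>> writers =
        List.of(new DataWriterStub(), new DeleteWriterStub());
    // Both writer kinds are driven through the shared interface.
    writers.forEach(w -> w.write("r1"));
    writers.forEach(w -> System.out.println(w.resultFiles()));
  }
}
```

   Whether such an interface belongs in Hive or upstream in Iceberg is exactly the design question the review comment raises.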





Issue Time Tracking
---

Worklog Id: (was: 753910)
Time Spent: 7h 50m  (was: 7h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753909
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 08:53
Start Date: 07/Apr/22 08:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r844885993


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.ContentFile;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+
+public class FilesForCommit implements Serializable {
+
+  private final List<DataFile> dataFiles;
+  private final List<DeleteFile> deleteFiles;
+
+  public FilesForCommit(List<DataFile> dataFiles, List<DeleteFile> deleteFiles) {
+    this.dataFiles = dataFiles;
+    this.deleteFiles = deleteFiles;
+  }
+
+  public static FilesForCommit onlyDelete(List<DeleteFile> deleteFiles) {
+    return new FilesForCommit(Collections.emptyList(), deleteFiles);
+  }
+
+  public static FilesForCommit onlyData(List<DataFile> dataFiles) {
+    return new FilesForCommit(dataFiles, Collections.emptyList());
+  }
+
+  public static FilesForCommit empty() {
+    return new FilesForCommit(Collections.emptyList(), Collections.emptyList());
+  }
+
+  public List<DataFile> getDataFiles() {

Review Comment:
   In Iceberg-related code we usually try to avoid the `get` prefix. We might 
want to use `dataFiles()` instead.
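A minimal sketch of the suggested rename (the class and field are simplified stand-ins to keep the example self-contained; only the accessor style is the point):

```java
import java.util.Collections;
import java.util.List;

// Hedged sketch: Iceberg convention drops the "get" prefix, so the
// accessor is named after the field it exposes.
public class FilesForCommitSketch {
  private final List<String> dataFiles;

  public FilesForCommitSketch(List<String> dataFiles) {
    this.dataFiles = dataFiles;
  }

  // dataFiles() rather than getDataFiles()
  public List<String> dataFiles() {
    return dataFiles;
  }

  public static void main(String[] args) {
    FilesForCommitSketch commit =
        new FilesForCommitSketch(Collections.singletonList("a.parquet"));
    System.out.println(commit.dataFiles().get(0));
  }
}
```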





Issue Time Tracking
---

Worklog Id: (was: 753909)
Time Spent: 7h 40m  (was: 7.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-07 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518682#comment-17518682
 ] 

Naveen Gangam commented on HIVE-26118:
--

Fix has been merged to master. Thank you for the review [~dengzh]

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix from HIVE-25750 has an issue where the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". Thus this uber jar never gets included 
> in the distribution.
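The mismatch can be illustrated with a hedged sketch of a Maven assembly descriptor fragment (the directory and patterns below are hypothetical; the real descriptor lives in the Hive packaging module):

```xml
<!-- Hypothetical assembly fragment illustrating the mismatch: the build
     produces *-jar-with-dependencies.jar, but an include pattern expecting
     an "original-" prefix never matches it, so the uber jar is silently
     dropped from the distribution. -->
<fileSet>
  <directory>beeline/target</directory>
  <includes>
    <!-- broken: expects an "original-" prefix the build never emits -->
    <!-- <include>original-*-jar-with-dependencies.jar</include> -->
    <!-- fixed: matches what the build actually produces -->
    <include>*-jar-with-dependencies.jar</include>
  </includes>
</fileSet>
```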





[jira] [Resolved] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-07 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-26118.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix from HIVE-25750 has an issue where the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". Thus this uber jar never gets included 
> in the distribution.





[jira] [Work logged] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?focusedWorklogId=753900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753900
 ]

ASF GitHub Bot logged work on HIVE-26118:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 08:19
Start Date: 07/Apr/22 08:19
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on PR #3180:
URL: https://github.com/apache/hive/pull/3180#issuecomment-1091280878

   The two test failures appear to be flaky: both tests passed in the prior 
run, which itself had a different test failure. I do not see a connection 
between the failures and the fix.




Issue Time Tracking
---

Worklog Id: (was: 753900)
Time Spent: 40m  (was: 0.5h)

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix from HIVE-25750 has an issue where the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". Thus this uber jar never gets included 
> in the distribution.





[jira] [Work logged] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?focusedWorklogId=753899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753899
 ]

ASF GitHub Bot logged work on HIVE-26118:
-

Author: ASF GitHub Bot
Created on: 07/Apr/22 08:18
Start Date: 07/Apr/22 08:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 merged PR #3180:
URL: https://github.com/apache/hive/pull/3180




Issue Time Tracking
---

Worklog Id: (was: 753899)
Time Spent: 0.5h  (was: 20m)

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Fix from HIVE-25750 has an issue where the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". Thus this uber jar never gets included 
> in the distribution.


