[jira] [Work logged] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=754884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754884
 ]

ASF GitHub Bot logged work on HIVE-26127:
-

Author: ASF GitHub Bot
Created on: 09/Apr/22 03:31
Start Date: 09/Apr/22 03:31
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request, #3198:
URL: https://github.com/apache/hive/pull/3198

   …tition is deleted
   
   
   
   ### What changes were proposed in this pull request?
   
   Catch FileNotFoundException when a directory being cleaned up for insert 
overwrite has already been deleted.
   
   ### Why are the changes needed?
   
   For external tables, any partition directory can be deleted outside of Hive's control. 
Insert overwrite should not fail just because a partition directory has been removed.
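   
   A minimal sketch of the intended handling, assuming a hypothetical cleanup helper 
(method name, parameters and logging are illustrative, not the actual patch code):
   
{code:java}
// Hypothetical sketch only: tolerate a missing destination directory during the
// insert-overwrite cleanup phase instead of failing the whole query.
private void cleanUpOldPath(FileSystem fs, Path oldPath, PathFilter pathFilter) throws HiveException {
  FileStatus[] statuses;
  try {
    statuses = fs.listStatus(oldPath, pathFilter);
  } catch (FileNotFoundException e) {
    // The partition directory was removed outside of Hive (e.g. hdfs dfs -rm -r),
    // so there is nothing to clean up and insert overwrite can proceed.
    LOG.warn("Directory " + oldPath + " does not exist, skipping cleanup", e);
    return;
  } catch (IOException e) {
    throw new HiveException("Directory " + oldPath + " could not be cleaned up.", e);
  }
  for (FileStatus status : statuses) {
    try {
      fs.delete(status.getPath(), true);
    } catch (IOException e) {
      throw new HiveException("Directory " + oldPath + " could not be cleaned up.", e);
    }
  }
}
{code}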
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert_overwrite.q




Issue Time Tracking
---

Worklog Id: (was: 754884)
Remaining Estimate: 0h
Time Spent: 10m

> Insert overwrite throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> This happens because listStatus is called on a path that does not exist. We should 
> not fail insert overwrite because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26127:
--
Labels: pull-request-available  (was: )

> Insert overwrite throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> This happens because listStatus is called on a path that does not exist. We should 
> not fail insert overwrite because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted

2022-04-08 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai updated HIVE-26127:
--
Description: 
Steps to reproduce:
 # create external table src (col int) partitioned by (year int);
 # create external table dest (col int) partitioned by (year int);
 # insert into src partition (year=2022) values (1);
 # insert into dest partition (year=2022) values (2);
 # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
 # insert overwrite table dest select * from src;

We will get FileNotFoundException as below.
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
 could not be cleaned up.
    at 
org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748) {code}
This happens because listStatus is called on a path that does not exist. We should not 
fail insert overwrite because there is nothing to clean up.
{code:java}
fs.listStatus(path, pathFilter){code}
 

  was:
Steps to reproduce:
 # create external table src (col int) partitioned by (year int);
 # create external table dest (col int) partitioned by (year int);
 # insert into src partition (year=2022) values (1);
 # insert into dest partition (year=2022) values (2);
 # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
 # insert overwrite table dest select * from src;

We will get FileNotFoundException when it tries to call 
{code:java}
fs.listStatus(path, pathFilter){code}
We should not fail insert overwrite because there is nothing to clean up.


> Insert overwrite throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> This happens because listStatus is called on a path that does not exist. We should 
> not fail insert overwrite because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted

2022-04-08 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai reassigned HIVE-26127:
-


> Insert overwrite throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException when it tries to call 
> {code:java}
> fs.listStatus(path, pathFilter){code}
> We should not fail insert overwrite because there is nothing to clean up.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26096) Select on single column MultiDelimitSerDe table throws AIOBE

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26096?focusedWorklogId=754863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754863
 ]

ASF GitHub Bot logged work on HIVE-26096:
-

Author: ASF GitHub Bot
Created on: 09/Apr/22 00:22
Start Date: 09/Apr/22 00:22
Worklog Time Spent: 10m 
  Work Description: ramesh0201 merged PR #3158:
URL: https://github.com/apache/hive/pull/3158




Issue Time Tracking
---

Worklog Id: (was: 754863)
Time Spent: 0.5h  (was: 20m)

> Select on single column MultiDelimitSerDe table throws AIOBE
> 
>
> Key: HIVE-26096
> URL: https://issues.apache.org/jira/browse/HIVE-26096
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Repro details
>  
> {code:java}
> create table test_multidelim(col string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'
> with serdeproperties('field.delim'='!^') STORED AS TEXTFILE;
> insert into test_multidelim values('aa'),('bb'),('cc'),('dd');
> select * from test_multidelim;
> {code}
> Exception:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>         at 
> org.apache.hadoop.hive.serde2.lazy.LazyStruct.parseMultiDelimit(LazyStruct.java:303)
>         at 
> org.apache.hadoop.hive.serde2.MultiDelimitSerDe.doDeserialize(MultiDelimitSerDe.java:160)
>         at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.deserialize(AbstractEncodingAwareSerDe.java:74)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:603){code}
>  
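For context, a self-contained illustration of the failure mode (this is not the 
LazyStruct internals, only a demonstration of why a single-column table is special):

{code:java}
// Illustrative only: a single-column row contains no occurrence of the multi-byte
// delimiter "!^", so splitting it yields exactly one field; any unguarded lookup of
// a per-delimiter position can then hit ArrayIndexOutOfBoundsException.
String row = "aa";
String[] fields = row.split(java.util.regex.Pattern.quote("!^"), -1);
System.out.println(fields.length);  // prints 1 -> only fields[0] is valid
System.out.println(fields[0]);      // prints aa
{code}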



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25840) Prevent duplicate paths in the fileList while adding an entry to NotifcationLog

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25840?focusedWorklogId=754860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754860
 ]

ASF GitHub Bot logged work on HIVE-25840:
-

Author: ASF GitHub Bot
Created on: 09/Apr/22 00:18
Start Date: 09/Apr/22 00:18
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2913: 
HIVE-25840: Prevent duplicate paths in the fileList while adding an e…
URL: https://github.com/apache/hive/pull/2913




Issue Time Tracking
---

Worklog Id: (was: 754860)
Time Spent: 0.5h  (was: 20m)

> Prevent duplicate paths in the fileList while adding an entry to 
> NotifcationLog
> ---
>
> Key: HIVE-25840
> URL: https://issues.apache.org/jira/browse/HIVE-25840
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, when entries are added to the notification log, retries can cause the 
> same path to be added to a notification log entry more than once, which leads to 
> copy failures during replication.
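
A minimal sketch of the kind of guard implied here, assuming a hypothetical in-memory 
set of already-written entries and an assumed backingWriter (this is not the actual 
file-list code):

{code:java}
// Hypothetical sketch: make adding a path idempotent so that a retried write
// cannot record the same path in the notification log entry twice.
private final java.util.Set<String> writtenPaths =
    java.util.concurrent.ConcurrentHashMap.newKeySet();

public synchronized void add(String path) throws java.io.IOException {
  if (!writtenPaths.add(path)) {
    return;  // already recorded by an earlier attempt; skip on retry
  }
  backingWriter.write(path + System.lineSeparator());  // backingWriter is assumed
}
{code}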



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25827) Parquet file footer is read multiple times, when multiple splits are created in same file

2022-04-08 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519777#comment-17519777
 ] 

Steve Loughran commented on HIVE-25827:
---

Is this per input stream, or are separate streams opened to read it?

If it's the same opened file, HADOOP-18028 will mitigate this on S3.

> Parquet file footer is read multiple times, when multiple splits are created 
> in same file
> -
>
> Key: HIVE-25827
> URL: https://issues.apache.org/jira/browse/HIVE-25827
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: image-2021-12-21-03-19-38-577.png
>
>
> With large files, it is possible that multiple splits are created in the same 
> file. With the current codebase, "ParquetRecordReaderBase" ends up reading the file 
> footer for each split. 
> This can be optimized so that the footer information is not read multiple times for 
> the same file.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java#L160]
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L91]
>  
>  
> !image-2021-12-21-03-19-38-577.png|width=1363,height=1256!
>  
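A sketch of the suggested optimization, assuming a shared footer cache keyed by file 
path (the cache and helper are illustrative, not Hive code; a real implementation 
would also bound and evict the cache):

{code:java}
// Hypothetical sketch: parse the Parquet footer once per file and let every split
// of that file reuse it instead of re-reading it from storage.
private static final java.util.concurrent.ConcurrentMap<Path, ParquetMetadata> FOOTER_CACHE =
    new java.util.concurrent.ConcurrentHashMap<>();

static ParquetMetadata footerFor(Configuration conf, Path file) {
  return FOOTER_CACHE.computeIfAbsent(file, f -> {
    try {
      return ParquetFileReader.readFooter(conf, f, ParquetMetadataConverter.NO_FILTER);
    } catch (java.io.IOException e) {
      throw new java.io.UncheckedIOException(e);
    }
  });
}
{code}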



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754689
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 16:12
Start Date: 08/Apr/22 16:12
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r846286761


##
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestRemoteHiveHttpMetaStore.java:
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.annotation.MetastoreUnitTest;
+import org.junit.experimental.categories.Category;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+
+@Category(MetastoreCheckinTest.class)
+public class TestRemoteHiveHttpMetaStore extends TestRemoteHiveMetaStore {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(TestRemoteHiveHttpMetaStore.class);
+
+  @Override
+  public void start() throws Exception {
+    MetastoreConf.setVar(conf, ConfVars.THRIFT_TRANSPORT_MODE, "http");
+    LOG.info("Attempting to start test remote metastore in http mode");
+    super.start();
+    LOG.info("Successfully started test remote metastore in http mode");
+  }
+
+  @Override
+  protected HiveMetaStoreClient createClient() throws Exception {
+    MetastoreConf.setVar(conf, ConfVars.METASTORE_CLIENT_THRIFT_TRANSPORT_MODE, "http");
+    return super.createClient();
+  }
+}

Review Comment:
   Nit: Add a new line at the end of the file.





Issue Time Tracking
---

Worklog Id: (was: 754689)
Time Spent: 3h 50m  (was: 3h 40m)

> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Hive Metastore currently does not support HTTP transport, so it cannot be accessed 
> via Knox. Adding support for Thrift over HTTP transport will allow clients to 
> access it via Knox.
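
For illustration, a sketch of how the transport mode is switched, based on the 
configuration names visible in the TestRemoteHiveHttpMetaStore snippet quoted above 
(the client construction itself is illustrative, not part of the patch):

{code:java}
// Sketch only: enable HTTP transport for the metastore server and client via MetastoreConf.
Configuration conf = MetastoreConf.newMetastoreConf();
MetastoreConf.setVar(conf, MetastoreConf.ConfVars.THRIFT_TRANSPORT_MODE, "http");                   // server side
MetastoreConf.setVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_THRIFT_TRANSPORT_MODE, "http");  // client side
HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
{code}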



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754691
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 16:12
Start Date: 08/Apr/22 16:12
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r846287153


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HmsThriftHttpServlet.java:
##
@@ -0,0 +1,116 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.Enumeration;
+
+import javax.servlet.ServletException;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TProcessor;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.server.TServlet;
+
+public class HmsThriftHttpServlet extends TServlet {
+
+  private static final Logger LOG = LoggerFactory
+  .getLogger(HmsThriftHttpServlet.class);
+
+  private static final String X_USER = MetaStoreUtils.USER_NAME_HTTP_HEADER;
+
+  private final boolean isSecurityEnabled;
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory inProtocolFactory, TProtocolFactory outProtocolFactory) 
{
+super(processor, inProtocolFactory, outProtocolFactory);
+// This should ideally be receiving an instance of the Configuration which is used for the check
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory protocolFactory) {
+super(processor, protocolFactory);
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  @Override
+  protected void doPost(HttpServletRequest request,
+  HttpServletResponse response) throws ServletException, IOException {
+
+Enumeration headerNames = request.getHeaderNames();
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Logging headers in request");
+  while (headerNames.hasMoreElements()) {
+String headerName = headerNames.nextElement();
+LOG.debug("Header: [{}], Value: [{}]", headerName,
+request.getHeader(headerName));
+  }
+}
+String userFromHeader = request.getHeader(X_USER);
+if (userFromHeader == null || userFromHeader.isEmpty()) {
+  LOG.error("No user header: {} found", X_USER);
+  response.sendError(HttpServletResponse.SC_FORBIDDEN,
+  "User Header missing");
+  return;
+}
+
+// TODO: These should ideally be in some kind of a Cache with Weak 
references.
+// If HMS were to set up some kind of a session, this would go into the 
session by having
+// this filter work with a custom Processor / or set the username into the 
session
+// as is done for HS2.
+// In case of HMS, it looks like each request is independent, and there is 
no session
+// information, so the UGI needs to be set up in the Connection layer 
itself.
+UserGroupInformation clientUgi;
+// Temporary, and useless for now. Here only to allow this to work on an 
otherwise kerberized
+// server.
+if (isSecurityEnabled) {
+  LOG.info("Creating proxy user for: {}", userFromHeader);
+  clientUgi = UserGroupInformation.createProxyUser(userFromHeader, 
UserGroupInformation.getLoginUser());
+} else {
+  LOG.info("Creating remote user for: {}", userFromHeader);
+  clientUgi = UserGroupInformation.createRemoteUser(userFromHeader);
+}
+
+
+PrivilegedExceptionAction action = new 
PrivilegedExceptionAction() {
+  @Override
+  public Void run() throws Exception {
+HmsThriftHttpServlet.super.doPost(request, response);
+

[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754688
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 16:11
Start Date: 08/Apr/22 16:11
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r846286006


##
standalone-metastore/pom.xml:
##
@@ -361,6 +362,12 @@
 runtime
 true
   
+   Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Hive Metastore currently does not support HTTP transport, so it cannot be accessed 
> via Knox. Adding support for Thrift over HTTP transport will allow clients to 
> access it via Knox.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754681
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 16:05
Start Date: 08/Apr/22 16:05
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r846281556


##
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestSSL.java:
##
@@ -437,15 +439,36 @@ public void testConnectionWrongCertCN() throws Exception {
* Test HMS server with SSL
* @throws Exception
*/
+  @Ignore
   @Test
   public void testMetastoreWithSSL() throws Exception {
 testSSLHMS(true);
   }
 
+  /**
+   * Test HMS server with Http + SSL
+   * @throws Exception
+   */
+  @Test
+  public void testMetastoreWithHttps() throws Exception {
+// MetastoreConf.setBoolVar(conf, 
MetastoreConf.ConfVars.EVENT_DB_NOTIFICATION_API_AUTH, false);
+//MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.METASTORE_CLIENT_TRANSPORT_MODE, "http");
+SSLTestUtils.setMetastoreHttpsConf(conf);
+MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.SSL_TRUSTMANAGERFACTORY_ALGORITHM,
+KEY_MANAGER_FACTORY_ALGORITHM);
+MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_TRUSTSTORE_TYPE, 
KEY_STORE_TRUST_STORE_TYPE);
+MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_KEYSTORE_TYPE, 
KEY_STORE_TRUST_STORE_TYPE);
+MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.SSL_KEYMANAGERFACTORY_ALGORITHM,
+KEY_MANAGER_FACTORY_ALGORITHM);
+
+testSSLHMS(false);

Review Comment:
   Why are we passing false here? This value is used in testSSLHMS()#L459-461 
to set the keystore for HMS and HS2. You are already setting this for HMS in 
L461 here, and we don't need to set it for HS2. So why don't we just pass the value 
true?





Issue Time Tracking
---

Worklog Id: (was: 754681)
Time Spent: 3.5h  (was: 3h 20m)

> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently does not support HTTP transport, so it cannot be accessed 
> via Knox. Adding support for Thrift over HTTP transport will allow clients to 
> access it via Knox.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754672
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 15:45
Start Date: 08/Apr/22 15:45
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846262788


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // 
placeholder value in the map
+  private static final Map 
DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = 
Types.NestedField.required(
+  MetadataColumns.PARTITION_COLUMN_ID, 
MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map DELETE_SERDE_META_COLS 
= Maps.newLinkedHashMap();
+
+  static {
+DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final long RECORD_CACHE_EXPIRE_MILLI = 10 * 60 * 1000;
+  private static final long RECORD_CACHE_MAX_SIZE = 1000;
+  private static final Cache RECORD_CACHE = 
Caffeine.newBuilder()
+  .expireAfterAccess(RECORD_CACHE_EXPIRE_MILLI, TimeUnit.MILLISECONDS)
+  .maximumSize(RECORD_CACHE_MAX_SIZE)
+  .build();
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition 
struct meta column
+   * @return The schema for reading files, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List 
dataCols, Table table) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + 
DELETE_FILE_READ_META_COLS.size());
+DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+  if (metaCol == PARTITION_STRUCT_META_COL) {
+cols.add(MetadataColumns.metadataColumn(table, 
MetadataColumns.PARTITION_COLUMN_NAME));
+  } else {
+cols.add(metaCol);
+  }
+});
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List 
dataCols) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  public static PositionDelete 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754669
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 15:33
Start Date: 08/Apr/22 15:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846252270


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // 
placeholder value in the map
+  private static final Map 
DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = 
Types.NestedField.required(
+  MetadataColumns.PARTITION_COLUMN_ID, 
MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map DELETE_SERDE_META_COLS 
= Maps.newLinkedHashMap();
+
+  static {
+DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final long RECORD_CACHE_EXPIRE_MILLI = 10 * 60 * 1000;
+  private static final long RECORD_CACHE_MAX_SIZE = 1000;
+  private static final Cache RECORD_CACHE = 
Caffeine.newBuilder()
+  .expireAfterAccess(RECORD_CACHE_EXPIRE_MILLI, TimeUnit.MILLISECONDS)
+  .maximumSize(RECORD_CACHE_MAX_SIZE)
+  .build();
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition 
struct meta column
+   * @return The schema for reading files, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List 
dataCols, Table table) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + 
DELETE_FILE_READ_META_COLS.size());
+DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+  if (metaCol == PARTITION_STRUCT_META_COL) {
+cols.add(MetadataColumns.metadataColumn(table, 
MetadataColumns.PARTITION_COLUMN_NAME));
+  } else {
+cols.add(metaCol);
+  }
+});
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List 
dataCols) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  public static PositionDelete 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754665
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 15:30
Start Date: 08/Apr/22 15:30
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846249301


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // 
placeholder value in the map
+  private static final Map 
DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = 
Types.NestedField.required(
+  MetadataColumns.PARTITION_COLUMN_ID, 
MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map DELETE_SERDE_META_COLS 
= Maps.newLinkedHashMap();
+
+  static {
+DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final long RECORD_CACHE_EXPIRE_MILLI = 10 * 60 * 1000;
+  private static final long RECORD_CACHE_MAX_SIZE = 1000;
+  private static final Cache RECORD_CACHE = 
Caffeine.newBuilder()
+  .expireAfterAccess(RECORD_CACHE_EXPIRE_MILLI, TimeUnit.MILLISECONDS)
+  .maximumSize(RECORD_CACHE_MAX_SIZE)
+  .build();
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition 
struct meta column
+   * @return The schema for reading files, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List 
dataCols, Table table) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + 
DELETE_FILE_READ_META_COLS.size());
+DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+  if (metaCol == PARTITION_STRUCT_META_COL) {
+cols.add(MetadataColumns.metadataColumn(table, 
MetadataColumns.PARTITION_COLUMN_NAME));
+  } else {
+cols.add(metaCol);
+  }
+});
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List 
dataCols) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  public static PositionDelete 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754662
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 15:30
Start Date: 08/Apr/22 15:30
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r84624


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // 
placeholder value in the map
+  private static final Map 
DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = 
Types.NestedField.required(
+  MetadataColumns.PARTITION_COLUMN_ID, 
MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map DELETE_SERDE_META_COLS 
= Maps.newLinkedHashMap();
+
+  static {
+DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final long RECORD_CACHE_EXPIRE_MILLI = 10 * 60 * 1000;
+  private static final long RECORD_CACHE_MAX_SIZE = 1000;
+  private static final Cache RECORD_CACHE = 
Caffeine.newBuilder()
+  .expireAfterAccess(RECORD_CACHE_EXPIRE_MILLI, TimeUnit.MILLISECONDS)
+  .maximumSize(RECORD_CACHE_MAX_SIZE)
+  .build();
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition 
struct meta column
+   * @return The schema for reading files, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List 
dataCols, Table table) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + 
DELETE_FILE_READ_META_COLS.size());
+DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+  if (metaCol == PARTITION_STRUCT_META_COL) {
+cols.add(MetadataColumns.metadataColumn(table, 
MetadataColumns.PARTITION_COLUMN_NAME));
+  } else {
+cols.add(metaCol);
+  }
+});
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List 
dataCols) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  public static PositionDelete 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754627
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 14:07
Start Date: 08/Apr/22 14:07
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846148875


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // 
placeholder value in the map
+  private static final Map 
DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = 
Types.NestedField.required(
+  MetadataColumns.PARTITION_COLUMN_ID, 
MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map DELETE_SERDE_META_COLS 
= Maps.newLinkedHashMap();
+
+  static {
+DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition 
struct meta column
+   * @return The schema for reading files, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List 
dataCols, Table table) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + 
DELETE_FILE_READ_META_COLS.size());
+DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+  if (metaCol == PARTITION_STRUCT_META_COL) {
+cols.add(MetadataColumns.metadataColumn(table, 
MetadataColumns.PARTITION_COLUMN_NAME));
+  } else {
+cols.add(metaCol);
+  }
+});
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns 
needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List 
dataCols) {
+List cols = 
Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+cols.addAll(dataCols);
+return new Schema(cols);
+  }
+
+  public static PositionDelete getPositionDelete(Schema schema, Record 
rec) {
+PositionDelete positionDelete = PositionDelete.create();
+String filePath = 
rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class);
+long filePosition = 
rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class);
+
+int dataOffset = DELETE_SERDE_META_COLS.size(); // position in the rec 
where the actual row data begins
+Record rowData = GenericRecord.create(schema);


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754621
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 13:55
Start Date: 08/Apr/22 13:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846137472


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required(
+      MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition struct meta column
+   * @return The schema for reading files, extended with metadata columns needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List<Types.NestedField> dataCols, Table table) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size());
+    DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+      if (metaCol == PARTITION_STRUCT_META_COL) {
+        cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME));
+      } else {
+        cols.add(metaCol);
+      }
+    });
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List<Types.NestedField> dataCols) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+    DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  public static PositionDelete<Record> getPositionDelete(Schema schema, Record rec) {
+    PositionDelete<Record> positionDelete = PositionDelete.create();
+    String filePath = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class);
+    long filePosition = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class);
+
+    int dataOffset = DELETE_SERDE_META_COLS.size(); // position in the rec where the actual row data begins
+    Record rowData = GenericRecord.create(schema);
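The quoted method is cut off at this point. For orientation, here is a hedged sketch of how a row carrying these metadata columns is typically turned into an Iceberg position delete with the public PositionDelete API; it is illustrative and not the exact continuation of the patch:

```java
// copy the data columns that follow the meta columns into the projected row
for (int i = 0; i < schema.columns().size(); i++) {
  rowData.set(i, rec.get(dataOffset + i));
}
// a position delete is keyed by the data file path and the row's position in that file;
// the deleted row itself can be carried alongside for readers that want it back
positionDelete.set(filePath, filePosition, rowData);
return positionDelete;
```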


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754608
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 13:47
Start Date: 08/Apr/22 13:47
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846128892


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergWriter.java:
##
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.InternalRecordWrapper;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.PartitioningWriter;
+import org.apache.iceberg.mr.mapred.Container;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.util.Tasks;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@SuppressWarnings("checkstyle:VisibilityModifier")
+public abstract class HiveIcebergWriter implements FileSinkOperator.RecordWriter,
+    org.apache.hadoop.mapred.RecordWriter<NullWritable, Container<Record>> {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergWriter.class);

Review Comment:
   I've removed it from RecordWriter and the DeleteWriter





Issue Time Tracking
---

Worklog Id: (was: 754608)
Time Spent: 14h 20m  (was: 14h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754607
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 13:45
Start Date: 08/Apr/22 13:45
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846127594


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required(
+      MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition struct meta column
+   * @return The schema for reading files, extended with metadata columns needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List<Types.NestedField> dataCols, Table table) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size());
+    DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+      if (metaCol == PARTITION_STRUCT_META_COL) {
+        cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME));
+      } else {
+        cols.add(metaCol);
+      }
+    });
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List<Types.NestedField> dataCols) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+    DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  public static PositionDelete<Record> getPositionDelete(Schema schema, Record rec) {
+    PositionDelete<Record> positionDelete = PositionDelete.create();
+    String filePath = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class);
+    long filePosition = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class);
+
+    int dataOffset = DELETE_SERDE_META_COLS.size(); // position in the rec where the actual row data begins
+    Record rowData = GenericRecord.create(schema);


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754606
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 13:43
Start Date: 08/Apr/22 13:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846125881


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergWriter.java:
##
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.InternalRecordWrapper;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.PartitioningWriter;
+import org.apache.iceberg.mr.mapred.Container;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.util.Tasks;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@SuppressWarnings("checkstyle:VisibilityModifier")
+public abstract class HiveIcebergWriter implements FileSinkOperator.RecordWriter,
+    org.apache.hadoop.mapred.RecordWriter<NullWritable, Container<Record>> {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergWriter.class);

Review Comment:
   Not used anymore





Issue Time Tracking
---

Worklog Id: (was: 754606)
Time Spent: 14h  (was: 13h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754599
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 13:39
Start Date: 08/Apr/22 13:39
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846121847


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required(
+      MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition struct meta column
+   * @return The schema for reading files, extended with metadata columns needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List<Types.NestedField> dataCols, Table table) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size());
+    DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+      if (metaCol == PARTITION_STRUCT_META_COL) {
+        cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME));
+      } else {
+        cols.add(metaCol);
+      }
+    });
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List<Types.NestedField> dataCols) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+    DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  public static PositionDelete<Record> getPositionDelete(Schema schema, Record rec) {
+    PositionDelete<Record> positionDelete = PositionDelete.create();
+    String filePath = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class);
+    long filePosition = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class);
+
+    int dataOffset = DELETE_SERDE_META_COLS.size(); // position in the rec where the actual row data begins
+    Record rowData = GenericRecord.create(schema);

Review 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754571
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 13:00
Start Date: 08/Apr/22 13:00
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846087296


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java:
##
@@ -50,10 +50,14 @@
 
   RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo),
   /**
-   * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} 
+   * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier}
*/
   ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, 
RecordIdentifier.StructInfo.oi),
   ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo),
+  PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo),
+  PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo),
+  FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo),
+  ROW_POSITION("ROW__POSITION", TypeInfoFactory.longTypeInfo),

Review Comment:
   The row number is obtained in the `OrcRecordUpdater`:
   ```
   ByteBuffer val =
   reader.getMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)
   ```
   then the stripe metadata is deserialized and this contains the row number
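A hedged sketch of that lookup, assuming the org.apache.orc.Reader API and the usual "originalTransaction,bucket,rowId" per-stripe layout of the hive.acid.key.index value (';'-separated, one entry per stripe); this is illustrative and not the patch code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.orc.Reader;

public class AcidKeyIndexSketch {
  // Returns the rowId of the last row of the given stripe, read from the ACID key index
  // stored in the ORC file's user metadata (OrcRecordUpdater.ACID_KEY_INDEX_NAME).
  static long lastRowIdOfStripe(Reader reader, int stripe) {
    ByteBuffer val = reader.getMetadataValue("hive.acid.key.index");
    String[] stripeKeys = StandardCharsets.UTF_8.decode(val).toString().split(";");
    String[] key = stripeKeys[stripe].split(","); // originalTransaction, bucket, rowId
    return Long.parseLong(key[2]);
  }
}
```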





Issue Time Tracking
---

Worklog Id: (was: 754571)
Time Spent: 13h 40m  (was: 13.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754565
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 12:48
Start Date: 08/Apr/22 12:48
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846077086


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java:
##
@@ -50,10 +50,14 @@
 
   RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo),
   /**
-   * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} 
+   * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier}
*/
   ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, 
RecordIdentifier.StructInfo.oi),
   ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo),
+  PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo),
+  PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo),
+  FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo),

Review Comment:
   Unfortunately no, the `INPUT__FILE__NAME` seems to be used differently and 
`ctx.getCurrentInputPath()` (where its value comes from) only contains the root 
location, not the full file path:
   ```
   
file:/Users/martonbod/Repos/Hive/Fork/hive/iceberg/iceberg-handler/target/tmp/hive5438133120236603811/external/customers
   ```





Issue Time Tracking
---

Worklog Id: (was: 754565)
Time Spent: 13.5h  (was: 13h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754558
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 12:34
Start Date: 08/Apr/22 12:34
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r846065462


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputFormat.java:
##
@@ -83,9 +83,20 @@ private static HiveIcebergRecordWriter writer(JobConf jc) {
 .operationId(operationId)
 .build();
 String tableName = jc.get(Catalogs.NAME);
-    HiveFileWriterFactory hfwf = new HiveFileWriterFactory(table, fileFormat, schema,
-        null, fileFormat, null, null, null, null);
-    return new HiveIcebergRecordWriter(schema, spec, fileFormat,
-        hfwf, outputFileFactory, io, targetFileSize, taskAttemptID, tableName);
+    HiveFileWriterFactory writerFactory = new HiveFileWriterFactory(table, fileFormat, schema, null, fileFormat,
+        null, null, null, getPositionDeleteRowSchema(schema, fileFormat));
+    if (HiveIcebergStorageHandler.isDelete(jc, tableName)) {
+      return new HiveIcebergDeleteWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize,
+          taskAttemptID, tableName);
+    } else {
+      return new HiveIcebergRecordWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize,
+          taskAttemptID, tableName);
+    }
+  }
+
+  private static Schema getPositionDeleteRowSchema(Schema schema, FileFormat fileFormat) {
+    // TODO: remove this Avro-specific logic once we have Avro writer function ready

Review Comment:
   Actually, we no longer need an exception for Avro; we can use the 
`DataWriter` class from the iceberg-avro module.





Issue Time Tracking
---

Worklog Id: (was: 754558)
Time Spent: 13h 20m  (was: 13h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25492) Major query-based compaction is skipped if partition is empty

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=754540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754540
 ]

ASF GitHub Bot logged work on HIVE-25492:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:43
Start Date: 08/Apr/22 11:43
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on PR #3157:
URL: https://github.com/apache/hive/pull/3157#issuecomment-1092774567

   pending tests




Issue Time Tracking
---

Worklog Id: (was: 754540)
Time Spent: 2h 10m  (was: 2h)

> Major query-based compaction is skipped if partition is empty
> -
>
> Key: HIVE-25492
> URL: https://issues.apache.org/jira/browse/HIVE-25492
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or 
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact, 
> then no compacted delete delta should be created (only a compacted delta). In 
> the same way, if there are only delete deltas to compact, then no compacted 
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has 
> been deleted, then we should get an empty base directory after compaction. 
> Instead, the empty base directory is deleted because it's empty and 
> compaction claims to succeed but we end up with the same deltas/delete deltas 
> we started with – basically compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction
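A minimal sketch of the behaviour the ticket asks for, written against plain Hadoop FileSystem calls; the method and variable names are illustrative and not the actual MajorQueryCompactor#commitCompaction code:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CommitCompactionSketch {
  // Decide what to do with the result directory of a query-based compaction when it is empty.
  static void commitResultDir(FileSystem fs, Path resultDir, boolean majorCompaction) throws IOException {
    boolean empty = !fs.exists(resultDir) || fs.listStatus(resultDir).length == 0;
    if (!empty) {
      return; // normal case: the compacted base/delta contains data and stays in place
    }
    if (majorCompaction) {
      // all rows were deleted: keep (or create) the empty base so the old deltas
      // and delete deltas become obsolete and can be cleaned up
      fs.mkdirs(resultDir);
    } else {
      // minor compaction: an empty compacted delta/delete delta can safely be dropped
      fs.delete(resultDir, true);
    }
  }
}
```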



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754531
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:28
Start Date: 08/Apr/22 11:28
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1092763873

   @zabetak Sorry for missing this issue for so long. Can you help me reopen this PR?




Issue Time Tracking
---

Worklog Id: (was: 754531)
Time Spent: 2h 20m  (was: 2h 10m)

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-25335.001.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I found an application that runs slowly in our cluster because each reducer 
> has to process a huge number of bytes, yet only two reducers are used.
> While debugging I found the reason: in this SQL, one table is big in size 
> (about 30G) but has a small row count (about 3.5M), while the other table is 
> small in size (about 100M) but has more rows (about 3.6M). So 
> JoinStatsRule.process uses only the 100M figure to estimate the number of 
> reducers, although about 30G actually has to be processed.
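To make the mismatch concrete, here is a hedged sketch of the usual size-based reducer estimate; the 256 MB per-reducer figure (hive.exec.reducers.bytes.per.reducer) and the 1009 reducer cap are assumed defaults for illustration only:

```java
public class ReducerEstimateSketch {
  // reducers = ceil(inputBytes / bytesPerReducer), clamped to [1, maxReducers]
  static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
    int reducers = (int) Math.ceil((double) inputBytes / bytesPerReducer);
    return Math.max(1, Math.min(maxReducers, reducers));
  }

  public static void main(String[] args) {
    long bytesPerReducer = 256L * 1024 * 1024;
    // estimating from the small table's ~100M gives 1 reducer...
    System.out.println(estimateReducers(100L * 1024 * 1024, bytesPerReducer, 1009));
    // ...while the ~30G that actually has to be processed would call for ~120 reducers
    System.out.println(estimateReducers(30L * 1024 * 1024 * 1024, bytesPerReducer, 1009));
  }
}
```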



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754525
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:22
Start Date: 08/Apr/22 11:22
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1092759199

   Sorry for missing it; please reopen this PR.




Issue Time Tracking
---

Worklog Id: (was: 754525)
Time Spent: 2h 10m  (was: 2h)

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-25335.001.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> I found an application that runs slowly in our cluster because each reducer 
> has to process a huge number of bytes, yet only two reducers are used.
> While debugging I found the reason: in this SQL, one table is big in size 
> (about 30G) but has a small row count (about 3.5M), while the other table is 
> small in size (about 100M) but has more rows (about 3.6M). So 
> JoinStatsRule.process uses only the 100M figure to estimate the number of 
> reducers, although about 30G actually has to be processed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754511
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 10:48
Start Date: 08/Apr/22 10:48
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845988600


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java:
##
@@ -228,6 +230,104 @@ public void 
testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied() throws I
 Assert.assertArrayEquals(new Object[] {2L, "Trudy", "Pink"}, 
objects.get(3));
   }
 
+  @Test
+  public void testDeleteStatementUnpartitioned() {
+Assume.assumeFalse("Iceberg DELETEs are only implemented for 
non-vectorized mode for now", isVectorized);
+
+// create and insert an initial batch of records
+testTables.createTable(shell, "customers", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, 
HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+// insert one more batch so that we have multiple data files within the 
same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or 
first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.add(1L, "Sharon", "Taylor")
+.add(2L, "Jake", "Donnel")
+.add(2L, "Susan", "Morrison")
+.add(2L, "Bob", "Silver")
+.add(4L, "Laci", "Zold")
+.add(5L, "Peti", "Rozsaszin")
+.build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementPartitioned() {
+Assume.assumeFalse("Iceberg DELETEs are only implemented for 
non-vectorized mode for now", isVectorized);
+PartitionSpec spec = 
PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.identity("last_name").bucket("customer_id", 16).build();
+
+// create and insert an initial batch of records
+testTables.createTable(shell, "customers", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+spec, fileFormat, 
HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+// insert one more batch so that we have multiple data files within the 
same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or 
first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.add(1L, "Sharon", "Taylor")
+.add(2L, "Jake", "Donnel")
+.add(2L, "Susan", "Morrison")
+.add(2L, "Bob", "Silver")
+.add(4L, "Laci", "Zold")
+.add(5L, "Peti", "Rozsaszin")
+.build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementWithOtherTable() {
+Assume.assumeFalse("Iceberg DELETEs are only implemented for 
non-vectorized mode for now", isVectorized);
+PartitionSpec spec = 
PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.identity("last_name").bucket("customer_id", 16).build();
+
+// create a couple of tables, with an initial batch of records
+testTables.createTable(shell, "customers", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+spec, fileFormat, 
HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+testTables.createTable(shell, "other", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+spec, fileFormat, 
HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, 2);
+
+shell.executeStatement("DELETE FROM customers WHERE customer_id in (select 
t1.customer_id from customers t1 join " +
+"other t2 on t1.customer_id = t2.customer_id) or " +
+"first_name in (select first_name from 

[jira] [Updated] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26126:
--
Labels: pull-request-available  (was: )

> Allow capturing/validating SQL generated from HMS calls in qtests
> -
>
> Key: HIVE-26126
> URL: https://issues.apache.org/jira/browse/HIVE-26126
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During the compilation/execution of a Hive command there are usually calls to 
> the HiveMetastore (HMS). Most of the time these calls need to connect to the 
> underlying database backend in order to return the requested information so 
> they trigger the generation and execution of SQL queries. 
> We have a lot of code in Hive which affects the generation and execution of 
> these SQL queries and some vivid examples are the {{MetaStoreDirectSql}} and 
> {{CachedStore}} classes.
> [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java]
>  is responsible for explicitly building SQL queries for performance reasons. 
> [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java]
>  is responsible for caching certain requests to avoid going to the database 
> on every call. 
> Ensuring that the generated SQL is the expected one and/or that certain 
> queries are hitting (or not) the DB is valuable for catching regressions or 
> evaluating the effectiveness of caches.
> The idea is that for each Hive command/query in some qtest there is an option 
> to include in the output (.q.out) the list of SQL queries that were generated 
> by HMS calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26126?focusedWorklogId=754499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754499
 ]

ASF GitHub Bot logged work on HIVE-26126:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 10:19
Start Date: 08/Apr/22 10:19
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request, #3197:
URL: https://github.com/apache/hive/pull/3197

   ### What changes were proposed in this pull request?
   1. Create logger configuration (tests only) for writing specific DataNucleus 
SQL queries to (new) operation log files.
   2. Add hook copying the content of datanucleus log files to session's 
console for use in qtests.
   3. Avoid creating appenders & files when hook is inactive.
   4. Enable the hook on certain tests with partitions and update output.
   
   ### Why are the changes needed?
   For motivation see HIVE-26126
   
   ### Does this PR introduce _any_ user-facing change?
   No, the change only affects tests.
   
   ### How was this patch tested?
   See changes in .q and .q.out files
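For orientation, a hedged sketch of the copying step such a hook performs: read whatever SQL the test-only DataNucleus logger wrote to a side file and echo it to the console stream that ends up in the .q.out. The class, file name and truncation step are illustrative assumptions, not the actual patch:

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HmsSqlCaptureSketch {
  static void dumpCapturedSql(Path sqlLogFile, PrintStream console) throws IOException {
    if (Files.exists(sqlLogFile)) {
      for (String line : Files.readAllLines(sqlLogFile)) {
        console.println(line); // shows up in the session output, hence in the .q.out
      }
      Files.write(sqlLogFile, new byte[0]); // truncate so the next command starts clean
    }
  }
}
```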
   




Issue Time Tracking
---

Worklog Id: (was: 754499)
Remaining Estimate: 0h
Time Spent: 10m

> Allow capturing/validating SQL generated from HMS calls in qtests
> -
>
> Key: HIVE-26126
> URL: https://issues.apache.org/jira/browse/HIVE-26126
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During the compilation/execution of a Hive command there are usually calls to 
> the HiveMetastore (HMS). Most of the time these calls need to connect to the 
> underlying database backend in order to return the requested information so 
> they trigger the generation and execution of SQL queries. 
> We have a lot of code in Hive which affects the generation and execution of 
> these SQL queries and some vivid examples are the {{MetaStoreDirectSql}} and 
> {{CachedStore}} classes.
> [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java]
>  is responsible for explicitly building SQL queries for performance reasons. 
> [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java]
>  is responsible for caching certain requests to avoid going to the database 
> on every call. 
> Ensuring that the generated SQL is the expected one and/or that certain 
> queries are hitting (or not) the DB is valuable for catching regressions or 
> evaluating the effectiveness of caches.
> The idea is that for each Hive command/query in some qtest there is an option 
> to include in the output (.q.out) the list of SQL queries that were generated 
> by HMS calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=754498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754498
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 10:18
Start Date: 08/Apr/22 10:18
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r845966555


##
standalone-metastore/metastore-server/pom.xml:
##
@@ -474,23 +474,6 @@
   
 
   
-  
-generate-version-annotation
-generate-sources
-
-  
-
-  
-  
-  
-  
-
-  
-
-
-  run
-
-  

Review Comment:
   Moved the `MetastoreVersionInfo` as suggested.
   We never use it without `metastore-common`, so this was not causing any 
issues, but it is more logical this way.





Issue Time Tracking
---

Worklog Id: (was: 754498)
Time Spent: 1h 50m  (was: 1h 40m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at 

[jira] [Assigned] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests

2022-04-08 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-26126:
--


> Allow capturing/validating SQL generated from HMS calls in qtests
> -
>
> Key: HIVE-26126
> URL: https://issues.apache.org/jira/browse/HIVE-26126
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> During the compilation/execution of a Hive command there are usually calls to 
> the HiveMetastore (HMS). Most of the time these calls need to connect to the 
> underlying database backend in order to return the requested information so 
> they trigger the generation and execution of SQL queries. 
> We have a lot of code in Hive which affects the generation and execution of 
> these SQL queries and some vivid examples are the {{MetaStoreDirectSql}} and 
> {{CachedStore}} classes.
> [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java]
>  is responsible for explicitly building SQL queries for performance reasons. 
> [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java]
>  is responsible for caching certain requests to avoid going to the database 
> on every call. 
> Ensuring that the generated SQL is the expected one and/or that certain 
> queries are hitting (or not) the DB is valuable for catching regressions or 
> evaluating the effectiveness of caches.
> The idea is that for each Hive command/query in some qtest there is an option 
> to include in the output (.q.out) the list of SQL queries that were generated 
> by HMS calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=754487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754487
 ]

ASF GitHub Bot logged work on HIVE-26074:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 09:32
Start Date: 08/Apr/22 09:32
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845927749


##
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##
@@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner 
getBoundaryScanner(BoundaryDef start, B
 case "string":
   return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, 
nullsLast);
 default:
+  if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

Review Comment:
   We can't do that: the entries aren't the fixed strings "char" or "varchar"; 
they look like char(10), char(5), varchar(5) or varchar(6). So putting "char" 
or "varchar" into the switch-case won't match them.
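A tiny illustrative helper (not from the patch) showing why the precision has to be stripped, or startsWith used, before the type name can be matched:

```java
public class TypeNameSketch {
  // "varchar(170)" -> "varchar", "char(10)" -> "char", "string" -> "string"
  static String baseTypeName(String typeString) {
    int paren = typeString.indexOf('(');
    return paren < 0 ? typeString : typeString.substring(0, paren);
  }
}
```

With the precision stripped like this, char and varchar could be matched as ordinary switch cases instead of being handled in the default branch.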





Issue Time Tracking
---

Worklog Id: (was: 754487)
Time Spent: 40m  (was: 0.5h)

> PTF Vectorization: BoundaryScanner for varchar
> --
>
> Key: HIVE-26074
> URL: https://issues.apache.org/jira/browse/HIVE-26074
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on varchar type
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> attempt to setup a Window for typeString: 'varchar(170)'
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.  (ValueBoundaryScanner.java:1257)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   ... 16 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=754485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754485
 ]

ASF GitHub Bot logged work on HIVE-26074:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 09:30
Start Date: 08/Apr/22 09:30
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845925509


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -508,6 +508,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "with ${hive.scratch.dir.permission}."),
 REPLDIR("hive.repl.rootdir","/user/${system:user.name}/repl/",
 "HDFS root dir for all replication dumps."),
+//HS2 IP2 DistCp hdfs://namenodePort:port/use/hive/w/table1 
Ip2:/port:...table1

Review Comment:
   This doesn't seem related to the patch; it is probably a leftover.





Issue Time Tracking
---

Worklog Id: (was: 754485)
Time Spent: 0.5h  (was: 20m)

> PTF Vectorization: BoundaryScanner for varchar
> --
>
> Key: HIVE-26074
> URL: https://issues.apache.org/jira/browse/HIVE-26074
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on varchar type
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> attempt to setup a Window for typeString: 'varchar(170)'
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.  (ValueBoundaryScanner.java:1257)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   ... 16 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=754479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754479
 ]

ASF GitHub Bot logged work on HIVE-26074:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 09:27
Start Date: 08/Apr/22 09:27
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845922550


##
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##
@@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner 
getBoundaryScanner(BoundaryDef start, B
 case "string":
   return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, 
nullsLast);
 default:
+  if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

Review Comment:
   Putting this into the default branch looks strange to me; why not handle it 
similarly to decimal above?





Issue Time Tracking
---

Worklog Id: (was: 754479)
Time Spent: 20m  (was: 10m)

> PTF Vectorization: BoundaryScanner for varchar
> --
>
> Key: HIVE-26074
> URL: https://issues.apache.org/jira/browse/HIVE-26074
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on varchar type
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> attempt to setup a Window for typeString: 'varchar(170)'
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.  (ValueBoundaryScanner.java:1257)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   ... 16 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=754473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754473
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 09:18
Start Date: 08/Apr/22 09:18
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r845914266


##
standalone-metastore/pom.xml:
##
@@ -531,6 +531,29 @@
 
   
 
+
+  javadoc
+  
+
+  
+org.apache.maven.plugins
+maven-javadoc-plugin

Review Comment:
   I think we should also pin the version of the `maven-javadoc-plugin` 
globally, to avoid build warnings and breakage when a newer version appears.





Issue Time Tracking
---

Worklog Id: (was: 754473)
Time Spent: 1h 40m  (was: 1.5h)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: 
> 
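
For readers less familiar with the generated sources, the quoted failure boils 
down to javadoc seeing the package org.apache.hadoop.hive.metastore.annotation 
annotated by two identical generated package-info.java files, one per module. 
The sketch below shows roughly what those generated pieces look like; only the 
version and shortVersion attributes are visible in the error output above, so 
the retention policy, the target, and any further attributes are assumptions, 
not the exact Hive sources.

{code:java}
// --- MetastoreVersionAnnotation.java (assumed shape of the annotation) ---
package org.apache.hadoop.hive.metastore.annotation;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Runtime retention is assumed so the version can later be read reflectively;
// only version() and shortVersion() are confirmed by the error output.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PACKAGE)
public @interface MetastoreVersionAnnotation {
  String version();
  String shortVersion();
}

// --- package-info.java, generated by saveVersion.sh into src/gen/version ---
// Generating this file into BOTH metastore-common and metastore-server is what
// makes the aggregated javadoc run see the same package annotated twice.
@MetastoreVersionAnnotation(version = "4.0.0-alpha-1", shortVersion = "4.0.0-alpha-1")
package org.apache.hadoop.hive.metastore.annotation;
{code}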

[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=754471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754471
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 09:15
Start Date: 08/Apr/22 09:15
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r845912230


##
standalone-metastore/metastore-server/pom.xml:
##
@@ -474,23 +474,6 @@
   
 
   
-  
-generate-version-annotation
-generate-sources
-
-  
-
-  
-  
-  
-  
-
-  
-
-
-  run
-
-  

Review Comment:
   I looked a bit more into the history of this, and my understanding is that 
the `saveVersion.sh` script, together with the generated package information, is 
necessary for `org.apache.hadoop.hive.metastore.utils.MetastoreVersionInfo` to 
run correctly.
   
   From what I can see, the 
`org.apache.hadoop.hive.metastore.utils.MetastoreVersionInfo` class is in the 
metastore-server module, and you chose to remove the generated package info from 
that module. Did you verify that it still runs correctly?
   
   I would expect `org.apache.hadoop.hive.metastore.utils.MetastoreVersionInfo`, 
`saveVersion.sh`, etc. to all live in the same module, and `metastore-common` 
seems more appropriate for that.
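
To make that dependency concrete, below is a minimal sketch of the reflection 
lookup such a version-info class typically performs. It assumes the annotation 
is runtime-retained, and the class and method names here are illustrative 
stand-ins, not the actual MetastoreVersionInfo API.

{code:java}
package org.apache.hadoop.hive.metastore.utils;

import org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation;

// Illustrative only: shows why the generated package-info.java must end up on
// the classpath of whichever module hosts the version lookup.
public final class VersionInfoSketch {

  // Loading the annotation class defines its package, so the package-level
  // annotation generated by saveVersion.sh becomes readable here.
  private static MetastoreVersionAnnotation findAnnotation() {
    Package pkg = MetastoreVersionAnnotation.class.getPackage();
    return pkg == null ? null : pkg.getAnnotation(MetastoreVersionAnnotation.class);
  }

  public static String getVersion() {
    MetastoreVersionAnnotation ann = findAnnotation();
    // If the generated package-info.class is missing from the module that ends
    // up on the classpath, the annotation is absent and we fall back to "Unknown".
    return ann == null ? "Unknown" : ann.version();
  }

  public static void main(String[] args) {
    System.out.println("Metastore version: " + getVersion());
  }
}
{code}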





Issue Time Tracking
---

Worklog Id: (was: 754471)
Time Spent: 1.5h  (was: 1h 20m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> 

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754446
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 07:41
Start Date: 08/Apr/22 07:41
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845830328


##
iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q:
##
@@ -0,0 +1,10 @@
+set hive.vectorized.execution.enabled=true;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

Review Comment:
   Tests passed. I've moved the txn handler validation logic into 
UpdateDeleteSemanticAnalyzer, where we already have the table object in hand.
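
For context, here is a rough sketch of what analysis-time validation of a 
DELETE target can look like once the table object is in hand. The Table 
interface and the isIcebergTable()/isAcidTable() helpers are hypothetical 
stand-ins rather than Hive's actual classes, and the checks shown (rejecting 
vectorized DELETE on Iceberg tables, as exercised by the negative qtest above, 
and requiring a transactional table otherwise) only illustrate the approach, 
not the code in this PR.

{code:java}
// Hypothetical sketch; not the PR's actual implementation.
final class DeleteTargetValidatorSketch {

  // Minimal stand-in for the table metadata available to the analyzer.
  interface Table {
    String getFullyQualifiedName();
    boolean isIcebergTable(); // hypothetical: table backed by an Iceberg handler
    boolean isAcidTable();    // hypothetical: full-ACID transactional table
  }

  static class SemanticException extends Exception {
    SemanticException(String msg) { super(msg); }
  }

  static void validateDeleteTarget(Table table, boolean vectorizedExecution)
      throws SemanticException {
    if (table.isIcebergTable()) {
      // Iceberg tables carry their own row-level delete support, so they are
      // not required to be full-ACID; vectorized execution is rejected here,
      // mirroring the negative qtest quoted above.
      if (vectorizedExecution) {
        throw new SemanticException("Vectorized execution is not supported for "
            + "DELETE on Iceberg table " + table.getFullyQualifiedName());
      }
      return;
    }
    if (!table.isAcidTable()) {
      throw new SemanticException("DELETE requires a transactional table: "
          + table.getFullyQualifiedName());
    }
  }
}
{code}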





Issue Time Tracking
---

Worklog Id: (was: 754446)
Time Spent: 13h  (was: 12h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)