[jira] [Work logged] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted
[ https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=754884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754884 ] ASF GitHub Bot logged work on HIVE-26127: - Author: ASF GitHub Bot Created on: 09/Apr/22 03:31 Start Date: 09/Apr/22 03:31 Worklog Time Spent: 10m Work Description: hsnusonic opened a new pull request, #3198: URL: https://github.com/apache/hive/pull/3198 …tition is deleted ### What changes were proposed in this pull request? Catch FileNotFoundException when a directory is cleaned up for insert overwrite. ### Why are the changes needed? For external tables, any partition could be deleted out of Hive's control. Insert overwrite should not fail because a partition directory is removed. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert_overwrite.q Issue Time Tracking --- Worklog Id: (was: 754884) Remaining Estimate: 0h Time Spent: 10m > Insert overwrite throws FileNotFound when destination partition is deleted > --- > > Key: HIVE-26127 > URL: https://issues.apache.org/jira/browse/HIVE-26127 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Steps to reproduce: > # create external table src (col int) partitioned by (year int); > # create external table dest (col int) partitioned by (year int); > # insert into src partition (year=2022) values (1); > # insert into dest partition (year=2022) values (2); > # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022 > # insert overwrite table dest select * from src; > We will get FileNotFoundException as below. > {code:java} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory > file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1 > could not be cleaned up. 
> at > org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387) > at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657) > at > org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > This happens because listStatus is called on a path that no longer exists. We should not fail > insert overwrite when there is nothing to clean up. > {code:java} > fs.listStatus(path, pathFilter){code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)
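The proposed fix treats a missing destination directory as "nothing to clean up" rather than an error. A minimal sketch of that pattern using java.nio (the actual patch works against Hadoop's FileSystem API; the class and method names here are illustrative only):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {
    // List files to be cleaned up; a directory that has vanished simply
    // yields an empty list instead of propagating NoSuchFileException
    // (the java.nio analogue of Hadoop's FileNotFoundException from
    // fs.listStatus on a deleted partition directory).
    public static List<Path> listForCleanup(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                result.add(p);
            }
        } catch (NoSuchFileException e) {
            // Partition directory was removed out of band: nothing to clean up.
        }
        return result;
    }
}
```

With this guard in place, the insert-overwrite cleanup step becomes a no-op for externally deleted partitions instead of aborting the whole load.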
[jira] [Updated] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted
[ https://issues.apache.org/jira/browse/HIVE-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26127: -- Labels: pull-request-available (was: )
[jira] [Updated] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted
[ https://issues.apache.org/jira/browse/HIVE-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu-Wen Lai updated HIVE-26127: -- Description: Steps to reproduce: # create external table src (col int) partitioned by (year int); # create external table dest (col int) partitioned by (year int); # insert into src partition (year=2022) values (1); # insert into dest partition (year=2022) values (2); # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022 # insert overwrite table dest select * from src; We will get FileNotFoundException as below. {code:java} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1 could not be cleaned up. at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387) at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282) at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657) at org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} This happens because listStatus is called on a path that no longer exists. We should not fail insert overwrite when there is nothing to clean up. {code:java} fs.listStatus(path, pathFilter){code} was: Steps to reproduce: # create external table src (col int) partitioned by (year int); # create external table dest (col int) partitioned by (year int); # insert into src partition (year=2022) values (1); # insert into dest partition (year=2022) values (2); # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022 # insert overwrite table dest select * from src; We will get FileNotFoundException when it tries to call {code:java} fs.listStatus(path, pathFilter){code} We should not fail insert overwrite when there is nothing to clean up.
[jira] [Assigned] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted
[ https://issues.apache.org/jira/browse/HIVE-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu-Wen Lai reassigned HIVE-26127: -
[jira] [Work logged] (HIVE-26096) Select on single column MultiDelimitSerDe table throws AIOBE
[ https://issues.apache.org/jira/browse/HIVE-26096?focusedWorklogId=754863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754863 ] ASF GitHub Bot logged work on HIVE-26096: - Author: ASF GitHub Bot Created on: 09/Apr/22 00:22 Start Date: 09/Apr/22 00:22 Worklog Time Spent: 10m Work Description: ramesh0201 merged PR #3158: URL: https://github.com/apache/hive/pull/3158 Issue Time Tracking --- Worklog Id: (was: 754863) Time Spent: 0.5h (was: 20m) > Select on single column MultiDelimitSerDe table throws AIOBE > > > Key: HIVE-26096 > URL: https://issues.apache.org/jira/browse/HIVE-26096 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Repro details > > {code:java} > create table test_multidelim(col string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe' > with serdeproperties('field.delim'='!^') STORED AS TEXTFILE; > insert into test_multidelim values('aa'),('bb'),('cc'),('dd'); > select * from test_multidelim; > {code} > Exception: > {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.serde2.lazy.LazyStruct.parseMultiDelimit(LazyStruct.java:303) > at > org.apache.hadoop.hive.serde2.MultiDelimitSerDe.doDeserialize(MultiDelimitSerDe.java:160) > at > org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.deserialize(AbstractEncodingAwareSerDe.java:74) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:603){code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)
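The ArrayIndexOutOfBoundsException above comes from parsing a row against a multi-character field delimiter (`!^`) when the row contains fewer delimiters than the parser expects, as with a single-column table. A hedged sketch of the failure-safe split behavior (not Hive's actual LazyStruct parser; `MultiDelimSplit` is an illustrative name):

```java
import java.util.regex.Pattern;

public class MultiDelimSplit {
    // Split a row on a multi-character delimiter such as "!^".
    // Pattern.quote prevents regex metacharacters ('^' here) from being
    // interpreted, and the -1 limit keeps trailing empty fields. A row
    // with no delimiter at all ("aa") yields exactly one field instead
    // of the parser indexing past the end of its offsets array.
    public static String[] split(String row, String fieldDelim) {
        return row.split(Pattern.quote(fieldDelim), -1);
    }
}
```

For the repro above, `split("aa", "!^")` returns a single field, which is the behavior the fix restores for single-column MultiDelimitSerDe tables.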
[jira] [Work logged] (HIVE-25840) Prevent duplicate paths in the fileList while adding an entry to NotifcationLog
[ https://issues.apache.org/jira/browse/HIVE-25840?focusedWorklogId=754860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754860 ] ASF GitHub Bot logged work on HIVE-25840: - Author: ASF GitHub Bot Created on: 09/Apr/22 00:18 Start Date: 09/Apr/22 00:18 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2913: HIVE-25840: Prevent duplicate paths in the fileList while adding an e… URL: https://github.com/apache/hive/pull/2913 Issue Time Tracking --- Worklog Id: (was: 754860) Time Spent: 0.5h (was: 20m) > Prevent duplicate paths in the fileList while adding an entry to > NotifcationLog > --- > > Key: HIVE-25840 > URL: https://issues.apache.org/jira/browse/HIVE-25840 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > As of now, while adding entries to notification logs, in case of retries, > sometimes the same path gets added to the notification log entry, which > during replication leads to failures during copy. -- This message was sent by Atlassian Jira (v8.20.1#820001)
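The direction of the fix is to make appending to the file list idempotent, so a retried write cannot record the same path twice. A minimal sketch using an insertion-ordered set (illustrative only; the real fileList handling in Hive's replication code differs):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class FileListSketch {
    // LinkedHashSet preserves the order paths were first added while
    // rejecting duplicates, so retries are harmless.
    private final LinkedHashSet<String> paths = new LinkedHashSet<>();

    // Returns true only when the path was newly recorded; a retried add
    // of the same path is a no-op.
    public boolean add(String path) {
        return paths.add(path);
    }

    public List<String> snapshot() {
        return new ArrayList<>(paths);
    }
}
```

With this shape, the notification-log entry written from `snapshot()` can never carry the duplicate paths that were breaking the copy phase during replication.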
[jira] [Commented] (HIVE-25827) Parquet file footer is read multiple times, when multiple splits are created in same file
[ https://issues.apache.org/jira/browse/HIVE-25827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519777#comment-17519777 ] Steve Loughran commented on HIVE-25827: --- Is this per input stream, or are separate streams opened to read it? If it's the same opened file, HADOOP-18028 will mitigate this on S3. > Parquet file footer is read multiple times, when multiple splits are created > in same file > - > > Key: HIVE-25827 > URL: https://issues.apache.org/jira/browse/HIVE-25827 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > Attachments: image-2021-12-21-03-19-38-577.png > > > With large files, it is possible that multiple splits are created in the same > file. With the current codebase, "ParquetRecordReaderBase" ends up reading the file > footer for each split. > It can be optimized not to read the footer information multiple times for the > same file. > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java#L160] > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L91] > > > !image-2021-12-21-03-19-38-577.png|width=1363,height=1256! > -- This message was sent by Atlassian Jira (v8.20.1#820001)
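One way to implement the proposed optimization is a per-file cache keyed by path, so every split of the same file reuses one footer read. A hedged sketch of that pattern (hypothetical class and method names; this is not what VectorizedParquetRecordReader currently does):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class FooterCache {
    // Footer bytes keyed by file path; splits of the same file share one entry.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Counts how many times the underlying loader actually ran.
    public final AtomicInteger reads = new AtomicInteger();

    // computeIfAbsent guarantees the loader runs at most once per key,
    // even when several splits of the same file arrive concurrently.
    public byte[] footerFor(String file, Function<String, byte[]> loader) {
        return cache.computeIfAbsent(file, f -> {
            reads.incrementAndGet();
            return loader.apply(f);
        });
    }
}
```

A real implementation would also need an eviction policy and invalidation when the file changes; the sketch only shows the sharing mechanism.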
[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP
[ https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754689 ] ASF GitHub Bot logged work on HIVE-21456: - Author: ASF GitHub Bot Created on: 08/Apr/22 16:12 Start Date: 08/Apr/22 16:12 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #3105: URL: https://github.com/apache/hive/pull/3105#discussion_r846286761 ## standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestRemoteHiveHttpMetaStore.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.metastore; + +import org.apache.hadoop.hive.metastore.annotation.MetastoreUnitTest; +import org.junit.experimental.categories.Category; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars; + +@Category(MetastoreCheckinTest.class) +public class TestRemoteHiveHttpMetaStore extends TestRemoteHiveMetaStore { + + private static final Logger LOG = LoggerFactory.getLogger(TestRemoteHiveHttpMetaStore.class); + + @Override + public void start() throws Exception { +MetastoreConf.setVar(conf, ConfVars.THRIFT_TRANSPORT_MODE, "http"); +LOG.info("Attempting to start test remote metastore in http mode"); +super.start(); +LOG.info("Successfully started test remote metastore in http mode"); + } + + @Override + protected HiveMetaStoreClient createClient() throws Exception { +MetastoreConf.setVar(conf, ConfVars.METASTORE_CLIENT_THRIFT_TRANSPORT_MODE, "http"); +return super.createClient(); + } +} Review Comment: Nit: Add a new line at the end of the file. Issue Time Tracking --- Worklog Id: (was: 754689) Time Spent: 3h 50m (was: 3h 40m) > Hive Metastore Thrift over HTTP > --- > > Key: HIVE-21456 > URL: https://issues.apache.org/jira/browse/HIVE-21456 > Project: Hive > Issue Type: New Feature > Components: Metastore, Standalone Metastore >Reporter: Amit Khanna >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, > HIVE-21456.4.patch, HIVE-21456.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Hive Metastore currently doesn't have support for HTTP transport because of > which it is not possible to access it via Knox. 
Adding support for Thrift > over HTTP transport will allow clients to access it via Knox. -- This message was sent by Atlassian Jira (v8.20.1#820001)
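The servlet under review derives the caller from a trusted HTTP header, rejects requests without it, and then builds either a proxy user (Kerberos on) or a plain remote user (Kerberos off). That branch can be sketched in isolation (illustrative names; the real HmsThriftHttpServlet calls UserGroupInformation.createProxyUser / createRemoteUser and runs doPost as that user):

```java
public class UgiChoiceSketch {
    public enum Mode { PROXY_USER, REMOTE_USER }

    // Mirrors the decision in the servlet's doPost: a missing or empty
    // user header is rejected (the servlet answers 403); otherwise the
    // security flag decides how the caller identity is constructed.
    public static Mode choose(boolean securityEnabled, String userFromHeader) {
        if (userFromHeader == null || userFromHeader.isEmpty()) {
            throw new IllegalArgumentException("User header missing");
        }
        return securityEnabled ? Mode.PROXY_USER : Mode.REMOTE_USER;
    }
}
```

As the review comments note, each HTTP request is independent (no session), so this identity setup has to happen per request in the connection layer.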
[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP
[ https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754691 ] ASF GitHub Bot logged work on HIVE-21456: - Author: ASF GitHub Bot Created on: 08/Apr/22 16:12 Start Date: 08/Apr/22 16:12 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #3105: URL: https://github.com/apache/hive/pull/3105#discussion_r846287153 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HmsThriftHttpServlet.java: ## @@ -0,0 +1,116 @@ +/* * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.metastore; + +import java.io.IOException; +import java.security.PrivilegedExceptionAction; +import java.util.Enumeration; + +import javax.servlet.ServletException; +import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; +import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.thrift.TProcessor; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.server.TServlet; + +public class HmsThriftHttpServlet extends TServlet { + + private static final Logger LOG = LoggerFactory + .getLogger(HmsThriftHttpServlet.class); + + private static final String X_USER = MetaStoreUtils.USER_NAME_HTTP_HEADER; + + private final boolean isSecurityEnabled; + + public HmsThriftHttpServlet(TProcessor processor, + TProtocolFactory inProtocolFactory, TProtocolFactory outProtocolFactory) { +super(processor, inProtocolFactory, outProtocolFactory); +// This should ideally be reveiving an instance of the Configuration which is used for the check +isSecurityEnabled = UserGroupInformation.isSecurityEnabled(); + } + + public HmsThriftHttpServlet(TProcessor processor, + TProtocolFactory protocolFactory) { +super(processor, protocolFactory); +isSecurityEnabled = UserGroupInformation.isSecurityEnabled(); + } + + @Override + protected void doPost(HttpServletRequest request, + HttpServletResponse response) throws ServletException, IOException { + +Enumeration headerNames = request.getHeaderNames(); +if (LOG.isDebugEnabled()) { + LOG.debug("Logging headers in request"); + while (headerNames.hasMoreElements()) { +String headerName = headerNames.nextElement(); +LOG.debug("Header: [{}], Value: [{}]", headerName, +request.getHeader(headerName)); + } +} +String userFromHeader = request.getHeader(X_USER); +if (userFromHeader == null || userFromHeader.isEmpty()) { + 
LOG.error("No user header: {} found", X_USER); + response.sendError(HttpServletResponse.SC_FORBIDDEN, + "User Header missing"); + return; +} + +// TODO: These should ideally be in some kind of a Cache with Weak referencse. +// If HMS were to set up some kind of a session, this would go into the session by having +// this filter work with a custom Processor / or set the username into the session +// as is done for HS2. +// In case of HMS, it looks like each request is independent, and there is no session +// information, so the UGI needs to be set up in the Connection layer itself. +UserGroupInformation clientUgi; +// Temporary, and useless for now. Here only to allow this to work on an otherwise kerberized +// server. +if (isSecurityEnabled) { + LOG.info("Creating proxy user for: {}", userFromHeader); + clientUgi = UserGroupInformation.createProxyUser(userFromHeader, UserGroupInformation.getLoginUser()); +} else { + LOG.info("Creating remote user for: {}", userFromHeader); + clientUgi = UserGroupInformation.createRemoteUser(userFromHeader); +} + + +PrivilegedExceptionAction action = new PrivilegedExceptionAction() { + @Override + public Void run() throws Exception { +HmsThriftHttpServlet.super.doPost(request, response); +
[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP
[ https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754688=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754688 ] ASF GitHub Bot logged work on HIVE-21456: - Author: ASF GitHub Bot Created on: 08/Apr/22 16:11 Start Date: 08/Apr/22 16:11 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #3105: URL: https://github.com/apache/hive/pull/3105#discussion_r846286006 ## standalone-metastore/pom.xml: ## @@ -361,6 +362,12 @@ runtime true + Hive Metastore Thrift over HTTP > --- > > Key: HIVE-21456 > URL: https://issues.apache.org/jira/browse/HIVE-21456 > Project: Hive > Issue Type: New Feature > Components: Metastore, Standalone Metastore >Reporter: Amit Khanna >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, > HIVE-21456.4.patch, HIVE-21456.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Hive Metastore currently doesn't have support for HTTP transport because of > which it is not possible to access it via Knox. Adding support for Thrift > over HTTP transport will allow the clients to access via Knox -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP
[ https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754681 ] ASF GitHub Bot logged work on HIVE-21456: - Author: ASF GitHub Bot Created on: 08/Apr/22 16:05 Start Date: 08/Apr/22 16:05 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #3105: URL: https://github.com/apache/hive/pull/3105#discussion_r846281556 ## itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestSSL.java: ## @@ -437,15 +439,36 @@ public void testConnectionWrongCertCN() throws Exception { * Test HMS server with SSL * @throws Exception */ + @Ignore @Test public void testMetastoreWithSSL() throws Exception { testSSLHMS(true); } + /** + * Test HMS server with Http + SSL + * @throws Exception + */ + @Test + public void testMetastoreWithHttps() throws Exception { +// MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.EVENT_DB_NOTIFICATION_API_AUTH, false); +//MetastoreConf.setVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_TRANSPORT_MODE, "http"); +SSLTestUtils.setMetastoreHttpsConf(conf); +MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_TRUSTMANAGERFACTORY_ALGORITHM, +KEY_MANAGER_FACTORY_ALGORITHM); +MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_TRUSTSTORE_TYPE, KEY_STORE_TRUST_STORE_TYPE); +MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_KEYSTORE_TYPE, KEY_STORE_TRUST_STORE_TYPE); +MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_KEYMANAGERFACTORY_ALGORITHM, +KEY_MANAGER_FACTORY_ALGORITHM); + +testSSLHMS(false); Review Comment: Why are we passing false here? This value is used in testSSLHMS()#L459-461 to set the keystore for HMS and HS2. You are already setting this for HMS in L461 here and we don't need to set for HS2. So why don't we just pass the value true? 
Issue Time Tracking --- Worklog Id: (was: 754681) Time Spent: 3.5h (was: 3h 20m) > Hive Metastore Thrift over HTTP > --- > > Key: HIVE-21456 > URL: https://issues.apache.org/jira/browse/HIVE-21456 > Project: Hive > Issue Type: New Feature > Components: Metastore, Standalone Metastore >Reporter: Amit Khanna >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, > HIVE-21456.4.patch, HIVE-21456.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > Hive Metastore currently doesn't have support for HTTP transport because of > which it is not possible to access it via Knox. Adding support for Thrift > over HTTP transport will allow the clients to access via Knox -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754672 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 15:45 Start Date: 08/Apr/22 15:45 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846262788 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Caffeine; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.TimeUnit; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required( + MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get()); + private static final Map DELETE_SERDE_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1); +DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final long RECORD_CACHE_EXPIRE_MILLI = 10 * 60 * 1000; + private static final long RECORD_CACHE_MAX_SIZE = 1000; + private static final Cache RECORD_CACHE = Caffeine.newBuilder() + 
.expireAfterAccess(RECORD_CACHE_EXPIRE_MILLI, TimeUnit.MILLISECONDS) + .maximumSize(RECORD_CACHE_MAX_SIZE) + .build(); + + /** + * @param dataCols The columns of the original file read schema + * @param table The table object - it is used for populating the partition struct meta column + * @return The schema for reading files, extended with metadata columns needed for deletes + */ + public static Schema createFileReadSchemaForDelete(List dataCols, Table table) { +List cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size()); +DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> { + if (metaCol == PARTITION_STRUCT_META_COL) { +cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME)); + } else { +cols.add(metaCol); + } +}); +cols.addAll(dataCols); +return new Schema(cols); + } + + /** + * @param dataCols The columns of the serde projection schema + * @return The schema for SerDe operations, extended with metadata columns needed for deletes + */ + public static Schema createSerdeSchemaForDelete(List dataCols) { +List cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size()); +DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol)); +cols.addAll(dataCols); +return new Schema(cols); + } + + public static PositionDelete
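The utility above builds its delete-read schemas by prepending a fixed, ordered set of metadata columns (spec id, partition, file path, row position) to the data columns. The pattern, stripped of the Iceberg types (illustrative names and plain strings in place of Types.NestedField):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeleteSchemaSketch {
    // Fixed metadata columns, in the order the delete reader expects them.
    static final List<String> META_COLS =
        Arrays.asList("spec_id", "partition", "file_path", "row_position");

    // Delete-read schema = metadata columns first, then the data columns,
    // mirroring the shape of createFileReadSchemaForDelete above.
    public static List<String> deleteReadSchema(List<String> dataCols) {
        List<String> cols = new ArrayList<>(META_COLS.size() + dataCols.size());
        cols.addAll(META_COLS);
        cols.addAll(dataCols);
        return cols;
    }
}
```

Keeping the metadata columns in a fixed leading position is what lets the downstream position-delete writer find the file path and row position by index regardless of the table's own columns.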
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754669 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 15:33 Start Date: 08/Apr/22 15:33 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846252270 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Caffeine; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.TimeUnit; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map<Types.NestedField, Integer> DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required( + MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get()); + private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1); +DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final long RECORD_CACHE_EXPIRE_MILLI = 10 * 60 * 1000; + private static final long RECORD_CACHE_MAX_SIZE = 1000; + private static final Cache RECORD_CACHE = Caffeine.newBuilder() + 
.expireAfterAccess(RECORD_CACHE_EXPIRE_MILLI, TimeUnit.MILLISECONDS) + .maximumSize(RECORD_CACHE_MAX_SIZE) + .build(); + + /** + * @param dataCols The columns of the original file read schema + * @param table The table object - it is used for populating the partition struct meta column + * @return The schema for reading files, extended with metadata columns needed for deletes + */ + public static Schema createFileReadSchemaForDelete(List<Types.NestedField> dataCols, Table table) { +List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size()); +DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> { + if (metaCol == PARTITION_STRUCT_META_COL) { +cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME)); + } else { +cols.add(metaCol); + } +}); +cols.addAll(dataCols); +return new Schema(cols); + } + + /** + * @param dataCols The columns of the serde projection schema + * @return The schema for SerDe operations, extended with metadata columns needed for deletes + */ + public static Schema createSerdeSchemaForDelete(List<Types.NestedField> dataCols) { +List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size()); +DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol)); +cols.addAll(dataCols); +return new Schema(cols); + } + + public static PositionDelete
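The two schema helpers quoted above share one contract: the metadata columns come first, in the fixed order recorded by the insertion-ordered map, and the data columns follow. A minimal JDK-only sketch of that ordering contract (plain strings stand in for Iceberg's `Types.NestedField`; the column names mirror the meta columns in the diff but are otherwise illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the meta-column ordering used by createSerdeSchemaForDelete:
// an insertion-ordered map fixes each meta column's position, and the
// extended schema is simply "meta columns, then data columns".
class MetaColsSketch {
    // stand-ins for MetadataColumns.SPEC_ID etc.; order matters
    static final Map<String, Integer> DELETE_SERDE_META_COLS = new LinkedHashMap<>();
    static {
        DELETE_SERDE_META_COLS.put("spec_id", 0);
        DELETE_SERDE_META_COLS.put("partition_hash", 1);
        DELETE_SERDE_META_COLS.put("file_path", 2);
        DELETE_SERDE_META_COLS.put("row_position", 3);
    }

    // Mirrors createSerdeSchemaForDelete: prepend meta columns, append data columns.
    static List<String> createSerdeSchemaForDelete(List<String> dataCols) {
        List<String> cols = new ArrayList<>(dataCols.size() + DELETE_SERDE_META_COLS.size());
        cols.addAll(DELETE_SERDE_META_COLS.keySet()); // LinkedHashMap preserves insertion order
        cols.addAll(dataCols);
        return cols;
    }
}
```

Because the map is a `LinkedHashMap`, readers downstream can rely on the meta fields sitting at positions 0..3 of every extended row.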
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754665 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 15:30 Start Date: 08/Apr/22 15:30 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846249301 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
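The `RECORD_CACHE` in the quoted diff is bounded two ways by Caffeine: `maximumSize(1000)` and a ten-minute `expireAfterAccess`. The size bound alone can be sketched without Caffeine using a `LinkedHashMap` in access order; this is only an illustration of the eviction idea, not the PR's implementation (which should keep Caffeine, since it also handles time-based expiry and concurrency):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// JDK-only sketch of a size-bounded LRU cache, analogous to Caffeine's
// maximumSize(). Caffeine additionally provides expireAfterAccess and
// thread safety, which this toy version does not.
class BoundedRecordCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    BoundedRecordCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder=true -> iteration order is LRU
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least-recently-used entry once the bound is exceeded
        return size() > maxSize;
    }
}
```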
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754662 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 15:30 Start Date: 08/Apr/22 15:30 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r84624 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754627 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 14:07 Start Date: 08/Apr/22 14:07 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846148875 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map<Types.NestedField, Integer> DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required( + MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get()); + private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1); +DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + /** + * @param dataCols The columns of the original file read schema + * @param table The table object - it is used for populating the partition struct meta column + * @return The schema for reading files, extended with metadata columns needed for deletes + */ + public static Schema createFileReadSchemaForDelete(List<Types.NestedField> dataCols, Table 
table) { +List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size()); +DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> { + if (metaCol == PARTITION_STRUCT_META_COL) { +cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME)); + } else { +cols.add(metaCol); + } +}); +cols.addAll(dataCols); +return new Schema(cols); + } + + /** + * @param dataCols The columns of the serde projection schema + * @return The schema for SerDe operations, extended with metadata columns needed for deletes + */ + public static Schema createSerdeSchemaForDelete(List<Types.NestedField> dataCols) { +List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size()); +DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol)); +cols.addAll(dataCols); +return new Schema(cols); + } + + public static PositionDelete<Record> getPositionDelete(Schema schema, Record rec) { +PositionDelete<Record> positionDelete = PositionDelete.create(); +String filePath = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class); +long filePosition = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class); + +int dataOffset = DELETE_SERDE_META_COLS.size(); // position in the rec where the actual row data begins +Record rowData = GenericRecord.create(schema);
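`getPositionDelete` in the hunk above reads the file path and row position from fixed meta-column indices, then treats everything after the meta prefix as the original row data. A self-contained sketch of that split, with plain Java lists standing in for Iceberg `Record` objects (the indices mirror `DELETE_SERDE_META_COLS`; the `Positioned` holder is a hypothetical stand-in for `PositionDelete`):

```java
import java.util.List;

// Sketch of splitting a flat delete row into (file path, position, row data).
// The first META_COLS fields are metadata; the remainder is the original row.
class PositionDeleteSketch {
    static final int META_COLS = 4;        // spec_id, partition_hash, file_path, row_position
    static final int FILE_PATH_IDX = 2;    // index of the file path meta column
    static final int ROW_POSITION_IDX = 3; // index of the row position meta column

    static class Positioned {
        final String filePath;
        final long position;
        final List<Object> rowData;
        Positioned(String filePath, long position, List<Object> rowData) {
            this.filePath = filePath;
            this.position = position;
            this.rowData = rowData;
        }
    }

    static Positioned split(List<Object> rec) {
        String filePath = (String) rec.get(FILE_PATH_IDX);
        long position = (Long) rec.get(ROW_POSITION_IDX);
        // everything past the meta prefix is the row data ("dataOffset" in the PR)
        return new Positioned(filePath, position, rec.subList(META_COLS, rec.size()));
    }
}
```

The key invariant is that the writer and reader agree on the same meta-column order, which is why both sides share the one `DELETE_SERDE_META_COLS` map in the PR.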
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754621 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 13:55 Start Date: 08/Apr/22 13:55 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846137472 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754608 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 13:47 Start Date: 08/Apr/22 13:47 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846128892 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergWriter.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.Map; +import org.apache.hadoop.hive.ql.exec.FileSinkOperator; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.PartitionKey; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.InternalRecordWrapper; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.PartitioningWriter; +import org.apache.iceberg.mr.mapred.Container; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.util.Tasks; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@SuppressWarnings("checkstyle:VisibilityModifier") +public abstract class HiveIcebergWriter implements FileSinkOperator.RecordWriter, +org.apache.hadoop.mapred.RecordWriter<NullWritable, Container<Record>> { + private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergWriter.class); Review Comment: I've removed it from RecordWriter and the DeleteWriter Issue Time Tracking --- Worklog Id: (was: 754608) Time Spent: 14h 20m (was: 14h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 14h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
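The `HiveIcebergWriter` quoted above implements both Hive's `FileSinkOperator.RecordWriter` and mapred's `RecordWriter`, so a single writer instance can be handed to either API. A toy sketch of that dual-interface pattern; the interfaces here are deliberately simplified stand-ins, not the real Hadoop/Hive signatures:

```java
// Sketch of one class serving two writer APIs: the two-argument (mapred-style)
// entry point delegates to the single-argument (Hive-style) one, so both call
// sites funnel into the same write path.
class DualWriterSketch {
    interface HiveRecordWriter { void write(Object row); }                 // stand-in
    interface MapredRecordWriter<K, V> { void write(K key, V value); }     // stand-in

    static class BaseWriter implements HiveRecordWriter, MapredRecordWriter<Void, Object> {
        int rows = 0;

        @Override
        public void write(Void key, Object value) {
            write(value); // mapred entry point delegates to the Hive one
        }

        @Override
        public void write(Object row) {
            rows++; // the real writer would route the record to a file writer here
        }
    }
}
```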
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754607 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 13:45 Start Date: 08/Apr/22 13:45 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846127594 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754606 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 13:43 Start Date: 08/Apr/22 13:43 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846125881 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergWriter.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.Map; +import org.apache.hadoop.hive.ql.exec.FileSinkOperator; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.PartitionKey; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.InternalRecordWrapper; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.PartitioningWriter; +import org.apache.iceberg.mr.mapred.Container; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.util.Tasks; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@SuppressWarnings("checkstyle:VisibilityModifier") +public abstract class HiveIcebergWriter implements FileSinkOperator.RecordWriter, +org.apache.hadoop.mapred.RecordWriter<NullWritable, Container<Record>> { + private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergWriter.class); Review Comment: Not used anymore Issue Time Tracking --- Worklog Id: (was: 754606) Time Spent: 14h (was: 13h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 14h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754599 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 13:39 Start Date: 08/Apr/22 13:39 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846121847 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required( + MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get()); + private static final Map DELETE_SERDE_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1); +DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + /** + * @param dataCols The columns of the original file read schema + * @param table The table object - it is used for populating the partition struct meta column + * @return The schema for reading files, extended with metadata columns needed for deletes + */ + public static Schema createFileReadSchemaForDelete(List dataCols, Table 
table) { +List cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size()); +DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> { + if (metaCol == PARTITION_STRUCT_META_COL) { +cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME)); + } else { +cols.add(metaCol); + } +}); +cols.addAll(dataCols); +return new Schema(cols); + } + + /** + * @param dataCols The columns of the serde projection schema + * @return The schema for SerDe operations, extended with metadata columns needed for deletes + */ + public static Schema createSerdeSchemaForDelete(List dataCols) { +List cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size()); +DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol)); +cols.addAll(dataCols); +return new Schema(cols); + } + + public static PositionDelete getPositionDelete(Schema schema, Record rec) { +PositionDelete positionDelete = PositionDelete.create(); +String filePath = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class); +long filePosition = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class); + +int dataOffset = DELETE_SERDE_META_COLS.size(); // position in the rec where the actual row data begins +Record rowData = GenericRecord.create(schema); Review
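The review thread above walks through `IcebergAcidUtil`, which prepends delete-related metadata columns (spec id, partition, file path, row position) to the data schema and then reads the row payload starting at a fixed offset. A simplified, standalone sketch of that prepend-and-index pattern — plain Java with made-up column names, not the actual Iceberg `Types.NestedField`/`MetadataColumns` API:

```java
import java.util.ArrayList;
import java.util.List;

public class MetaColumnSchemaSketch {
    // Hypothetical stand-ins for the metadata columns; the real code uses
    // Types.NestedField constants from org.apache.iceberg.MetadataColumns.
    static final List<String> META_COLS =
        List.of("spec_id", "partition_hash", "file_path", "row_position");

    // Mirrors createSerdeSchemaForDelete: metadata columns first, data columns after.
    static List<String> schemaForDelete(List<String> dataCols) {
        List<String> cols = new ArrayList<>(META_COLS.size() + dataCols.size());
        cols.addAll(META_COLS);
        cols.addAll(dataCols);
        return cols;
    }

    // Mirrors the offset logic in getPositionDelete: because the metadata
    // columns sit at fixed leading positions, the actual row data begins
    // at META_COLS.size().
    static int dataOffset() {
        return META_COLS.size();
    }

    public static void main(String[] args) {
        List<String> schema = schemaForDelete(List.of("customer_id", "first_name"));
        System.out.println(schema);
        // [spec_id, partition_hash, file_path, row_position, customer_id, first_name]
        System.out.println(dataOffset()); // 4
    }
}
```

The fixed-offset layout is what lets `getPositionDelete` pull `FILE_PATH` and `ROW_POSITION` out of the record by index before copying the remaining fields into the row payload.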
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754571 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 13:00 Start Date: 08/Apr/22 13:00 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846087296 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java: ## @@ -50,10 +50,14 @@ RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo), /** - * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} + * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} */ ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, RecordIdentifier.StructInfo.oi), ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo), + PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo), + PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo), + FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo), + ROW_POSITION("ROW__POSITION", TypeInfoFactory.longTypeInfo), Review Comment: The row number is obtained in the `OrcRecordUpdater`: ``` ByteBuffer val = reader.getMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME) ``` then the stripe metadata is deserialized and this contains the row number Issue Time Tracking --- Worklog Id: (was: 754571) Time Spent: 13h 40m (was: 13.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 13h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754565 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 12:48 Start Date: 08/Apr/22 12:48 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846077086 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java: ## @@ -50,10 +50,14 @@ RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo), /** - * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} + * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} */ ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, RecordIdentifier.StructInfo.oi), ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo), + PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo), + PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo), + FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo), Review Comment: Unfortunately no, the `INPUT__FILE__NAME` seems to be used differently and `ctx.getCurrentInputPath()` (where its value comes from) only contains the root location, not the full file path: ``` file:/Users/martonbod/Repos/Hive/Fork/hive/iceberg/iceberg-handler/target/tmp/hive5438133120236603811/external/customers ``` Issue Time Tracking --- Worklog Id: (was: 754565) Time Spent: 13.5h (was: 13h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 13.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754558 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 12:34 Start Date: 08/Apr/22 12:34 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r846065462 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputFormat.java: ## @@ -83,9 +83,20 @@ private static HiveIcebergRecordWriter writer(JobConf jc) { .operationId(operationId) .build(); String tableName = jc.get(Catalogs.NAME); -HiveFileWriterFactory hfwf = new HiveFileWriterFactory(table, fileFormat, schema, -null, fileFormat, null, null, null, null); -return new HiveIcebergRecordWriter(schema, spec, fileFormat, -hfwf, outputFileFactory, io, targetFileSize, taskAttemptID, tableName); +HiveFileWriterFactory writerFactory = new HiveFileWriterFactory(table, fileFormat, schema, null, fileFormat, +null, null, null, getPositionDeleteRowSchema(schema, fileFormat)); +if (HiveIcebergStorageHandler.isDelete(jc, tableName)) { + return new HiveIcebergDeleteWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize, + taskAttemptID, tableName); +} else { + return new HiveIcebergRecordWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize, + taskAttemptID, tableName); +} + } + + private static Schema getPositionDeleteRowSchema(Schema schema, FileFormat fileFormat) { +// TODO: remove this Avro-specific logic once we have Avro writer function ready Review Comment: Actually we no longer need an exception for Avro, we can use the `DataWriter` class from the iceberg-avro module Issue Time Tracking --- Worklog Id: (was: 754558) Time Spent: 13h 20m (was: 13h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: 
https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 13h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25492) Major query-based compaction is skipped if partition is empty
[ https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=754540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754540 ] ASF GitHub Bot logged work on HIVE-25492: - Author: ASF GitHub Bot Created on: 08/Apr/22 11:43 Start Date: 08/Apr/22 11:43 Worklog Time Spent: 10m Work Description: deniskuzZ commented on PR #3157: URL: https://github.com/apache/hive/pull/3157#issuecomment-1092774567 pending tests Issue Time Tracking --- Worklog Id: (was: 754540) Time Spent: 2h 10m (was: 2h) > Major query-based compaction is skipped if partition is empty > - > > Key: HIVE-25492 > URL: https://issues.apache.org/jira/browse/HIVE-25492 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently if the result of query-based compaction is an empty base, delta, or > delete delta, the empty directory is deleted. > This is because of minor compaction – if there are only deltas to compact, > then no compacted delete delta should be created (only a compacted delta). In > the same way, if there are only delete deltas to compact, then no compacted > delta should be created (only a compacted delete delta). > There is an issue with major compaction. If all the data in the partition has > been deleted, then we should get an empty base directory after compaction. > Instead, the empty base directory is deleted because it's empty and > compaction claims to succeed but we end up with the same deltas/delete deltas > we started with – basically compaction does not run. > Where to start? MajorQueryCompactor#commitCompaction -- This message was sent by Atlassian Jira (v8.20.1#820001)
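The HIVE-25492 description spells out the asymmetry: after minor compaction an empty delta or delete-delta directory carries no information and should be dropped, but after major compaction an empty base is meaningful because it records that every row was deleted and supersedes the old deltas. A hedged sketch of that decision point (hypothetical names, not the actual `MajorQueryCompactor#commitCompaction` code):

```java
enum CompactionType { MAJOR, MINOR }

public class CompactionCommitSketch {
    // Decide whether an empty compaction result directory should survive
    // the commit. An empty MAJOR base replaces the existing deltas; an
    // empty MINOR delta/delete-delta is just noise and can be removed.
    static boolean keepEmptyResultDir(CompactionType type) {
        return type == CompactionType.MAJOR;
    }

    public static void main(String[] args) {
        System.out.println(keepEmptyResultDir(CompactionType.MAJOR)); // true
        System.out.println(keepEmptyResultDir(CompactionType.MINOR)); // false
    }
}
```

The reported bug is equivalent to this predicate returning `false` unconditionally, so a fully-deleted partition kept its original deltas and compaction appeared to succeed without doing anything.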
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754531 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 08/Apr/22 11:28 Start Date: 08/Apr/22 11:28 Worklog Time Spent: 10m Work Description: zhengchenyu commented on PR #2490: URL: https://github.com/apache/hive/pull/2490#issuecomment-1092763873 @zabetak Sorry for miss this issue long time, Can you help me reopen this PR? Issue Time Tracking --- Worklog Id: (was: 754531) Time Spent: 2h 20m (was: 2h 10m) > Unreasonable setting reduce number, when join big size table(but small row > count) and small size table > -- > > Key: HIVE-25335 > URL: https://issues.apache.org/jira/browse/HIVE-25335 > Project: Hive > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25335.001.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I found an application which is slow in our cluster, because the proccess > bytes of one reduce is very huge, but only two reduce. > when I debug, I found the reason. Because in this sql, one big size table > (about 30G) with few row count(about 3.5M), another small size table (about > 100M) have more row count (about 3.6M). So JoinStatsRule.process only use > 100M to estimate reducer's number. But we need to process 30G byte in fact. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754525 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 08/Apr/22 11:22 Start Date: 08/Apr/22 11:22 Worklog Time Spent: 10m Work Description: zhengchenyu commented on PR #2490: URL: https://github.com/apache/hive/pull/2490#issuecomment-1092759199 Sorry for miss it, reopen this PR. Issue Time Tracking --- Worklog Id: (was: 754525) Time Spent: 2h 10m (was: 2h) > Unreasonable setting reduce number, when join big size table(but small row > count) and small size table > -- > > Key: HIVE-25335 > URL: https://issues.apache.org/jira/browse/HIVE-25335 > Project: Hive > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25335.001.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > I found an application which is slow in our cluster, because the proccess > bytes of one reduce is very huge, but only two reduce. > when I debug, I found the reason. Because in this sql, one big size table > (about 30G) with few row count(about 3.5M), another small size table (about > 100M) have more row count (about 3.6M). So JoinStatsRule.process only use > 100M to estimate reducer's number. But we need to process 30G byte in fact. -- This message was sent by Atlassian Jira (v8.20.1#820001)
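The HIVE-25335 report boils down to the reducer count being derived from the smaller side's byte size alone, even when the other join input is orders of magnitude larger. A minimal sketch of the safer estimate (a hypothetical helper, not Hive's actual `JoinStatsRule` logic): size the reduce stage from the dominant input before dividing by the bytes-per-reducer budget.

```java
public class ReducerEstimateSketch {
    // bytesPerReducer plays the role of hive.exec.reducers.bytes.per.reducer.
    // Uses the larger of the two join inputs, with a ceiling division and a
    // floor of one reducer.
    static int estimateReducers(long leftBytes, long rightBytes, long bytesPerReducer) {
        long dominant = Math.max(leftBytes, rightBytes);
        return (int) Math.max(1, (dominant + bytesPerReducer - 1) / bytesPerReducer);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // The scenario from the issue: a ~30 GB table with few rows joined to
        // a ~100 MB table. Sizing from the small side alone would give a
        // single reducer; sizing from the dominant side gives 120 at the
        // default-ish 256 MB per reducer.
        System.out.println(estimateReducers(30 * gb, 100 * 1024 * 1024, 256 * 1024 * 1024)); // 120
    }
}
```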
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754511 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 08/Apr/22 10:48 Start Date: 08/Apr/22 10:48 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845988600 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java: ## @@ -228,6 +230,104 @@ public void testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied() throws I Assert.assertArrayEquals(new Object[] {2L, "Trudy", "Pink"}, objects.get(3)); } + @Test + public void testDeleteStatementUnpartitioned() { +Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized); + +// create and insert an initial batch of records +testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +PartitionSpec.unpartitioned(), fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2); +// insert one more batch so that we have multiple data files within the same partition + shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, +TableIdentifier.of("default", "customers"), false)); + +shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'"); + +List objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name"); +Assert.assertEquals(6, objects.size()); +List expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.add(1L, "Sharon", "Taylor") +.add(2L, "Jake", "Donnel") +.add(2L, "Susan", "Morrison") +.add(2L, "Bob", "Silver") +.add(4L, "Laci", "Zold") +.add(5L, "Peti", "Rozsaszin") +.build(); +HiveIcebergTestUtils.validateData(expected, + 
HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0); + } + + @Test + public void testDeleteStatementPartitioned() { +Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized); +PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.identity("last_name").bucket("customer_id", 16).build(); + +// create and insert an initial batch of records +testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2); +// insert one more batch so that we have multiple data files within the same partition + shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, +TableIdentifier.of("default", "customers"), false)); + +shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'"); + +List objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name"); +Assert.assertEquals(6, objects.size()); +List expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.add(1L, "Sharon", "Taylor") +.add(2L, "Jake", "Donnel") +.add(2L, "Susan", "Morrison") +.add(2L, "Bob", "Silver") +.add(4L, "Laci", "Zold") +.add(5L, "Peti", "Rozsaszin") +.build(); +HiveIcebergTestUtils.validateData(expected, + HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0); + } + + @Test + public void testDeleteStatementWithOtherTable() { +Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized); +PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.identity("last_name").bucket("customer_id", 16).build(); + +// create a couple of tables, with an initial batch of records 
+testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2); +testTables.createTable(shell, "other", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, 2); + +shell.executeStatement("DELETE FROM customers WHERE customer_id in (select t1.customer_id from customers t1 join " + +"other t2 on t1.customer_id = t2.customer_id) or " + +"first_name in (select first_name from
[jira] [Updated] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests
[ https://issues.apache.org/jira/browse/HIVE-26126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26126: -- Labels: pull-request-available (was: ) > Allow capturing/validating SQL generated from HMS calls in qtests > - > > Key: HIVE-26126 > URL: https://issues.apache.org/jira/browse/HIVE-26126 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > During the compilation/execution of a Hive command there are usually calls in > the HiveMetastore (HMS). Most of the time these calls need to connect to the > underlying database backend in order to return the requested information so > they trigger the generation and execution of SQL queries. > We have a lot of code in Hive which affects the generation and execution of > these SQL queries and some vivid examples are the {{MetaStoreDirectSql}} and > {{CachedStore}} classes. > [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java] > is responsible for building explicitly SQL queries for performance reasons. > [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java] > is responsible for caching certain requests to avoid going to the database > on every call. > Ensuring that the generated SQL is the expected one and/or that certain > queries are hitting (or not) the DB is valuable for catching regressions or > evaluating the effectiveness of caches. 
> The idea is that for each Hive command/query in some qtest there is an option > to include in the output (.q.out) the list of SQL queries that were generated > by HMS calls. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests
[ https://issues.apache.org/jira/browse/HIVE-26126?focusedWorklogId=754499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754499 ] ASF GitHub Bot logged work on HIVE-26126: - Author: ASF GitHub Bot Created on: 08/Apr/22 10:19 Start Date: 08/Apr/22 10:19 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request, #3197: URL: https://github.com/apache/hive/pull/3197 ### What changes were proposed in this pull request? 1. Create logger configuration (tests only) for writing specific DataNucleus SQL queries to (new) operation log files. 2. Add hook copying the content of datanucleus log files to session's console for use in qtests. 3. Avoid creating appenders & files when hook is inactive. 4. Enable the hook on certain tests with partitions and update output. ### Why are the changes needed? For motivation see HIVE-26126 ### Does this PR introduce _any_ user-facing change? No, the change only affects test. ### How was this patch tested? See changes in .q and .q.out files Issue Time Tracking --- Worklog Id: (was: 754499) Remaining Estimate: 0h Time Spent: 10m > Allow capturing/validating SQL generated from HMS calls in qtests > - > > Key: HIVE-26126 > URL: https://issues.apache.org/jira/browse/HIVE-26126 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > During the compilation/execution of a Hive command there are usually calls in > the HiveMetastore (HMS). Most of the time these calls need to connect to the > underlying database backend in order to return the requested information so > they trigger the generation and execution of SQL queries. > We have a lot of code in Hive which affects the generation and execution of > these SQL queries and some vivid examples are the {{MetaStoreDirectSql}} and > {{CachedStore}} classes. 
> [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java] > is responsible for building explicitly SQL queries for performance reasons. > [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java] > is responsible for caching certain requests to avoid going to the database > on every call. > Ensuring that the generated SQL is the expected one and/or that certain > queries are hitting (or not) the DB is valuable for catching regressions or > evaluating the effectiveness of caches. > The idea is that for each Hive command/query in some qtest there is an option > to include in the output (.q.out) the list of SQL queries that were generated > by HMS calls. -- This message was sent by Atlassian Jira (v8.20.1#820001)
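The PR above mentions a test-only logger configuration that routes DataNucleus-generated SQL to dedicated files so a hook can copy it into the qtest output. An illustrative Log4j2 properties fragment of that shape — the appender and file names here are assumptions, not the configuration from the actual patch (DataNucleus does document `DataNucleus.Datastore.Native` as the category that logs the native SQL it issues):

```properties
# Illustrative only: capture DataNucleus-generated SQL in its own file so a
# test hook can append it to the .q.out. Appender/file names are made up.
logger.dnsql.name = DataNucleus.Datastore.Native
logger.dnsql.level = DEBUG
logger.dnsql.additivity = false
logger.dnsql.appenderRef.sql.ref = HmsSqlFile

appender.hmssql.type = File
appender.hmssql.name = HmsSqlFile
appender.hmssql.fileName = ${sys:test.tmp.dir}/hms-sql.log
appender.hmssql.layout.type = PatternLayout
appender.hmssql.layout.pattern = %m%n
```

Setting `additivity = false` keeps the SQL out of the main test log, which matches the PR's stated goal of only creating appenders and files when the hook is active.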
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=754498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754498 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 08/Apr/22 10:18 Start Date: 08/Apr/22 10:18 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r845966555 ## standalone-metastore/metastore-server/pom.xml: ## @@ -474,23 +474,6 @@ - -generate-version-annotation -generate-sources - - - - - - - - - - - - run - - Review Comment: Moved the `MetastoreVersionInfo` as suggested. We do not use it without the `metastore-common`, so it was not causing any issues, but it is more logical this way Issue Time Tracking --- Worklog Id: (was: 754498) Time Spent: 1h 50m (was: 1h 40m) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently we define > org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 > places: > - > ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > - > ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > This causes javadoc generation to fail with: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) > on project hive: An error has occurred in Javadoc report generation: > [ERROR] Exit code: 1 - > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8: > warning: a package-info.java file has already 
been seen for package > org.apache.hadoop.hive.metastore.annotation > [ERROR] package org.apache.hadoop.hive.metastore.annotation; > [ERROR] ^ > [ERROR] javadoc: warning - Multiple sources of package comments found for > package "org.apache.hive.streaming" > [ERROR] > /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556: > error: type MapSerializer does not take parameters > [ERROR] com.esotericsoftware.kryo.serializers.MapSerializer { > [ERROR] ^ > [ERROR] > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4: > error: package org.apache.hadoop.hive.metastore.annotation has already been > annotated > [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", > shortVersion="4.0.0-alpha-1", > [ERROR] ^ > [ERROR] java.lang.AssertionError > [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) > [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) > [ERROR] at > com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177) > [ERROR] at > com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) > [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) > [ERROR] at > com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) > [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) > [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) > [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) > [ERROR] at > com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) > [ERROR] at 
com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) > [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) > [ERROR] at
[jira] [Assigned] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests
[ https://issues.apache.org/jira/browse/HIVE-26126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-26126: -- > Allow capturing/validating SQL generated from HMS calls in qtests > - > > Key: HIVE-26126 > URL: https://issues.apache.org/jira/browse/HIVE-26126 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > During the compilation/execution of a Hive command there are usually calls in > the HiveMetastore (HMS). Most of the time these calls need to connect to the > underlying database backend in order to return the requested information so > they trigger the generation and execution of SQL queries. > We have a lot of code in Hive which affects the generation and execution of > these SQL queries and some vivid examples are the {{MetaStoreDirectSql}} and > {{CachedStore}} classes. > [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java] > is responsible for building explicitly SQL queries for performance reasons. > [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java] > is responsible for caching certain requests to avoid going to the database > on every call. > Ensuring that the generated SQL is the expected one and/or that certain > queries are hitting (or not) the DB is valuable for catching regressions or > evaluating the effectiveness of caches. > The idea is that for each Hive command/query in some qtest there is an option > to include in the output (.q.out) the list of SQL queries that were generated > by HMS calls. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar
[ https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=754487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754487 ]

ASF GitHub Bot logged work on HIVE-26074:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 09:32
            Start Date: 08/Apr/22 09:32
    Worklog Time Spent: 10m

Work Description: ayushtkn commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845927749

   ## ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:

   @@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner getBoundaryScanner(BoundaryDef start, B
          case "string":
            return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, nullsLast);
          default:
   +        if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

   Review Comment:
   can't do that. the entries aren't fixed char or varchar, they are like char(10) or char(5) or varchar(5) or varchar(6) like that. So putting char or varchar in switch-case won't work

Issue Time Tracking
-------------------

    Worklog Id:     (was: 754487)
    Time Spent: 40m  (was: 0.5h)

> PTF Vectorization: BoundaryScanner for varchar
> ----------------------------------------------
>
>                 Key: HIVE-26074
>                 URL: https://issues.apache.org/jira/browse/HIVE-26074
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on the varchar type:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: attempt to setup a Window for typeString: 'varchar(170)'
> 	at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
> 	at org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.<init>(ValueBoundaryScanner.java:1257)
> 	at org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
> 	at org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
> 	at org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
> 	... 16 more
> {code}
[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar
[ https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=754485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754485 ]

ASF GitHub Bot logged work on HIVE-26074:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 09:30
            Start Date: 08/Apr/22 09:30
    Worklog Time Spent: 10m

Work Description: abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845925509

   ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:

   @@ -508,6 +508,7 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
            "with ${hive.scratch.dir.permission}."),
        REPLDIR("hive.repl.rootdir","/user/${system:user.name}/repl/",
            "HDFS root dir for all replication dumps."),
   +//HS2 IP2 DistCp hdfs://namenodePort:port/use/hive/w/table1 Ip2:/port:...table1

   Review Comment:
   this is not related to the patch I guess, maybe a leftover

Issue Time Tracking
-------------------

    Worklog Id:     (was: 754485)
    Time Spent: 0.5h  (was: 20m)

> PTF Vectorization: BoundaryScanner for varchar
> ----------------------------------------------
>
>                 Key: HIVE-26074
>                 URL: https://issues.apache.org/jira/browse/HIVE-26074
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on the varchar type:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: attempt to setup a Window for typeString: 'varchar(170)'
> 	at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
> 	at org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.<init>(ValueBoundaryScanner.java:1257)
> 	at org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
> 	at org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
> 	at org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
> 	... 16 more
> {code}
[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar
[ https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=754479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754479 ]

ASF GitHub Bot logged work on HIVE-26074:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 09:27
            Start Date: 08/Apr/22 09:27
    Worklog Time Spent: 10m

Work Description: abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845922550

   ## ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:

   @@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner getBoundaryScanner(BoundaryDef start, B
          case "string":
            return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, nullsLast);
          default:
   +        if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

   Review Comment:
   putting this into default looks strange to me, why not handle similarly to decimal as above

Issue Time Tracking
-------------------

    Worklog Id:     (was: 754479)
    Time Spent: 20m  (was: 10m)

> PTF Vectorization: BoundaryScanner for varchar
> ----------------------------------------------
>
>                 Key: HIVE-26074
>                 URL: https://issues.apache.org/jira/browse/HIVE-26074
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-24761 should be extended for varchar, otherwise it fails on the varchar type:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: attempt to setup a Window for typeString: 'varchar(170)'
> 	at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773)
> 	at org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner.<init>(ValueBoundaryScanner.java:1257)
> 	at org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237)
> 	at org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327)
> 	at org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631)
> 	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
> 	... 16 more
> {code}
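The review exchange above hinges on a Java detail worth spelling out: `switch` over a `String` only matches exact constant labels, so parameterized type names like `char(10)` or `varchar(170)` can never hit a `case "char"` label, which is why the patch uses `startsWith` in the `default` branch. A self-contained sketch of that dispatch shape (the returned strings stand in for scanner instances; which concrete scanner the real patch constructs is not shown in the diff excerpt):

```java
public class BoundaryScannerDispatchSketch {
    // Simplified stand-in for getBoundaryScanner(): the real method switches
    // over many primitive type names; only the string family is sketched here.
    static String scannerFor(String typeString) {
        switch (typeString) {
            case "string":
                return "StringPrimitiveValueBoundaryScanner";
            default:
                // char/varchar type strings carry a length, e.g. "char(10)"
                // or "varchar(170)", so no fixed case label can match them;
                // a prefix check in the default branch covers the whole family.
                if (typeString.startsWith("char") || typeString.startsWith("varchar")) {
                    return "StringPrimitiveValueBoundaryScanner";
                }
                throw new IllegalArgumentException(
                    "attempt to setup a Window for typeString: '" + typeString + "'");
        }
    }
}
```

This also explains why decimal is handled before the switch in the real code: `decimal(p,s)` type strings have the same "parameterized name" problem, so both reviewers' suggestions (prefix check up front vs. in `default`) are workarounds for the same `switch` limitation.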
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=754473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754473 ]

ASF GitHub Bot logged work on HIVE-26093:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 09:18
            Start Date: 08/Apr/22 09:18
    Worklog Time Spent: 10m

Work Description: zabetak commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r845914266

   ## standalone-metastore/pom.xml:

   @@ -531,6 +531,29 @@
   +      javadoc
   +      org.apache.maven.plugins
   +      maven-javadoc-plugin

   Review Comment:
   I think we should also fix the version of the `maven-javadoc-plugin` globally to avoid build warnings and things being broken when a newer version appears.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 754473)
    Time Spent: 1h 40m  (was: 1.5h)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -------------------------------------------------------------------------
>
>                 Key: HIVE-26093
>                 URL: https://issues.apache.org/jira/browse/HIVE-26093
>             Project: Hive
>          Issue Type: Task
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently we define
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 places:
> - ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) on project hive: An error has occurred in Javadoc report generation:
> [ERROR] Exit code: 1 - /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8: warning: a package-info.java file has already been seen for package org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for package "org.apache.hive.streaming"
> [ERROR] /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556: error: type MapSerializer does not take parameters
> [ERROR] com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR] ^
> [ERROR] /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4: error: package org.apache.hadoop.hive.metastore.annotation has already been annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR] 	at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR] 	at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR] 	at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR] 	at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR] 	at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR] 	at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR] 	at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR] 	at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR] 	at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR] 	at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR] 	at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR] 	at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR] 	at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR] 	at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR] 	at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR] 	at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR] 	at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR] 	at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR]
> [ERROR] Command line was:
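The reviewer's suggestion to "fix the version of the `maven-javadoc-plugin` globally" would conventionally be done through `pluginManagement` in the root pom, so every module resolves the same plugin version. A sketch under that assumption (3.0.1 is the version that appears in the error log above; the version Hive ultimately pinned is not shown in this thread):

```xml
<build>
  <pluginManagement>
    <plugins>
      <!-- Pin the javadoc plugin once at the root so child modules inherit
           this version and a new plugin release cannot silently change
           behavior or break the build. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-javadoc-plugin</artifactId>
        <version>3.0.1</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```

Modules then reference the plugin without a `<version>` element and pick up the managed one, which also silences Maven's "version is missing" build warnings.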
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=754471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754471 ]

ASF GitHub Bot logged work on HIVE-26093:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 09:15
            Start Date: 08/Apr/22 09:15
    Worklog Time Spent: 10m

Work Description: zabetak commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r845912230

   ## standalone-metastore/metastore-server/pom.xml:

   @@ -474,23 +474,6 @@
   -      generate-version-annotation
   -      generate-sources
   -      run

   Review Comment:
   I looked a bit more on the history of things and my understanding is that this `saveVersion.sh` script along with the generated package information are necessary so that `org.apache.hadoop.hive.metastore.utils.MetastoreVersionInfo` runs correctly. From what I can see the `org.apache.hadoop.hive.metastore.utils.MetastoreVersionInfo` class is in the metastore-server module and you chose to remove the generated package info from this module. Did you verify that it still runs correctly? I was expecting that `org.apache.hadoop.hive.metastore.utils.MetastoreVersionInfo`, `saveVersion.sh`, etc. should all be in the same module, and `metastore-common` seems more appropriate.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 754471)
    Time Spent: 1.5h  (was: 1h 20m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -------------------------------------------------------------------------
>
>                 Key: HIVE-26093
>                 URL: https://issues.apache.org/jira/browse/HIVE-26093
>             Project: Hive
>          Issue Type: Task
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently we define
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 places:
> - ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) on project hive: An error has occurred in Javadoc report generation:
> [ERROR] Exit code: 1 - /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8: warning: a package-info.java file has already been seen for package org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for package "org.apache.hive.streaming"
> [ERROR] /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556: error: type MapSerializer does not take parameters
> [ERROR] com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR] ^
> [ERROR] /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4: error: package org.apache.hadoop.hive.metastore.annotation has already been annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR] 	at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR] 	at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR] 	at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR] 	at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR] 	at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR] 	at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR] 	at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR] 	at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR] 	at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754446 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 07:41
            Start Date: 08/Apr/22 07:41
    Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845830328

   ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q:

   @@ -0,0 +1,10 @@
   +set hive.vectorized.execution.enabled=true;
   +set hive.support.concurrency=true;
   +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

   Review Comment:
   Tests passed. I've moved the txnhandler validation logic into UpdateDeleteSemanticAnalyzer, where we already have the table object in hand

Issue Time Tracking
-------------------

    Worklog Id:     (was: 754446)
    Time Spent: 13h  (was: 12h 50m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
>                 Key: HIVE-26102
>                 URL: https://issues.apache.org/jira/browse/HIVE-26102
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 13h
>  Remaining Estimate: 0h