[jira] [Work logged] (HIVE-22224) Support Parquet-Avro Timestamp Type
[ https://issues.apache.org/jira/browse/HIVE-22224?focusedWorklogId=754371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754371 ]

ASF GitHub Bot logged work on HIVE-22224:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Apr/22 00:19
Start Date: 08/Apr/22 00:19
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on PR #3002:
URL: https://github.com/apache/hive/pull/3002#issuecomment-1092321630

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
Worklog Id: (was: 754371)
Time Spent: 50m (was: 40m)

> Support Parquet-Avro Timestamp Type
> -----------------------------------
>
> Key: HIVE-22224
> URL: https://issues.apache.org/jira/browse/HIVE-22224
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 2.3.5, 2.3.6
> Reporter: cdmikechen
> Assignee: cdmikechen
> Priority: Major
> Labels: parquet, pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When a user creates an external table and imports Parquet-Avro data written with Avro 1.8.2
> (which supports logical types) into Hive 2.3 or an earlier version, Hive cannot read
> timestamp-type column data correctly.
> Hive reads the value as a LongWritable, because it is actually stored as a
> long (logical_type=timestamp-millis). So we may add some code in
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.java
> to let Hive cast the long type to the timestamp type.
> Some code like below:
>
> public Timestamp getPrimitiveJavaObject(Object o) {
>   if (o instanceof LongWritable) {
>     return new Timestamp(((LongWritable) o).get());
>   }
>   return o == null ? null : ((TimestampWritable) o).getTimestamp();
> }

-- This message was sent by Atlassian Jira (v8.20.1#820001)
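The proposed fallback amounts to interpreting a raw long as milliseconds since the Unix epoch. A minimal sketch outside Hive's class hierarchy (the class name and the plain Long/Timestamp stand-ins for LongWritable/TimestampWritable are illustrative, not Hive's API):

```java
import java.sql.Timestamp;

public class TimestampMillisFallback {

    // Mimics the proposed getPrimitiveJavaObject fallback: if the stored value
    // is a raw long (Parquet-Avro logical_type=timestamp-millis), interpret it
    // as milliseconds since the Unix epoch instead of failing the cast.
    static Timestamp fromStoredValue(Object o) {
        if (o instanceof Long) {                  // stands in for LongWritable
            return new Timestamp((Long) o);
        }
        return o == null ? null : (Timestamp) o;  // stands in for TimestampWritable
    }

    public static void main(String[] args) {
        Timestamp ts = fromStoredValue(1_000L);
        System.out.println(ts.getTime()); // 1000
    }
}
```

The key design point of the patch is only the extra `instanceof` branch; the existing null/Timestamp path is left untouched.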
[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP
[ https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=754304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754304 ]

ASF GitHub Bot logged work on HIVE-21456:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 19:38
Start Date: 07/Apr/22 19:38
Worklog Time Spent: 10m

Work Description: nrg4878 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r845501219

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HmsThriftHttpServlet.java:
## @@ -0,0 +1,116 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.hive.metastore;

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.util.Enumeration;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.thrift.TProcessor;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.server.TServlet;

public class HmsThriftHttpServlet extends TServlet {

  private static final Logger LOG = LoggerFactory
      .getLogger(HmsThriftHttpServlet.class);

  private static final String X_USER = MetaStoreUtils.USER_NAME_HTTP_HEADER;

  private final boolean isSecurityEnabled;

  public HmsThriftHttpServlet(TProcessor processor,
      TProtocolFactory inProtocolFactory, TProtocolFactory outProtocolFactory) {
    super(processor, inProtocolFactory, outProtocolFactory);
    // This should ideally be receiving an instance of the Configuration which is used for the check
    isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
  }

  public HmsThriftHttpServlet(TProcessor processor,
      TProtocolFactory protocolFactory) {
    super(processor, protocolFactory);
    isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
  }

  @Override
  protected void doPost(HttpServletRequest request,
      HttpServletResponse response) throws ServletException, IOException {

    Enumeration<String> headerNames = request.getHeaderNames();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Logging headers in request");
      while (headerNames.hasMoreElements()) {
        String headerName = headerNames.nextElement();
        LOG.debug("Header: [{}], Value: [{}]", headerName,
            request.getHeader(headerName));
      }
    }
    String userFromHeader = request.getHeader(X_USER);
    if (userFromHeader == null || userFromHeader.isEmpty()) {
      LOG.error("No user header: {} found", X_USER);
      response.sendError(HttpServletResponse.SC_FORBIDDEN,
          "User Header missing");
      return;
    }

    // TODO: These should ideally be in some kind of a Cache with Weak references.
    // If HMS were to set up some kind of a session, this would go into the session by having
    // this filter work with a custom Processor / or set the username into the session
    // as is done for HS2.
    // In case of HMS, it looks like each request is independent, and there is no session
    // information, so the UGI needs to be set up in the Connection layer itself.
    UserGroupInformation clientUgi;
    // Temporary, and useless for now. Here only to allow this to work on an otherwise kerberized
    // server.
    if (isSecurityEnabled) {
      LOG.info("Creating proxy user for: {}", userFromHeader);
      clientUgi = UserGroupInformation.createProxyUser(userFromHeader, UserGroupInformation.getLoginUser());
    } else {
      LOG.info("Creating remote user for: {}", userFromHeader);
      clientUgi = UserGroupInformation.createRemoteUser(userFromHeader);
    }

    PrivilegedExceptionAction<Void> action = new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        HmsThriftHttpServlet.super.doPost(request, response);
        return null;
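The quoted review excerpt ends while wrapping the parent servlet's request handling in a PrivilegedExceptionAction, which the client UGI would then execute. A minimal stdlib-only sketch of that wrapping pattern (the `doAs` stand-in here is hypothetical; the real call is Hadoop's `UserGroupInformation.doAs`, which also switches the security context before running the action):

```java
import java.security.PrivilegedExceptionAction;

public class UgiDoAsSketch {

    // Stand-in for UserGroupInformation.doAs(action): runs the wrapped work.
    // The real Hadoop method additionally executes it as the given user.
    static <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception {
        return action.run();
    }

    public static void main(String[] args) throws Exception {
        // As in HmsThriftHttpServlet.doPost: wrap the superclass handling so it
        // executes as the client user resolved from the HTTP header.
        String result = doAs(() -> "handled as proxied user");
        System.out.println(result); // prints "handled as proxied user"
    }
}
```

PrivilegedExceptionAction is a single-method interface, so a lambda works; the servlet uses an anonymous class returning `Void` because `super.doPost` has no result.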
[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-26123:
----------------------------------
Labels: pull-request-available (was: )

> Introduce test coverage for sysdb for the different metastores
> --------------------------------------------------------------
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
> Issue Type: Test
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries.
> Existing tests are running only against Derby, meaning that any change
> against the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for
> the different supported metastores for sysdb.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=754234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754234 ]

ASF GitHub Bot logged work on HIVE-26123:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 17:03
Start Date: 07/Apr/22 17:03
Worklog Time Spent: 10m

Work Description: asolimando opened a new pull request, #3196:
URL: https://github.com/apache/hive/pull/3196

…tores

See the JIRA ticket for details.

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
`mvn test -Dtest=TestMssqlMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
`mvn test -Dtest=TestOracleMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
`mvn test -Dtest=TestMariadbMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
`mvn test -Dtest=TestMysqlMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`
`mvn test -Dtest=TestPostgresMetastoreCliDriver -Dtest.output.overwrite -pl itests/qtest -Pitests`

Issue Time Tracking
-------------------
Worklog Id: (was: 754234)
Remaining Estimate: 0h
Time Spent: 10m

> Introduce test coverage for sysdb for the different metastores
> --------------------------------------------------------------
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
> Issue Type: Test
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries.
> Existing tests are running only against Derby, meaning that any change
> against the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for
> the different supported metastores for sysdb.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26125) sysdb fails with mysql as metastore db
[ https://issues.apache.org/jira/browse/HIVE-26125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26125: Description: _sysdb.q_ and _strict_managed_tables_sysdb.q_ fail when using MySQL as standalone metastore db. The issue can be reproduced with the following command: {code:java} mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile="sysdb.q,strict_managed_tables_sysdb.q" -Dtest.metastore.db=mysql -pl itests/qtest -Pitests {code} The errors are as follows: {noformat} --- Test set: org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver --- Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 282.638 s <<< FAILURE! - in org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver.testCliDriver[strict_managed_tables_sysdb] Time elapsed: 41.104 s <<< FAILURE! java.lang.AssertionError: Client execution failed with error code = 2 running select tbl_name, tbl_type from tbls where tbl_name like 'smt_sysdb%' order by tbl_name fname=strict_managed_tables_sysdb.q See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs. 
org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, vertexName=Map 1, vertexId=vertex_1649344918728_0001_33_00, diagnostics=[Task failed, taskId=task_1649344918728_0001_33_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1649344918728_0001_33_00_00_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught exception while trying to execute query:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"TBLS"' at line 14 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught exception while trying to execute query:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"TBLS"' at line 14 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) ... 15 more Caused by: java.io.IOException: java.io.IOException: org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught exception while trying to execute query:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"TBLS"' at line 14 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at
[jira] [Commented] (HIVE-20205) Upgrade HBase dependencies off alpha4 release
[ https://issues.apache.org/jira/browse/HIVE-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518984#comment-17518984 ]

Naveen Gangam commented on HIVE-20205:
--------------------------------------
Based on the analysis in HIVE-26124, HBase 2 is incompatible with Hadoop 3.

> Upgrade HBase dependencies off alpha4 release
> ---------------------------------------------
>
> Key: HIVE-20205
> URL: https://issues.apache.org/jira/browse/HIVE-20205
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.0.0
> Reporter: Naveen Gangam
> Assignee: Naveen Gangam
> Priority: Minor
> Attachments: HIVE-20205.1.patch, HIVE-20205.1.patch, HIVE-20205.2.patch, HIVE-20205.2.patch, HIVE-20205.3.patch, HIVE-20205.patch, HIVE-20205.patch
>
> It appears Hive has dependencies on HBase 2.0.0-alpha4 releases. HBase 2.0.0 and
> 2.0.1 have been released. The HBase team recommends 2.0.1 and says there shouldn't
> be any API surprises (but we never know).

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-20205) Upgrade HBase dependencies off alpha4 release
[ https://issues.apache.org/jira/browse/HIVE-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam updated HIVE-20205:
---------------------------------
Resolution: Won't Fix
Status: Resolved (was: Patch Available)

> Upgrade HBase dependencies off alpha4 release
> ---------------------------------------------
>
> Key: HIVE-20205
> URL: https://issues.apache.org/jira/browse/HIVE-20205
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.0.0
> Reporter: Naveen Gangam
> Assignee: Naveen Gangam
> Priority: Minor
> Attachments: HIVE-20205.1.patch, HIVE-20205.1.patch, HIVE-20205.2.patch, HIVE-20205.2.patch, HIVE-20205.3.patch, HIVE-20205.patch, HIVE-20205.patch
>
> It appears Hive has dependencies on HBase 2.0.0-alpha4 releases. HBase 2.0.0 and
> 2.0.1 have been released. The HBase team recommends 2.0.1 and says there shouldn't
> be any API surprises (but we never know).

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518980#comment-17518980 ]

Naveen Gangam commented on HIVE-26124:
--------------------------------------
Thanks Peter. I will close the other jira as well.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> ----------------------------------------
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary resolved HIVE-26124.
-------------------------------
Resolution: Won't Fix

HBase 2 and Hadoop 3 are incompatible. We might have to move forward to HBase 3 if it becomes available.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> ----------------------------------------
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518978#comment-17518978 ]

Peter Vary commented on HIVE-26124:
-----------------------------------
Talked to [~stoty], and he pointed out that he already did this exercise in HIVE-24473. The short story is that HBase 2.x is compiled against Hadoop 2, and it could not be used for testing with any Hadoop 3 artifacts. The root cause is HBASE-22394, BTW. Thanks [~stoty] for the pointers!

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> ----------------------------------------
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?focusedWorklogId=754203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754203 ]

ASF GitHub Bot logged work on HIVE-26124:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 15:55
Start Date: 07/Apr/22 15:55
Worklog Time Spent: 10m

Work Description: pvary closed pull request #3186: HIVE-26124: Upgrade HBase from 2.0.0-alpha4 to 2.0.0
URL: https://github.com/apache/hive/pull/3186

Issue Time Tracking
-------------------
Worklog Id: (was: 754203)
Time Spent: 20m (was: 10m)

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> ----------------------------------------
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode
[ https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=754195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754195 ]

ASF GitHub Bot logged work on HIVE-26117:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 15:47
Start Date: 07/Apr/22 15:47
Worklog Time Spent: 10m

Work Description: kasakrisz commented on code in PR #3179:
URL: https://github.com/apache/hive/pull/3179#discussion_r845281998

## ql/src/test/results/clientnegative/joinneg.q.out:
## @@ -1 +1 @@
-FAILED: SemanticException [Error 10004]: Line 6:12 Invalid table alias or column reference 'b': (possible column names are: x.key, x.value, y.key, y.value)
+FAILED: SemanticException [Error 10009]: Line 6:12 Invalid table alias 'b'

Review Comment: The original error message was more informative.

## ql/src/test/results/clientpositive/llap/views_explain_ddl.q.out:
## @@ -305,7 +305,7 @@ TBLPROPERTIES (
 ALTER TABLE db1.table2_n13 UPDATE STATISTICS SET('numRows'='0','rawDataSize'='0' );
 ALTER TABLE db1.table1_n19 UPDATE STATISTICS SET('numRows'='0','rawDataSize'='0' );
-CREATE VIEW `db1`.`v3_n3` AS SELECT `t1`.`key`, `t1`.`value`, `t2`.`key` `k` FROM `db1`.`table1_n19` `t1` JOIN `db1`.`table2_n13` `t2` ON `t1`.`key` = `t2`.`key`;
+CREATE VIEW `db1`.`v3_n3` AS SELECT `t1`.`key`, `t1`.`value`, `t2`.`key` `k` FROM `db1`.`table1_n19` `t1` JOIN `db1`.`table2_n13` `t2` ON t1.key = t2.key;

Review Comment: View expanded text changed: quotation removed from table and column names in the join condition:
```
t1.key = t2.key
```
should remain
```
`t1`.`key` = `t2`.`key`
```

Issue Time Tracking
-------------------
Worklog Id: (was: 754195)
Time Spent: 20m (was: 10m)

> Remove 2 superfluous lines of code in genJoinRelNode
> ----------------------------------------------------
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Steve Carlin
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes. Some code was left
> behind that doesn't add any value.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable
[ https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=754173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754173 ]

ASF GitHub Bot logged work on HIVE-25980:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 15:13
Start Date: 07/Apr/22 15:13
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r845256354

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
## @@ -422,21 +413,46 @@ void findUnknownPartitions(Table table, Set partPaths, byte[] filterExp,
       }
       allPartDirs = partDirs;
     }
-    // don't want the table dir
-    allPartDirs.remove(tablePath);
-
-    // remove the partition paths we know about
-    allPartDirs.removeAll(partPaths);
-
     Set partColNames = Sets.newHashSet();
     for (FieldSchema fSchema : getPartCols(table)) {
       partColNames.add(fSchema.getName());
     }
     Map partitionColToTypeMap = getPartitionColtoTypeMap(table.getPartitionKeys());
+
+    Set correctPartPathsInMS = new HashSet<>(partPathsInMS);
+    // remove partition paths in partPathsInMS, to getPartitionsNotOnFs
+    partPathsInMS.removeAll(allPartDirs);
+    FileSystem fs = tablePath.getFileSystem(conf);
+    // There can be an edge case where the user defines a partition directory outside of the
+    // table directory; to avoid eviction of such partitions we check whether the partition
+    // path exists and, if not, add it to the result for getPartitionsNotOnFs.
+    for (Path partPath : partPathsInMS) {
+      CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
+      pr.setTableName(table.getTableName());
+      pr.setPartitionName(getPartitionName(fs.makeQualified(tablePath),
+          partPath, partColNames, partitionColToTypeMap));
+      if (!fs.exists(partPath)) {
+        result.getPartitionsNotOnFs().add(pr);
+        correctPartPathsInMS.remove(partPath);
+      }
+    }
+    for (Path partPath : correctPartPathsInMS) {
+      CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
+      pr.setTableName(table.getTableName());
+      pr.setPartitionName(getPartitionName(fs.makeQualified(tablePath),
+          partPath, partColNames, partitionColToTypeMap));
+      result.getCorrectPartitions().add(pr);
+    }
+
+    // don't want the table dir
+    allPartDirs.remove(tablePath);
+
+    // remove the partition paths we know about
+    allPartDirs.removeAll(partPaths);

Review Comment: Does allPartDirs contain non-full path objects? Do we need them there?

Issue Time Tracking
-------------------
Worklog Id: (was: 754173)
Time Spent: 5h 10m (was: 5h)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --------------------------------------------------
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: 3.1.2, 4.0.0
> Reporter: Chiran Ravani
> Assignee: Chiran Ravani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on cloud storage
> such as S3; one case where we found the slowness was in HiveMetaStoreChecker.checkTable.
> {code:java} > "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 > tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at > sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464) > at > sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68) > at > sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341) > at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73) > at > sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) > at >
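The review question on the HIVE-25980 patch is about which paths survive each removeAll. Setting aside the filesystem existence check, the intended set algebra can be sketched with plain string sets (class, method, and partition names here are illustrative, not the HiveMetaStoreChecker API):

```java
import java.util.HashSet;
import java.util.Set;

public class PartitionDiffSketch {

    // Partitions recorded in the metastore whose path is absent from the
    // filesystem listing ("not on FS"): metastore minus filesystem.
    static Set<String> partitionsNotOnFs(Set<String> inMetastore, Set<String> onFs) {
        Set<String> missing = new HashSet<>(inMetastore);
        missing.removeAll(onFs);
        return missing;
    }

    // Partitions present both in the metastore and on the filesystem
    // ("correct"): the intersection of the two sets.
    static Set<String> correctPartitions(Set<String> inMetastore, Set<String> onFs) {
        Set<String> correct = new HashSet<>(inMetastore);
        correct.retainAll(onFs);
        return correct;
    }

    public static void main(String[] args) {
        Set<String> ms = new HashSet<>(Set.of("p=1", "p=2", "p=3"));
        Set<String> fs = new HashSet<>(Set.of("p=2", "p=3", "p=4"));
        System.out.println(partitionsNotOnFs(ms, fs)); // [p=1]
        System.out.println(correctPartitions(ms, fs)); // p=2 and p=3, order unspecified
    }
}
```

The actual patch additionally probes `fs.exists` for metastore paths outside the table directory, so a path can move from "correct" to "not on FS" even when it is absent from the directory listing; that refinement is omitted here.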
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754159 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 15:01
Start Date: 07/Apr/22 15:01
Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845243489

## iceberg/iceberg-handler/src/test/queries/positive/delete_iceberg_partitioned_avro.q:
## @@ -0,0 +1,26 @@
+set hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+
+drop table if exists tbl_ice;
+create external table tbl_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as avro tblproperties ('format-version'='2');
+

Issue Time Tracking
-------------------
Worklog Id: (was: 754159)
Time Spent: 12h 50m (was: 12h 40m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 12h 50m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754152 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 14:59
Start Date: 07/Apr/22 14:59
Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r845240720

## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q:
## @@ -0,0 +1,10 @@
+set hive.vectorized.execution.enabled=true;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

Review Comment: I've tried to address this with this commit: [a7fb7f9](https://github.com/apache/hive/pull/3131/commits/a7fb7f90a2fcc3c69b9e533de35b16eda99e3719)

Issue Time Tracking
-------------------
Worklog Id: (was: 754152)
Time Spent: 12h 40m (was: 12.5h)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 12h 40m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Solimando updated HIVE-26123:
----------------------------------------
Description:
_sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries.
Existing tests are running only against Derby, meaning that any change against the sysdb query mapping is not covered by CI.
The present ticket aims at bridging this gap by introducing test coverage for the different supported metastores for sysdb.

was:
_sysdb_ provides a view over (some) metastore tables from Hive via JDBC queries.
Existing tests are running only against Derby, meaning that any change against the sysdb query mapping is not covered by CI.
The present ticket aims at bridging this gap by introducing test coverage for the different supported metastores for sysdb.

> Introduce test coverage for sysdb for the different metastores
> --------------------------------------------------------------
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
> Issue Type: Test
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Fix For: 4.0.0-alpha-2
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries.
> Existing tests are running only against Derby, meaning that any change
> against the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for
> the different supported metastores for sysdb.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils
[ https://issues.apache.org/jira/browse/HIVE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-26119. Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed Fixed in https://github.com/apache/hive/commit/71b62c68ef76e90ee53281102870d570c8f50834. Thanks for the PR [~soumyakanti.das]! > Remove unnecessary Exceptions from DDLPlanUtils > --- > > Key: HIVE-26119 > URL: https://issues.apache.org/jira/browse/HIVE-26119 > Project: Hive > Issue Type: Improvement >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > There are a few {{HiveExceptions}} which were added to a few methods like > {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can be > removed. Some methods in {{ExplainTask}} can also be cleaned up which are > related. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils
[ https://issues.apache.org/jira/browse/HIVE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26119: -- Labels: pull-request-available (was: ) > Remove unnecessary Exceptions from DDLPlanUtils > --- > > Key: HIVE-26119 > URL: https://issues.apache.org/jira/browse/HIVE-26119 > Project: Hive > Issue Type: Improvement >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > There are a few {{HiveExceptions}} which were added to a few methods like > {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can be > removed. Some methods in {{ExplainTask}} can also be cleaned up which are > related. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
[ https://issues.apache.org/jira/browse/HIVE-26019?focusedWorklogId=754134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754134 ] ASF GitHub Bot logged work on HIVE-26019: - Author: ASF GitHub Bot Created on: 07/Apr/22 14:41 Start Date: 07/Apr/22 14:41 Worklog Time Spent: 10m Work Description: zabetak closed pull request #3075: HIVE-26019 HIVE-26020: Improvements around transitive dependencies from calcite-core URL: https://github.com/apache/hive/pull/3075 Issue Time Tracking --- Worklog Id: (was: 754134) Time Spent: 0.5h (was: 20m) > Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0 > --- > > Key: HIVE-26019 > URL: https://issues.apache.org/jira/browse/HIVE-26019 > Project: Hive > Issue Type: Task > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils
[ https://issues.apache.org/jira/browse/HIVE-26119?focusedWorklogId=754135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754135 ] ASF GitHub Bot logged work on HIVE-26119: - Author: ASF GitHub Bot Created on: 07/Apr/22 14:41 Start Date: 07/Apr/22 14:41 Worklog Time Spent: 10m Work Description: zabetak closed pull request #3184: HIVE-26119: Remove unnecessary Exceptions from DDLPlanUtils URL: https://github.com/apache/hive/pull/3184 Issue Time Tracking --- Worklog Id: (was: 754135) Remaining Estimate: 0h Time Spent: 10m > Remove unnecessary Exceptions from DDLPlanUtils > --- > > Key: HIVE-26119 > URL: https://issues.apache.org/jira/browse/HIVE-26119 > Project: Hive > Issue Type: Improvement >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > There are a few {{HiveExceptions}} which were added to a few methods like > {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can be > removed. Some methods in {{ExplainTask}} can also be cleaned up which are > related. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-26020) Set dependency scope for json-path, commons-compiler and janino to runtime
[ https://issues.apache.org/jira/browse/HIVE-26020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-26020. Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed Fixed in https://github.com/apache/hive/commit/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b. Thanks for the reviews [~asolimando], [~kkasa]! > Set dependency scope for json-path, commons-compiler and janino to runtime > -- > > Key: HIVE-26020 > URL: https://issues.apache.org/jira/browse/HIVE-26020 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0-alpha-2 > > > The dependencies are necessary only when running Hive. They are not required > during compilation since Hive does not depend on them directly but > transitively through Calcite. > > Changing the scope to runtime makes the intention clear and guards against > accidental usages in Hive. -- This message was sent by Atlassian Jira (v8.20.1#820001)
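The scope change described in HIVE-26020 amounts to one line per dependency in the pom. A hedged sketch of what such declarations could look like (coordinates are the libraries' usual Maven coordinates; versions assumed to be managed in a parent `dependencyManagement` section — the actual Hive pom layout may differ):

```xml
<!-- Needed only at run time: Hive reaches these transitively through
     Calcite, so compile-time visibility would invite accidental direct
     usage in Hive code. -->
<dependency>
  <groupId>com.jayway.jsonpath</groupId>
  <artifactId>json-path</artifactId>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>janino</artifactId>
  <scope>runtime</scope>
</dependency>
```

The `commons-compiler` artifact (`org.codehaus.janino:commons-compiler`) would get the same `runtime` scope. With `runtime` scope the classes are absent from the compile classpath but still packaged and present at execution time, which is exactly the "guard against accidental usages" intent stated above.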
[jira] [Resolved] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
[ https://issues.apache.org/jira/browse/HIVE-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-26019. Resolution: Fixed Fixed in https://github.com/apache/hive/commit/73cbab65eafd58c07f5658a163a331dcdac8046d. Thanks for the reviews [~asolimando] [~kkasa]! > Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0 > --- > > Key: HIVE-26019 > URL: https://issues.apache.org/jira/browse/HIVE-26019 > Project: Hive > Issue Type: Task > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
[ https://issues.apache.org/jira/browse/HIVE-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26019: --- Fix Version/s: 4.0.0-alpha-2 > Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0 > --- > > Key: HIVE-26019 > URL: https://issues.apache.org/jira/browse/HIVE-26019 > Project: Hive > Issue Type: Task > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518896#comment-17518896 ] Peter Vary commented on HIVE-26124: --- Now I am back at the first step:
{code}
[ERROR] Please refer to /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/pvary/dev/upstream/hive/hbase-handler && /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/bin/java -Xmx2048m -jar /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire/surefirebooter1320893522602873596.jar /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire 2022-04-07T15-55-06_090-jvmRun1 surefire4212888302150641194tmp surefire_04095119596947982877tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.hbase.TestHBaseQueries
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/pvary/dev/upstream/hive/hbase-handler && /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/bin/java -Xmx2048m -jar /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire/surefirebooter1320893522602873596.jar /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire 2022-04-07T15-55-06_090-jvmRun1 surefire4212888302150641194tmp surefire_04095119596947982877tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.hbase.TestHBaseQueries
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:513)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:460)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:301)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:249)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1217)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1063)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:889)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:972)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:293)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
{code}
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754095=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754095 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 13:56 Start Date: 07/Apr/22 13:56 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845167903 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java: ## @@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector objectInspector) { Deserializer deserializer = deserializers.get(objectInspector); if (deserializer == null) { deserializer = new Deserializer.Builder() - .schema(tableSchema) + .schema(isDelete ? deleteSchema : tableSchema) Review Comment: Yes, I think that's a good idea Issue Time Tracking --- Worklog Id: (was: 754095) Time Spent: 12.5h (was: 12h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 12.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518870#comment-17518870 ] Peter Vary commented on HIVE-26124: --- That would be nice. There are some config changes in the test utils. > Upgrade HBase from 2.0.0-alpha4 to 2.0.0 > > > Key: HIVE-26124 > URL: https://issues.apache.org/jira/browse/HIVE-26124 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We should move from the alpha version to the stable one -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518869#comment-17518869 ] Naveen Gangam commented on HIVE-26124: -- Got you. Shocking that the alpha4 release has no issues but the GA does. We need some HBase help on this then? > Upgrade HBase from 2.0.0-alpha4 to 2.0.0 > > > Key: HIVE-26124 > URL: https://issues.apache.org/jira/browse/HIVE-26124 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We should move from the alpha version to the stable one -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518866#comment-17518866 ] Peter Vary commented on HIVE-26124: --- I think I am struggling with the same test failures on the PR.
{code}
Caused by: java.lang.IllegalArgumentException: port out of range:-1
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1217)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1184)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:561)
    at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:147)
{code}
I was expecting some issues, so I was trying to be conservative. If we can fix the issues, I would be happy to move as high as possible with the dependency. > Upgrade HBase from 2.0.0-alpha4 to 2.0.0 > > > Key: HIVE-26124 > URL: https://issues.apache.org/jira/browse/HIVE-26124 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We should move from the alpha version to the stable one -- This message was sent by Atlassian Jira (v8.20.1#820001)
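The {{port out of range:-1}} frame in the trace above comes from the JDK's own port validation in the `InetSocketAddress` constructor, which rejects any port outside 0–65535 at construction time; a port of -1 usually means the mini-cluster config never resolved a real RPC port. A minimal, self-contained sketch reproducing just that JDK check (the class name `PortRangeDemo` is made up for illustration and is not part of the Hive or HBase code):

```java
import java.net.InetSocketAddress;

public class PortRangeDemo {
    public static void main(String[] args) {
        try {
            // Same validation path as the RSRpcServices constructor frame above:
            // InetSocketAddress rejects ports outside [0, 65535] at construction time.
            new InetSocketAddress("localhost", -1);
            System.out.println("no exception");
        } catch (IllegalArgumentException e) {
            // Message looks like "port out of range:-1"
            System.out.println(e.getMessage());
        }
    }
}
```

So the fix direction discussed in this thread is about making the test utils hand a resolved, non-negative port to the mini cluster, not about the JDK check itself.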
[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518865#comment-17518865 ] Naveen Gangam commented on HIVE-26124: -- [~pvary] https://issues.apache.org/jira/browse/HIVE-20205 never got committed due to not having a clean test run. I can close it as a duplicate of this. But is there a reason we are using 2.0.0 (it looks like my 3-year-old patch was using 2.1.0)? Thanks > Upgrade HBase from 2.0.0-alpha4 to 2.0.0 > > > Key: HIVE-26124 > URL: https://issues.apache.org/jira/browse/HIVE-26124 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We should move from the alpha version to the stable one -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754020 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 12:34 Start Date: 07/Apr/22 12:34 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845080500 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java: ## @@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector objectInspector) { Deserializer deserializer = deserializers.get(objectInspector); if (deserializer == null) { deserializer = new Deserializer.Builder() - .schema(tableSchema) + .schema(isDelete ? deleteSchema : tableSchema) Review Comment: would it make sense to keep the `projectedSchema` attribute and remove the `isDelete` and the `deleteSchema`? Issue Time Tracking --- Worklog Id: (was: 754020) Time Spent: 12h 20m (was: 12h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 12h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754003 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 12:09 Start Date: 07/Apr/22 12:09 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845056909 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> deleteWriter;
Review Comment: Yes, we can use `PartitioningWriter` as the common ancestor, which has the `write()` and `close()` methods conveniently. I've moved the writer object into the parent class, and now the children don't need to override the `close()` method anymore. However, in `files()` we need to cast to `DataWriteResult` and `DeleteWriteResult` Issue Time Tracking --- Worklog Id: (was: 754003) Time Spent: 12h 10m (was: 12h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 12h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=754002=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754002 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 12:09 Start Date: 07/Apr/22 12:09 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845056909 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> deleteWriter;
Review Comment: Yes, we can use `PartitioningWriter` as the common ancestor, which has the `write()` and `close()` methods conveniently. I've moved the writer object into the parent class, and now the children don't need to override the `close()` method anymore Issue Time Tracking --- Worklog Id: (was: 754002) Time Spent: 12h (was: 11h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 12h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753995 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:52 Start Date: 07/Apr/22 11:52 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845042863 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.ContentFile;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+
+public class FilesForCommit implements Serializable {
+
+  private final List<DataFile> dataFiles;
Review Comment: I would expect the difference is not very significant, probably `DataFile[]` is a bit more performant, but not sure Issue Time Tracking --- Worklog Id: (was: 753995) Time Spent: 11h 50m (was: 11h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 11h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753993 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:37 Start Date: 07/Apr/22 11:37 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845029968 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.ContentFile;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+
+public class FilesForCommit implements Serializable {
+
+  private final List<DataFile> dataFiles;
+  private final List<DeleteFile> deleteFiles;
+
+  public FilesForCommit(List<DataFile> dataFiles, List<DeleteFile> deleteFiles) {
+    this.dataFiles = dataFiles;
+    this.deleteFiles = deleteFiles;
+  }
+
+  public static FilesForCommit onlyDelete(List<DeleteFile> deleteFiles) {
+    return new FilesForCommit(Collections.emptyList(), deleteFiles);
+  }
+
+  public static FilesForCommit onlyData(List<DataFile> dataFiles) {
+    return new FilesForCommit(dataFiles, Collections.emptyList());
+  }
+
+  public static FilesForCommit empty() {
+    return new FilesForCommit(Collections.emptyList(), Collections.emptyList());
+  }
+
+  public List<DataFile> getDataFiles() {
Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 753993) Time Spent: 11h 40m (was: 11.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753992 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:37 Start Date: 07/Apr/22 11:37 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845029785 ## ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java: ## @@ -97,12 +100,22 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws SemanticException { Table mTable = getTargetTable(tabName); validateTargetTable(mTable); +// save the operation type into the query state +SessionStateUtil.addResource(conf, Context.Operation.class.getSimpleName(), operation.name()); + StringBuilder rewrittenQueryStr = new StringBuilder(); rewrittenQueryStr.append("insert into table "); rewrittenQueryStr.append(getFullTableNameForSQL(tabName)); addPartitionColsToInsert(mTable.getPartCols(), rewrittenQueryStr); -rewrittenQueryStr.append(" select ROW__ID"); +boolean nonNativeAcid = mTable.getStorageHandler() != null && mTable.getStorageHandler().supportsAcidOperations(); Review Comment: Sure, makes sense! I've added a util method to AcidUtils Issue Time Tracking --- Worklog Id: (was: 753992) Time Spent: 11.5h (was: 11h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 11.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753989 ] ASF GitHub Bot logged work on HIVE-26121: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:30 Start Date: 07/Apr/22 11:30 Worklog Time Spent: 10m Work Description: pvary commented on PR #3181: URL: https://github.com/apache/hive/pull/3181#issuecomment-1091622189 I have missed this before, but do we really need to synchronize `DriverTxnHandler.endTransactionAndCleanup` and `DbTxnManager.java.stopHeartbeat` too? Otherwise LGTM Issue Time Tracking --- Worklog Id: (was: 753989) Time Spent: 40m (was: 0.5h) > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that only 1 thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
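The race HIVE-26121 describes — the HiveServer2-Background thread and the HiveServer2-Handler thread both reaching `DriverTxnHandler.endTransactionAndCleanup` — can be sketched with a simple at-most-once guard. This is an illustrative stand-in, not the actual Hive patch: the class is hypothetical and uses an atomic flag, while the ticket's description calls for synchronizing the real method; either approach makes the abort-and-stop-heartbeat sequence run only once.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for DriverTxnHandler: an atomic flag makes the
// cleanup idempotent, so only one of the two racing threads performs
// the rollback work.
public class TxnRollbackGuard {
    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);
    private volatile int rollbackCount = 0;

    public void endTransactionAndCleanup(boolean commit) {
        // compareAndSet lets exactly one caller through; later callers are no-ops.
        if (!cleanedUp.compareAndSet(false, true)) {
            return;
        }
        if (!commit) {
            rollbackCount++; // stands in for: abort the transaction, stop the heartbeat
        }
    }

    public int rollbackCount() {
        return rollbackCount;
    }

    public static void main(String[] args) throws InterruptedException {
        TxnRollbackGuard guard = new TxnRollbackGuard();
        // Simulate the background query thread and the cancelOperation handler
        // both triggering the rollback concurrently.
        Thread background = new Thread(() -> guard.endTransactionAndCleanup(false));
        Thread handler = new Thread(() -> guard.endTransactionAndCleanup(false));
        background.start();
        handler.start();
        background.join();
        handler.join();
        System.out.println(guard.rollbackCount()); // prints 1: cleanup ran exactly once
    }
}
```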
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753986 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:24 Start Date: 07/Apr/22 11:24 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845020372 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7822,9 +7824,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) List vecCol = new ArrayList(); -if (updating(dest) || deleting(dest)) { +boolean nonNativeAcid = Optional.ofNullable(destinationTable) +.map(Table::getStorageHandler) +.map(HiveStorageHandler::supportsAcidOperations) +.orElse(false); +boolean isUpdateDelete = updating(dest) || deleting(dest); +if (!nonNativeAcid && isUpdateDelete) { Review Comment: I agree, that's more readable Issue Time Tracking --- Worklog Id: (was: 753986) Time Spent: 11h 20m (was: 11h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 11h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
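The diff above collapses two nested null checks into an `Optional` chain. A self-contained version of that idiom is sketched below; the interfaces are simplified stand-ins for Hive's `Table` and `HiveStorageHandler`, kept only to make the chain runnable.

```java
import java.util.Optional;

// Sketch of the Optional-based null-safe check from the SemanticAnalyzer diff.
public class OptionalChainSketch {

    public interface StorageHandler {
        boolean supportsAcidOperations();
    }

    public interface Table {
        StorageHandler getStorageHandler();
    }

    // Equivalent to: table != null && table.getStorageHandler() != null
    //               && table.getStorageHandler().supportsAcidOperations()
    // Each map() short-circuits to an empty Optional on null.
    public static boolean nonNativeAcid(Table destinationTable) {
        return Optional.ofNullable(destinationTable)
            .map(Table::getStorageHandler)
            .map(StorageHandler::supportsAcidOperations)
            .orElse(false);
    }

    public static void main(String[] args) {
        Table iceberg = () -> () -> true;
        System.out.println(nonNativeAcid(iceberg)); // true
        System.out.println(nonNativeAcid(null));    // false
    }
}
```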
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753982 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:19 Start Date: 07/Apr/22 11:19 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845016196 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java: ## @@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector objectInspector) { Deserializer deserializer = deserializers.get(objectInspector); if (deserializer == null) { deserializer = new Deserializer.Builder() - .schema(tableSchema) + .schema(isDelete ? deleteSchema : tableSchema) Review Comment: `projectedSchema` is only local variable inside `initialize()` and not available here Issue Time Tracking --- Worklog Id: (was: 753982) Time Spent: 11h 10m (was: 11h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 11h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753981 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:17 Start Date: 07/Apr/22 11:17 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845014886 ## itests/qtest-iceberg/pom.xml: ## @@ -122,6 +122,12 @@ jersey-servlet test + + org.roaringbitmap Review Comment: The q test fails with ClassNotFoundException if this is not here. It's the same dependency included into the handler module: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/pom.xml#L103-L108 Issue Time Tracking --- Worklog Id: (was: 753981) Time Spent: 11h (was: 10h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 11h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753977 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:00 Start Date: 07/Apr/22 11:00 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845001375 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -484,10 +500,35 @@ private static Schema readSchema(Configuration conf, Schema tableSchema, boolean String[] selectedColumns = InputFormatConfig.selectedColumns(conf); if (selectedColumns == null) { -return tableSchema; +return table.schema(); + } + + readSchema = caseSensitive ? table.schema().select(selectedColumns) : + table.schema().caseInsensitiveSelect(selectedColumns); + + // for DELETE queries, add additional metadata columns into the read schema + if (HiveIcebergStorageHandler.isDelete(conf, conf.get(Catalogs.NAME))) { +readSchema = IcebergAcidUtil.createFileReadSchemaForDelete(readSchema.columns(), table); } - return caseSensitive ? tableSchema.select(selectedColumns) : tableSchema.caseInsensitiveSelect(selectedColumns); + return readSchema; +} + +private Schema schemaWithoutConstantsAndMeta(Schema readSchema, Map idToConstant) { Review Comment: Yes! Issue Time Tracking --- Worklog Id: (was: 753977) Time Spent: 10h 40m (was: 10.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753978 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:00 Start Date: 07/Apr/22 11:00 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r845001571 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.hadoop.hive.ql.io.PositionDeleteInfo; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap(); Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 753978) Time Spent: 10h 50m (was: 10h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar
[ https://issues.apache.org/jira/browse/HIVE-26074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26074: -- Labels: pull-request-available (was: ) > PTF Vectorization: BoundaryScanner for varchar > -- > > Key: HIVE-26074 > URL: https://issues.apache.org/jira/browse/HIVE-26074 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-24761 should be extended for varchar, otherwise it fails on varchar type > {code} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: > attempt to setup a Window for typeString: 'varchar(170)' > at > org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773) > at > org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner. (ValueBoundaryScanner.java:1257) > at > org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237) > at > org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327) > at > org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40) > at > org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442) > at > org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631) > at > org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383) > ... 16 more > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26074) PTF Vectorization: BoundaryScanner for varchar
[ https://issues.apache.org/jira/browse/HIVE-26074?focusedWorklogId=753976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753976 ] ASF GitHub Bot logged work on HIVE-26074: - Author: ASF GitHub Bot Created on: 07/Apr/22 11:00 Start Date: 07/Apr/22 11:00 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request, #3187: URL: https://github.com/apache/hive/pull/3187 HIVE-26074: PTF Vectorization: BoundaryScanner for varchar. Issue Time Tracking --- Worklog Id: (was: 753976) Remaining Estimate: 0h Time Spent: 10m > PTF Vectorization: BoundaryScanner for varchar > -- > > Key: HIVE-26074 > URL: https://issues.apache.org/jira/browse/HIVE-26074 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-24761 should be extended for varchar, otherwise it fails on varchar type > {code} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: > attempt to setup a Window for typeString: 'varchar(170)' > at > org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.getBoundaryScanner(ValueBoundaryScanner.java:773) > at > org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner$MultiPrimitiveValueBoundaryScanner. 
(ValueBoundaryScanner.java:1257) > at > org.apache.hadoop.hive.ql.udf.ptf.MultiValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:1237) > at > org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.getScanner(ValueBoundaryScanner.java:327) > at > org.apache.hadoop.hive.ql.udf.ptf.PTFRangeUtil.getRange(PTFRangeUtil.java:40) > at > org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.finishPartition(VectorPTFGroupBatches.java:442) > at > org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.finishPartition(VectorPTFOperator.java:631) > at > org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.closeOp(VectorPTFOperator.java:782) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383) > ... 16 more > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
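The stack trace above is the classic shape of a dispatch switch missing one type case: varchar falls through to the "Internal Error" branch. The sketch below illustrates that bug class and the fix; the scanner names mirror Hive's but the method is a simplified stand-in, not the real `getBoundaryScanner`.

```java
// Sketch of the dispatch pattern behind HIVE-26074: a type switch whose
// missing VARCHAR case made varchar windows fail with "Internal Error".
public class BoundaryScannerSketch {

    public enum TypeCategory { LONG, STRING, CHAR, VARCHAR, TIMESTAMP }

    public static String scannerFor(TypeCategory category) {
        switch (category) {
            case LONG:
                return "LongValueBoundaryScanner";
            case STRING:
            case CHAR:
            case VARCHAR: // the fix: route varchar like the other string types
                return "StringValueBoundaryScanner";
            case TIMESTAMP:
                return "TimestampValueBoundaryScanner";
            default:
                // Without the VARCHAR label above, varchar columns land here.
                throw new IllegalStateException(
                    "Internal Error: attempt to setup a Window for typeString: " + category);
        }
    }

    public static void main(String[] args) {
        System.out.println(scannerFor(TypeCategory.VARCHAR)); // StringValueBoundaryScanner
    }
}
```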
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753973 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 10:57 Start Date: 07/Apr/22 10:57 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844995858 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) { private CloseableIterator currentIterator; private FileIO io; private EncryptionManager encryptionManager; +private Table table; Review Comment: We need the whole table object for this call: ``` MetadataColumns#metadataColumn(Table table, String name) ``` (which is called inside IcebergAcidUtil#createFileReadSchemaForDelete) -> this gives us the _partition metadata column during file read Issue Time Tracking --- Worklog Id: (was: 753973) Time Spent: 10.5h (was: 10h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753971=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753971 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 10:56 Start Date: 07/Apr/22 10:56 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844997990 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) { private CloseableIterator currentIterator; private FileIO io; private EncryptionManager encryptionManager; +private Table table; Review Comment: I'll remove those fields which are easily derivable from table, such as io and encryption Issue Time Tracking --- Worklog Id: (was: 753971) Time Spent: 10h 20m (was: 10h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753967 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 10:53 Start Date: 07/Apr/22 10:53 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844995858 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) { private CloseableIterator currentIterator; private FileIO io; private EncryptionManager encryptionManager; +private Table table; Review Comment: We need the whole table object for this call: ``` public static NestedField metadataColumn(Table table, String name) ``` (which is called inside IcebergAcidUtil#createFileReadSchemaForDelete) -> this gives us the _partition metadata column during file read Issue Time Tracking --- Worklog Id: (was: 753967) Time Spent: 10h 10m (was: 10h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753966 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 10:51 Start Date: 07/Apr/22 10:51 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844994035 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.hadoop.hive.ql.io.PositionDeleteInfo; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0); Review Comment: I chose a linked hashmap so that the iteration order is always deterministic when I extend the schema here: ``` DELETE_FILE_READ_META_COLS.forEach((col, index) -> ... ``` ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.hadoop.hive.ql.io.PositionDeleteInfo; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_FILEREAD_META_COLS.put(PARTITION_STRUCT_META_COL, 1); +DELETE_FILEREAD_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_FILEREAD_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required( + MetadataColumns.PARTITION_COLUMN_ID,
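The thread above chooses a `LinkedHashMap` so that iterating the metadata-column map with `forEach` visits entries in a fixed order. The snippet below demonstrates that guarantee with column names standing in for the `Types.NestedField` keys; a plain `HashMap` would give no such ordering promise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LinkedHashMap iterates in insertion order, so schema extension via
// forEach over the metadata-column map is deterministic across runs.
public class MetaColOrderSketch {

    public static String columnOrder() {
        Map<String, Integer> cols = new LinkedHashMap<>();
        cols.put("spec_id", 0);
        cols.put("partition_struct", 1);
        cols.put("file_path", 2);
        cols.put("row_position", 3);
        StringBuilder order = new StringBuilder();
        cols.forEach((name, idx) -> order.append(name).append(','));
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(columnOrder()); // spec_id,partition_struct,file_path,row_position,
    }
}
```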
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753965 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 10:49 Start Date: 07/Apr/22 10:49 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844992530 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException { while (true) { if (currentIterator.hasNext()) { current = currentIterator.next(); + Configuration conf = context.getConfiguration(); Review Comment: Sure Issue Time Tracking --- Worklog Id: (was: 753965) Time Spent: 9h 50m (was: 9h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 9h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753942 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:59 Start Date: 07/Apr/22 09:59 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844950764 ## itests/qtest-iceberg/pom.xml: ## @@ -122,6 +122,12 @@ jersey-servlet test + + org.roaringbitmap Review Comment: Where is this used? Issue Time Tracking --- Worklog Id: (was: 753942) Time Spent: 9h 40m (was: 9.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 9h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753941 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:57 Start Date: 07/Apr/22 09:57 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844948586 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -484,10 +500,35 @@ private static Schema readSchema(Configuration conf, Schema tableSchema, boolean String[] selectedColumns = InputFormatConfig.selectedColumns(conf); if (selectedColumns == null) { -return tableSchema; +return table.schema(); + } + + readSchema = caseSensitive ? table.schema().select(selectedColumns) : + table.schema().caseInsensitiveSelect(selectedColumns); + + // for DELETE queries, add additional metadata columns into the read schema + if (HiveIcebergStorageHandler.isDelete(conf, conf.get(Catalogs.NAME))) { +readSchema = IcebergAcidUtil.createFileReadSchemaForDelete(readSchema.columns(), table); } - return caseSensitive ? tableSchema.select(selectedColumns) : tableSchema.caseInsensitiveSelect(selectedColumns); + return readSchema; +} + +private Schema schemaWithoutConstantsAndMeta(Schema readSchema, Map idToConstant) { Review Comment: Could this be a static method? Issue Time Tracking --- Worklog Id: (was: 753941) Time Spent: 9.5h (was: 9h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 9.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753940 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:56 Start Date: 07/Apr/22 09:56 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844947260 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -234,22 +240,23 @@ private static void checkResiduals(CombinedScanTask task) { private CloseableIterator currentIterator; private FileIO io; private EncryptionManager encryptionManager; +private Table table; Review Comment: Do we need the whole table here? Or we just need the partition objects and schema and... Either we remove the table, and set the specific values in `initialize`, or keep the table and remove the ones which are easily accessible, and do not need calculation. Issue Time Tracking --- Worklog Id: (was: 753940) Time Spent: 9h 20m (was: 9h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 9h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753938 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:52 Start Date: 07/Apr/22 09:52 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844943512 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException { while (true) { if (currentIterator.hasNext()) { current = currentIterator.next(); + Configuration conf = context.getConfiguration(); Review Comment: We can set it as an object attribute instead of getting it again and again Issue Time Tracking --- Worklog Id: (was: 753938) Time Spent: 9h 10m (was: 9h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 9h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
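The suggestion above is a small hoisting refactor: fetch the `Configuration` once in `initialize()` and keep it as a field, instead of calling `context.getConfiguration()` on every `nextKeyValue()`. A sketch with simplified stand-ins for Hadoop's `TaskAttemptContext` and `Configuration` types:

```java
// Sketch of caching the Configuration as an object attribute, as suggested
// for IcebergInputFormat's record reader. The interfaces are stand-ins.
public class RecordReaderSketch {

    public interface Configuration {
        String get(String key);
    }

    public interface TaskAttemptContext {
        Configuration getConfiguration();
    }

    private Configuration conf; // cached once, reused for every record

    public void initialize(TaskAttemptContext context) {
        this.conf = context.getConfiguration();
    }

    public boolean nextKeyValue() {
        // Uses the cached field rather than context.getConfiguration().
        return conf.get("catalog.name") != null;
    }

    public static void main(String[] args) {
        RecordReaderSketch reader = new RecordReaderSketch();
        reader.initialize(() -> key -> "hive_catalog"); // stubbed context
        System.out.println(reader.nextKeyValue()); // true
    }
}
```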
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753934 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:48 Start Date: 07/Apr/22 09:48 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844939837 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.hadoop.hive.ql.io.PositionDeleteInfo; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0); +DELETE_FILEREAD_META_COLS.put(PARTITION_STRUCT_META_COL, 1); +DELETE_FILEREAD_META_COLS.put(MetadataColumns.FILE_PATH, 2); +DELETE_FILEREAD_META_COLS.put(MetadataColumns.ROW_POSITION, 3); + } + + private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required( + MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get()); + private static final Map DELETE_SERDE_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0); Review Comment: Maybe use ImmutableMap? 
Issue Time Tracking --- Worklog Id: (was: 753934) Time Spent: 9h (was: 8h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 9h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
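The "Maybe use ImmutableMap?" review comment above refers to Guava's `ImmutableMap`, whose builder preserves insertion order and rejects later mutation, so it can replace the `LinkedHashMap` plus static-initializer pattern in the diff. A minimal JDK-only sketch of the same idea (the column-name constants here are illustrative stand-ins, not the actual Iceberg `MetadataColumns` fields; `Collections.unmodifiableMap` over a `LinkedHashMap` is used because the relocated Guava dependency may not be on the classpath):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetaColIndex {
    // Hypothetical stand-ins for the Iceberg metadata columns named in the diff.
    static final String SPEC_ID = "_spec_id";
    static final String PARTITION_STRUCT = "_partition";
    static final String FILE_PATH = "_file";
    static final String ROW_POSITION = "_pos";

    // An unmodifiable view over a LinkedHashMap keeps insertion order and
    // rejects mutation, the two properties Guava's ImmutableMap also gives.
    static final Map<String, Integer> DELETE_FILE_READ_META_COLS;
    static {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put(SPEC_ID, 0);
        m.put(PARTITION_STRUCT, 1);
        m.put(FILE_PATH, 2);
        m.put(ROW_POSITION, 3);
        DELETE_FILE_READ_META_COLS = Collections.unmodifiableMap(m);
    }

    public static void main(String[] args) {
        // Positional lookups stay stable because iteration order is fixed.
        if (DELETE_FILE_READ_META_COLS.get(FILE_PATH) != 2) {
            throw new AssertionError("unexpected index");
        }
        try {
            DELETE_FILE_READ_META_COLS.put("_extra", 4);
            throw new AssertionError("map should be unmodifiable");
        } catch (UnsupportedOperationException expected) {
            // mutation is rejected, as with ImmutableMap
        }
    }
}
```

With Guava available, the equivalent would be a single `ImmutableMap.builder().put(...).build()` expression, avoiding the static initializer entirely.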
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753933 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:47 Start Date: 07/Apr/22 09:47 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844939281 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.hadoop.hive.ql.io.PositionDeleteInfo; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap(); + + static { +DELETE_FILEREAD_META_COLS.put(MetadataColumns.SPEC_ID, 0); Review Comment: Maybe use `ImmutableMap`? Issue Time Tracking --- Worklog Id: (was: 753933) Time Spent: 8h 50m (was: 8h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 8h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753932 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:47 Start Date: 07/Apr/22 09:47 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844938722 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.hadoop.hive.ql.io.PositionDeleteInfo; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.data.GenericRecord; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.StructProjection; + +public class IcebergAcidUtil { + + private IcebergAcidUtil() { + } + + private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map + private static final Map DELETE_FILEREAD_META_COLS = Maps.newLinkedHashMap(); Review Comment: nit: FILE_READ? When you are using camelcase you write FileRead :D Issue Time Tracking --- Worklog Id: (was: 753932) Time Spent: 8h 40m (was: 8.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 8h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753929 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:37 Start Date: 07/Apr/22 09:37 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844929235 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java: ## @@ -224,7 +232,7 @@ public Writable serialize(Object o, ObjectInspector objectInspector) { Deserializer deserializer = deserializers.get(objectInspector); if (deserializer == null) { deserializer = new Deserializer.Builder() - .schema(tableSchema) + .schema(isDelete ? deleteSchema : tableSchema) Review Comment: Why not use the `projectedSchema` here? Issue Time Tracking --- Worklog Id: (was: 753929) Time Spent: 8.5h (was: 8h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 8.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
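The `HiveIcebergSerDe` diff above lazily builds and caches one `Deserializer` per `ObjectInspector`, choosing the delete schema or the full table schema depending on the operation. A simplified sketch of that caching pattern, with hypothetical stand-in types rather than the real Hive/Iceberg classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SerdeCache {
    // Simplified stand-ins for Iceberg's Schema, Hive's ObjectInspector,
    // and the Deserializer built from them.
    record Schema(String name) {}
    record Inspector(String id) {}
    record Deserializer(Schema schema) {}

    private final Schema tableSchema = new Schema("table");
    private final Schema deleteSchema = new Schema("delete");
    private final boolean isDelete;
    private final Map<Inspector, Deserializer> deserializers = new ConcurrentHashMap<>();

    SerdeCache(boolean isDelete) {
        this.isDelete = isDelete;
    }

    // One deserializer per inspector, built on first use; DELETE jobs read the
    // delete schema instead of the full table schema, as in the diff.
    Deserializer deserializerFor(Inspector oi) {
        return deserializers.computeIfAbsent(
            oi, k -> new Deserializer(isDelete ? deleteSchema : tableSchema));
    }
}
```

`computeIfAbsent` keeps the lookup-or-build logic in one expression, replacing the explicit `get` / null-check / `put` sequence shown in the quoted code.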
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753928 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:34 Start Date: 07/Apr/22 09:34 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844925536 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputFormat.java: ## @@ -83,9 +83,20 @@ private static HiveIcebergRecordWriter writer(JobConf jc) { .operationId(operationId) .build(); String tableName = jc.get(Catalogs.NAME); -HiveFileWriterFactory hfwf = new HiveFileWriterFactory(table, fileFormat, schema, -null, fileFormat, null, null, null, null); -return new HiveIcebergRecordWriter(schema, spec, fileFormat, -hfwf, outputFileFactory, io, targetFileSize, taskAttemptID, tableName); +HiveFileWriterFactory writerFactory = new HiveFileWriterFactory(table, fileFormat, schema, null, fileFormat, +null, null, null, getPositionDeleteRowSchema(schema, fileFormat)); +if (HiveIcebergStorageHandler.isDelete(jc, tableName)) { + return new HiveIcebergDeleteWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize, + taskAttemptID, tableName); +} else { + return new HiveIcebergRecordWriter(schema, spec, fileFormat, writerFactory, outputFileFactory, io, targetFileSize, + taskAttemptID, tableName); +} + } + + private static Schema getPositionDeleteRowSchema(Schema schema, FileFormat fileFormat) { +// TODO: remove this Avro-specific logic once we have Avro writer function ready Review Comment: Is it implemented in the Iceberg project? Is there an existing PR or issue for it? 
Issue Time Tracking --- Worklog Id: (was: 753928) Time Spent: 8h 20m (was: 8h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 8h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
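For context on what the `HiveIcebergDeleteWriter` in the diff above produces: Iceberg format v2 position deletes mark rows by data-file path and row ordinal instead of rewriting data files. A simplified stand-alone sketch of that record shape (the field names mirror Iceberg's `PositionDelete` conceptually but are not the actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class PositionDeleteSketch {
    // A position delete identifies a row by the data file it lives in and
    // its ordinal within that file.
    record PosDelete(String filePath, long pos) {}

    // A delete writer emits one entry per deleted row; readers later merge
    // these against the data files and skip the marked ordinals.
    static List<PosDelete> sampleDeleteFile() {
        List<PosDelete> deletes = new ArrayList<>();
        deletes.add(new PosDelete("s3://warehouse/customers/00000-0.parquet", 4L));
        deletes.add(new PosDelete("s3://warehouse/customers/00000-0.parquet", 17L));
        return deletes;
    }

    public static void main(String[] args) {
        System.out.println(sampleDeleteFile().size());
    }
}
```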
[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result
[ https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youjun Yuan updated HIVE-26111: --- Description: we hit a query that FULL JOINs two tables, and Hive produces incorrect results: for a single value of the join key it produces two records, each with a valid value from one table and NULL for the other. The query is: {code:java} SET mapreduce.job.reduces=2; SELECT d.id, u.id FROM ( SELECT id FROM airflow.tableA rud WHERE rud.dt = '2022-04-02-1row' ) d FULL JOIN ( SELECT id FROM default.tableB WHERE dt = '2022-04-01' and device_token='blabla' ) u ON u.id = d.id ; {code} According to the job log, the two reducers each get one input record and each output a record, producing two records for id=350570497: {code:java} 350570497 NULL NULL 350570497 Time taken: 62.692 seconds, Fetched: 2 row(s) {code} I am sure tableB has only one row where device_token='blabla'. And we tried: 1, SET mapreduce.job.reduces=1; then it produces the right result; -2, SET hive.execution.engine=mr; then it produces the right result;- mr also has the issue. 
3, JOIN (instead of FULL JOIN) worked as expected; 4, in subquery u, changing the filter device_token='blabla' to id=350570497 worked; 5, flattening the subqueries works, like below: {code:java} SELECT d.id, u.id from airflow.rds_users_delta d full join default.users u on (u.id = d.id) where d.dt = '2022-04-02-1row' and u.dt = '2022-04-01' and u.device_token='blabla' {code} Below is the explain output of the query: {code:java} Plan optimized by CBO.Vertex dependency in root stage Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 3 File Output Operator [FS_10] Map Join Operator [MAPJOIN_13] (rows=2 width=8) Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"] <-Map 1 [CUSTOM_SIMPLE_EDGE] PARTITION_ONLY_SHUFFLE [RS_6] PartitionCols:_col0 Select Operator [SEL_2] (rows=1 width=4) Output:["_col0"] TableScan [TS_0] (rows=1 width=4) airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] <-Map 2 [CUSTOM_SIMPLE_EDGE] PARTITION_ONLY_SHUFFLE [RS_7] PartitionCols:_col0 Select Operator [SEL_5] (rows=1 width=4) Output:["_col0"] Filter Operator [FIL_12] (rows=1 width=110) predicate:(device_token = 'blabla') TableScan [TS_3] (rows=215192362 width=109) default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"] {code} I can't generate a small enough result set to reproduce the issue: I have minimized tableA to only 1 row, and tableB has ~200m rows, but if I further reduce the size of tableB the issue can't be reproduced. Any suggestion would be highly appreciated, regarding the root cause of the issue, how to work around it, or how to reproduce it with a small enough dataset. 
below is the log found in hive.log {code:java} 220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES: Stage-1 is a root stage [MAPRED] Stage-0 depends on stages: Stage-1 [FETCH]STAGE PLANS: Stage: Stage-1 Tez DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1 Edges: Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE) DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1 Vertices: Map 1 Map Operator Tree: TableScan alias: rud Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE GatherStats: false Select Operator expressions: id (type: int) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) null sort order: a sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE tag: 0 auto parallelism: true Path -> Alias: s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00 [rud] Path -> Partition: s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00 Partition base file name: hh=00 input format: org.apache.hadoop.mapred.TextInputFormat output format:
[jira] [Commented] (HIVE-26111) FULL JOIN returns incorrect result
[ https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518731#comment-17518731 ] Youjun Yuan commented on HIVE-26111: This is likely a duplicate of https://issues.apache.org/jira/browse/HIVE-22098, the bucketing_version issue. > FULL JOIN returns incorrect result > -- > > Key: HIVE-26111 > URL: https://issues.apache.org/jira/browse/HIVE-26111 > Project: Hive > Issue Type: Bug > Environment: aws EMR (hive 3.1.2 + Tez 0.10.1) > Reporter: Youjun Yuan > Priority: Blocker >
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753926 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:28 Start Date: 07/Apr/22 09:28 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844919325 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java: ## @@ -228,6 +230,104 @@ public void testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied() throws I Assert.assertArrayEquals(new Object[] {2L, "Trudy", "Pink"}, objects.get(3)); } + @Test + public void testDeleteStatementUnpartitioned() { +Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized); + +// create and insert an initial batch of records +testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +PartitionSpec.unpartitioned(), fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2); +// insert one more batch so that we have multiple data files within the same partition + shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, +TableIdentifier.of("default", "customers"), false)); + +shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'"); + +List objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name"); +Assert.assertEquals(6, objects.size()); +List expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.add(1L, "Sharon", "Taylor") +.add(2L, "Jake", "Donnel") +.add(2L, "Susan", "Morrison") +.add(2L, "Bob", "Silver") +.add(4L, "Laci", "Zold") +.add(5L, "Peti", "Rozsaszin") +.build(); +HiveIcebergTestUtils.validateData(expected, + 
HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0); + } + + @Test + public void testDeleteStatementPartitioned() { +Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized); +PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.identity("last_name").bucket("customer_id", 16).build(); + +// create and insert an initial batch of records +testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2); +// insert one more batch so that we have multiple data files within the same partition + shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, +TableIdentifier.of("default", "customers"), false)); + +shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'"); + +List objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name"); +Assert.assertEquals(6, objects.size()); +List expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.add(1L, "Sharon", "Taylor") +.add(2L, "Jake", "Donnel") +.add(2L, "Susan", "Morrison") +.add(2L, "Bob", "Silver") +.add(4L, "Laci", "Zold") +.add(5L, "Peti", "Rozsaszin") +.build(); +HiveIcebergTestUtils.validateData(expected, + HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0); + } + + @Test + public void testDeleteStatementWithOtherTable() { +Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized); +PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA) +.identity("last_name").bucket("customer_id", 16).build(); + +// create a couple of tables, with an initial batch of records 
+testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2); +testTables.createTable(shell, "other", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, 2); + +shell.executeStatement("DELETE FROM customers WHERE customer_id in (select t1.customer_id from customers t1 join " + +"other t2 on t1.customer_id = t2.customer_id) or " + +"first_name in (select first_name from
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753923 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:19 Start Date: 07/Apr/22 09:19 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r844910675 ## standalone-metastore/metastore-server/pom.xml: ## @@ -474,23 +474,6 @@ - -generate-version-annotation -generate-sources - - - - - - - - - - - - run - - Review Comment: Also removed the script too, as it was duplicated as well Issue Time Tracking --- Worklog Id: (was: 753923) Time Spent: 1h 20m (was: 1h 10m) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently we define > org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 > places: > - > ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > - > ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > This causes javadoc generation to fail with: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) > on project hive: An error has occurred in Javadoc report generation: > [ERROR] Exit code: 1 - > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8: > warning: a package-info.java file has already been seen for package > org.apache.hadoop.hive.metastore.annotation > [ERROR] package 
org.apache.hadoop.hive.metastore.annotation; > [ERROR] ^ > [ERROR] javadoc: warning - Multiple sources of package comments found for > package "org.apache.hive.streaming" > [ERROR] > /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556: > error: type MapSerializer does not take parameters > [ERROR] com.esotericsoftware.kryo.serializers.MapSerializer { > [ERROR] ^ > [ERROR] > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4: > error: package org.apache.hadoop.hive.metastore.annotation has already been > annotated > [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", > shortVersion="4.0.0-alpha-1", > [ERROR] ^ > [ERROR] java.lang.AssertionError > [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) > [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) > [ERROR] at > com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177) > [ERROR] at > com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) > [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) > [ERROR] at > com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) > [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) > [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) > [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) > [ERROR] at > com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) > [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) > [ERROR] at 
com.sun.tools.javadoc.Start.begin(Start.java:219) > [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) > [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) > [ERROR] at
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753922 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:18 Start Date: 07/Apr/22 09:18 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r844910181 ## standalone-metastore/pom.xml: ## @@ -531,6 +531,30 @@ + + javadoc + + + +org.apache.maven.plugins +maven-javadoc-plugin + + none + -Xdoclint:none Review Comment: Removed the unnecessary line Issue Time Tracking --- Worklog Id: (was: 753922) Time Spent: 1h 10m (was: 1h) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently we define > org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 > places: > - > ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > - > ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > This causes javadoc generation to fail with: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) > on project hive: An error has occurred in Javadoc report generation: > [ERROR] Exit code: 1 - > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8: > warning: a package-info.java file has already been seen for package > org.apache.hadoop.hive.metastore.annotation > [ERROR] package org.apache.hadoop.hive.metastore.annotation; > [ERROR] ^ > 
[ERROR] javadoc: warning - Multiple sources of package comments found for > package "org.apache.hive.streaming" > [ERROR] > /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556: > error: type MapSerializer does not take parameters > [ERROR] com.esotericsoftware.kryo.serializers.MapSerializer { > [ERROR] ^ > [ERROR] > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4: > error: package org.apache.hadoop.hive.metastore.annotation has already been > annotated > [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", > shortVersion="4.0.0-alpha-1", > [ERROR] ^ > [ERROR] java.lang.AssertionError > [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) > [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) > [ERROR] at > com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177) > [ERROR] at > com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) > [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) > [ERROR] at > com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) > [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) > [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) > [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) > [ERROR] at > com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) > [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) > [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) > [ERROR] at 
com.sun.tools.javadoc.Start.begin(Start.java:205) > [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) > [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) > [ERROR] javadoc: error - fatal error > [ERROR] > [ERROR] Command line was: >
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753921 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:17 Start Date: 07/Apr/22 09:17 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r844909281 ## pom.xml: ## @@ -1810,6 +1810,7 @@ org.apache.maven.plugins maven-javadoc-plugin + none -Xdoclint:none Review Comment: Removed the unnecessary line Issue Time Tracking --- Worklog Id: (was: 753921) Time Spent: 1h (was: 50m) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently we define > org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 > places: > - > ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > - > ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java > This causes javadoc generation to fail with: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) > on project hive: An error has occurred in Javadoc report generation: > [ERROR] Exit code: 1 - > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8: > warning: a package-info.java file has already been seen for package > org.apache.hadoop.hive.metastore.annotation > [ERROR] package org.apache.hadoop.hive.metastore.annotation; > [ERROR] ^ > [ERROR] javadoc: warning - Multiple sources of 
package comments found for > package "org.apache.hive.streaming" > [ERROR] > /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556: > error: type MapSerializer does not take parameters > [ERROR] com.esotericsoftware.kryo.serializers.MapSerializer { > [ERROR] ^ > [ERROR] > /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4: > error: package org.apache.hadoop.hive.metastore.annotation has already been > annotated > [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", > shortVersion="4.0.0-alpha-1", > [ERROR] ^ > [ERROR] java.lang.AssertionError > [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) > [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) > [ERROR] at > com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177) > [ERROR] at > com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) > [ERROR] at > com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) > [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) > [ERROR] at > com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) > [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) > [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) > [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) > [ERROR] at > com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) > [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) > [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) > [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) > [ERROR] at 
com.sun.tools.javadoc.Main.execute(Main.java:64) > [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) > [ERROR] javadoc: error - fatal error > [ERROR] > [ERROR] Command line was: > /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/../bin/javadoc > @options @packages > [ERROR] > [ERROR] Refer to the generated Javadoc files in >
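The failure above is mechanical: two generated source roots each contribute a package-info.java for org.apache.hadoop.hive.metastore.annotation, and javac/javadoc accepts at most one package-info.java (and one set of package annotations) per package. As an illustrative sketch only (not part of the Hive build), the uniqueness violation over the two generated paths named in the issue can be detected like this:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PackageInfoDuplicateCheck {

    // Returns the packages that declare package-info.java in more than one source root.
    static Map<String, Integer> duplicatePackages(List<String> packageInfoFiles) {
        Map<String, Integer> counts = new HashMap<>();
        for (String path : packageInfoFiles) {
            // Derive the package name from the path segment after ".../version/".
            String pkg = path.substring(path.indexOf("/version/") + "/version/".length(),
                                        path.lastIndexOf('/'))
                             .replace('/', '.');
            counts.merge(pkg, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<String> gen = List.of(
            "standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java",
            "standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java");
        // prints {org.apache.hadoop.hive.metastore.annotation=2}
        System.out.println(duplicatePackages(gen));
    }
}
```

The fix discussed in HIVE-26093 is the build-level equivalent: generate the file in one module only, so the check above would return an empty map.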
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753918=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753918 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:07 Start Date: 07/Apr/22 09:07 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r844899419 ## standalone-metastore/metastore-server/pom.xml: ## @@ -474,23 +474,6 @@ - -generate-version-annotation -generate-sources - - - - - - - - - - - - run - - Review Comment: I think HIVE-20188 made the mistake to duplicate the code instead of moving it Issue Time Tracking --- Worklog Id: (was: 753918) Time Spent: 50m (was: 40m) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753915 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 07/Apr/22 09:03 Start Date: 07/Apr/22 09:03 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r844895657 ## pom.xml: ## @@ -1810,6 +1810,7 @@ org.apache.maven.plugins maven-javadoc-plugin + none -Xdoclint:none Review Comment: I thought this depends on the Maven version, but found that it depends on the maven-javadoc-plugin version: https://blog.joda.org/2014/02/turning-off-doclint-in-jdk-8-javadoc.html Issue Time Tracking --- Worklog Id: (was: 753915) Time Spent: 40m (was: 0.5h) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753913 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 08:57 Start Date: 07/Apr/22 08:57 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844890096 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.mr.hive; + +import java.io.Serializable; +import java.util.Collections; +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.iceberg.ContentFile; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DeleteFile; + +public class FilesForCommit implements Serializable { + + private final List<DataFile> dataFiles; Review Comment: Which one is easier/smaller to serialize, `List<DataFile>` or `DataFile[]`?
Issue Time Tracking --- Worklog Id: (was: 753913) Time Spent: 8h (was: 7h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
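pvary's review question above (whether a List or an array of files is cheaper to Java-serialize) can be checked empirically. The sketch below uses a hypothetical FakeFile stand-in for Iceberg's DataFile, since constructing real DataFile instances needs table metadata; with plain java.io serialization the per-element cost dominates, so the two container forms typically differ only by a small constant header overhead:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializedSizeCheck {

    // Hypothetical stand-in for Iceberg's DataFile: a small Serializable value.
    static class FakeFile implements Serializable {
        final String path;
        FakeFile(String path) { this.path = path; }
    }

    // Measures the java.io serialized size of any Serializable object graph.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        List<FakeFile> asList = new ArrayList<>();
        FakeFile[] asArray = new FakeFile[100];
        for (int i = 0; i < 100; i++) {
            FakeFile f = new FakeFile("/warehouse/tbl/data-" + i + ".parquet");
            asList.add(f);
            asArray[i] = f;
        }
        System.out.println("List:  " + serializedSize(asList) + " bytes");
        System.out.println("Array: " + serializedSize(asArray) + " bytes");
    }
}
```

Since the size difference is marginal for realistic file counts, the choice between `List<DataFile>` and `DataFile[]` is mostly an API-style question rather than a wire-size one.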
[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
[ https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=753912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753912 ] ASF GitHub Bot logged work on HIVE-26093: - Author: ASF GitHub Bot Created on: 07/Apr/22 08:57 Start Date: 07/Apr/22 08:57 Worklog Time Spent: 10m Work Description: zabetak commented on code in PR #3168: URL: https://github.com/apache/hive/pull/3168#discussion_r844887792 ## standalone-metastore/metastore-server/pom.xml: ## @@ -474,23 +474,6 @@ - -generate-version-annotation -generate-sources - - - - - - - - - - - - run - - Review Comment: Do we know why was this introduced in the first place and if it is safe to remove? If I understood well this is the main point of the fix, can you confirm? ## standalone-metastore/pom.xml: ## @@ -531,6 +531,30 @@ + + javadoc + + + +org.apache.maven.plugins +maven-javadoc-plugin + + none + -Xdoclint:none Review Comment: Do we need both? ## pom.xml: ## @@ -1810,6 +1810,7 @@ org.apache.maven.plugins maven-javadoc-plugin + none -Xdoclint:none Review Comment: Is this change mandatory for building javadocs? Aren't these two lines somewhat equivalent? Why do we need both? 
Issue Time Tracking --- Worklog Id: (was: 753912) Time Spent: 0.5h (was: 20m) > Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java > - > > Key: HIVE-26093 > URL: https://issues.apache.org/jira/browse/HIVE-26093 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753910 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 08:55 Start Date: 07/Apr/22 08:55 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844888405 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.List; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.io.ClusteredPositionDeleteWriter; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.FileWriterFactory; +import org.apache.iceberg.io.OutputFileFactory; +import org.apache.iceberg.mr.mapred.Container; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergDeleteWriter extends HiveIcebergWriter { + private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class); + + private final ClusteredPositionDeleteWriter deleteWriter; Review Comment: Do we have a common ancestor for ClusteredPositionDeleteWriter and ClusteredDataWriter, which we could use? Issue Time Tracking --- Worklog Id: (was: 753910) Time Spent: 7h 50m (was: 7h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753909 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 07/Apr/22 08:53 Start Date: 07/Apr/22 08:53 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844885993 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/FilesForCommit.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.Serializable; +import java.util.Collections; +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.iceberg.ContentFile; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DeleteFile; + +public class FilesForCommit implements Serializable { + + private final List<DataFile> dataFiles; + private final List<DeleteFile> deleteFiles; + + public FilesForCommit(List<DataFile> dataFiles, List<DeleteFile> deleteFiles) { +this.dataFiles = dataFiles; +this.deleteFiles = deleteFiles; + } + + public static FilesForCommit onlyDelete(List<DeleteFile> deleteFiles) { +return new FilesForCommit(Collections.emptyList(), deleteFiles); + } + + public static FilesForCommit onlyData(List<DataFile> dataFiles) { +return new FilesForCommit(dataFiles, Collections.emptyList()); + } + + public static FilesForCommit empty() { +return new FilesForCommit(Collections.emptyList(), Collections.emptyList()); + } + + public List<DataFile> getDataFiles() { Review Comment: In the Iceberg-related code we usually try to avoid `get`; we might want to use `dataFiles()` instead. Issue Time Tracking --- Worklog Id: (was: 753909) Time Spent: 7h 40m (was: 7.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
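pvary's naming comment in the review above (Iceberg code avoids the JavaBean `get` prefix) amounts to renaming the accessors of FilesForCommit. A minimal sketch under that convention, with String standing in for the DataFile/DeleteFile types so the example stays self-contained:

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.List;

// Sketch of the suggested rename: field-named accessors (dataFiles()) instead
// of JavaBean-style getters (getDataFiles()). `String` is a stand-in for
// Iceberg's DataFile/DeleteFile types, an assumption made for brevity.
class FilesForCommitSketch implements Serializable {
    private final List<String> dataFiles;
    private final List<String> deleteFiles;

    FilesForCommitSketch(List<String> dataFiles, List<String> deleteFiles) {
        this.dataFiles = dataFiles;
        this.deleteFiles = deleteFiles;
    }

    static FilesForCommitSketch onlyData(List<String> dataFiles) {
        return new FilesForCommitSketch(dataFiles, Collections.emptyList());
    }

    // Iceberg convention: accessor named after the field, no `get` prefix.
    List<String> dataFiles() { return dataFiles; }
    List<String> deleteFiles() { return deleteFiles; }
}

public class FilesForCommitDemo {
    public static void main(String[] args) {
        FilesForCommitSketch commit = FilesForCommitSketch.onlyData(List.of("data-0.parquet"));
        // prints "1 data, 0 delete"
        System.out.println(commit.dataFiles().size() + " data, " + commit.deleteFiles().size() + " delete");
    }
}
```

Since only the accessor names change, callers move from `commit.getDataFiles()` to `commit.dataFiles()` with no behavioral difference.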
[jira] [Commented] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly
[ https://issues.apache.org/jira/browse/HIVE-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518682#comment-17518682 ] Naveen Gangam commented on HIVE-26118: -- Fix has been merged to master. Thank you for the review [~dengzh] > [Standalone Beeline] Jar name mismatch between build and assembly > - > > Key: HIVE-26118 > URL: https://issues.apache.org/jira/browse/HIVE-26118 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Affects Versions: 3.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Fix from HIVE-25750 has an issue where the beeline build produces a jar named > "jar-with-dependencies.jar" but the assembly looks for a jar named > "original-jar-with-dependencies.jar". Thus, this uber jar never gets included > in the distribution. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly
[ https://issues.apache.org/jira/browse/HIVE-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-26118. -- Fix Version/s: 4.0.0 Resolution: Fixed > [Standalone Beeline] Jar name mismatch between build and assembly > - > > Key: HIVE-26118 > URL: https://issues.apache.org/jira/browse/HIVE-26118 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Affects Versions: 3.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly
[ https://issues.apache.org/jira/browse/HIVE-26118?focusedWorklogId=753900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753900 ] ASF GitHub Bot logged work on HIVE-26118: - Author: ASF GitHub Bot Created on: 07/Apr/22 08:19 Start Date: 07/Apr/22 08:19 Worklog Time Spent: 10m Work Description: nrg4878 commented on PR #3180: URL: https://github.com/apache/hive/pull/3180#issuecomment-1091280878 The two test failures seem random, as both tests passed in the prior run, which had a different failure. I do not see a connection between the failures and the fix. Issue Time Tracking --- Worklog Id: (was: 753900) Time Spent: 40m (was: 0.5h) > [Standalone Beeline] Jar name mismatch between build and assembly > - > > Key: HIVE-26118 > URL: https://issues.apache.org/jira/browse/HIVE-26118 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Affects Versions: 3.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly
[ https://issues.apache.org/jira/browse/HIVE-26118?focusedWorklogId=753899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753899 ] ASF GitHub Bot logged work on HIVE-26118: - Author: ASF GitHub Bot Created on: 07/Apr/22 08:18 Start Date: 07/Apr/22 08:18 Worklog Time Spent: 10m Work Description: nrg4878 merged PR #3180: URL: https://github.com/apache/hive/pull/3180 Issue Time Tracking --- Worklog Id: (was: 753899) Time Spent: 0.5h (was: 20m) > [Standalone Beeline] Jar name mismatch between build and assembly > - > > Key: HIVE-26118 > URL: https://issues.apache.org/jira/browse/HIVE-26118 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Affects Versions: 3.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.20.1#820001)