[jira] [Created] (HIVE-26156) Iceberg delete writer should handle deleting from old partition specs
Marton Bod created HIVE-26156: - Summary: Iceberg delete writer should handle deleting from old partition specs Key: HIVE-26156 URL: https://issues.apache.org/jira/browse/HIVE-26156 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod While {{HiveIcebergRecordWriter}} always writes data out according to the latest spec, the {{HiveIcebergDeleteWriter}} might have to write delete files into partitions that correspond to a variety of specs, both old and new. Therefore we should pass the {{{}table.specs(){}}}map into the {{HiveIcebergWriter}} so that the delete writer can choose the appropriate spec on a per-record basis. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26151) Support range-based time travel queries for Iceberg
Marton Bod created HIVE-26151: - Summary: Support range-based time travel queries for Iceberg Key: HIVE-26151 URL: https://issues.apache.org/jira/browse/HIVE-26151 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod Allow querying which records have been inserted during a certain time window for Iceberg tables. The Iceberg TableScan API provides an implementation for that, so most of the work would go into adding syntax support and transporting the startTime and endTime parameters to the Iceberg input format. Proposed new syntax: SELECT * FROM table FOR SYSTEM_TIME FROM '' TO '' SELECT * FROM table FOR SYSTEM_VERSION FROM TO (the TO clause is optional in both cases) -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26102) Implement DELETE statements for Iceberg tables
Marton Bod created HIVE-26102: - Summary: Implement DELETE statements for Iceberg tables Key: HIVE-26102 URL: https://issues.apache.org/jira/browse/HIVE-26102 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.20.1#820001)
Re: [VOTE] Apache Hive 4.0.0-alpha-1 Release Candidate 1
+1 (non-binding) Tested the checksums, signatures and built it successfully On Tue, Mar 22, 2022 at 2:26 PM Peter Vary wrote: > Hi Team, > > Apache Hive 4.0.0-alpha-1 Release Candidate 1 is available here: > > https://people.apache.org/~pvary/apache-hive-4.0.0-alpha-1-rc1/ > > Maven artifacts are available here: > > https://repository.apache.org/content/repositories/orgapachehive-/ > > The tag 4.0.0-alpha-1-rc1 has been applied to the source for this release > in github, you can see it at > https://github.com/apache/hive/tree/release-4.0.0-alpha-1-rc1 > > Voting will conclude in 72 hours. > > All interested parties: Please test. > Hive PMC Members: Please test and vote. > > Thanks.
[jira] [Created] (HIVE-26004) Upgrade Iceberg to 0.13.1
Marton Bod created HIVE-26004: - Summary: Upgrade Iceberg to 0.13.1 Key: HIVE-26004 URL: https://issues.apache.org/jira/browse/HIVE-26004 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases
Marton Bod created HIVE-25891: - Summary: Improve Iceberg error message for unsupported vectorization cases Key: HIVE-25891 URL: https://issues.apache.org/jira/browse/HIVE-25891 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently, if you attempt to read a Parquet or Avro Iceberg table with vectorization turned on, you will eventually get an error message since it's not supported. However, this error message is very misleading and does not explain clearly what the problem is and how to work around it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables
Marton Bod created HIVE-25890: - Summary: Fix truncate problem with Iceberg CTAS tables Key: HIVE-25890 URL: https://issues.apache.org/jira/browse/HIVE-25890 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod Currently Iceberg CTAS tables cannot be truncated in a subsequent operation. This is because we populate the table properties differently on the CTAS codepath, and the external.table.purge=true is not populated in this case. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables
Marton Bod created HIVE-25849: - Summary: Disable insert overwrite for bucket partitioned Iceberg tables Key: HIVE-25849 URL: https://issues.apache.org/jira/browse/HIVE-25849 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Insert overwrite should be disabled where the target Iceberg table is a bucket partitioned table, since which existing partitions will be overwritten is very hard to predict from a user's POV, as it depends on the bucket hash values calculated for the new dataset's rows. It's better to be on the safe side and disable this operation to avoid unwanted data loss. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization
Marton Bod created HIVE-25843: - Summary: Add flag to disable Iceberg FileIO config serialization Key: HIVE-25843 URL: https://issues.apache.org/jira/browse/HIVE-25843 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Hive serializes the Iceberg table object into each individual split. Since the FileIO is part of the Iceberg table and it has its own hadoop configuration, this configuration will be the dominant factor determining the size of the serialized split. In our tests we have found that due to this serialized config, iceberg splits are 15-20x larger than normal Hive splits (which led to OOM in some of our perf tests). This PR proposes to introduce a config which can turn off this config serialization, and let the deserializer-side fill out the config values instead (which works for Hive executors, since they have all the config values in hand). This can reduce the Iceberg split size by ~20x based on local tests. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25815) Add flag to skip maven validate phase for Iceberg modules
Marton Bod created HIVE-25815: - Summary: Add flag to skip maven validate phase for Iceberg modules Key: HIVE-25815 URL: https://issues.apache.org/jira/browse/HIVE-25815 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod The Iceberg checkstyle and spotless plugins which run during the validate phase are quite strict in terms of enforcing proper code style. This is great when checking in code to master, but it can be inconvenient when doing quick dev iterations locally. This PR introduces a maven {{validate.skip}} flag to be able to skip checkstyle and spotless checks on demand for the Iceberg modules. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25788) Iceberg CTAS should honor location clause and have correct table properties
Marton Bod created HIVE-25788: - Summary: Iceberg CTAS should honor location clause and have correct table properties Key: HIVE-25788 URL: https://issues.apache.org/jira/browse/HIVE-25788 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod Currently Iceberg CTAS does not take the LOCATION clause into consideration. Also, these tables end up with some unintended table properties coming from the SerDe, such as partition.columns or partition.columns.comments, etc. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25754) Fix column projection for union all queries with multiple aliases
Marton Bod created HIVE-25754: - Summary: Fix column projection for union all queries with multiple aliases Key: HIVE-25754 URL: https://issues.apache.org/jira/browse/HIVE-25754 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod Given two tables: {{create table source1 (dt string, d1 int, d2 int) stored as orc; create table source2 (dt string, d1 int, d2 int) stored as orc; insert into source1 values ('20211107', 1, 2); insert into source2 values ('20211108', 11, 22);}} If you run this query with UNION ALL, the {{key}} column will be missing from the output: {{select explode(map('D219', D219 ,'D220', D220)) as (key, value) from (}} {{select '20211107' as date_key ,1 as D219 ,2 as D220 ) t}} {{union all}} {{select explode(map('D221', D221 ,'D222', D222)) as (key, value) from (}} {{select '20211107' as date_key ,1 as D221 ,2 as D222 ) t}} Result: {{1}} {{2}} {{11}} {{22}} Correct result should be: {{D219 1}} {{D220 2}} {{D221 11}} {{D222 22}} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer
Marton Bod created HIVE-25741: - Summary: HiveProtoLoggingHook EventLogger should always close old writer Key: HIVE-25741 URL: https://issues.apache.org/jira/browse/HIVE-25741 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), the Hive proto {{EventLogger}} will create a new file for each proto event. However, if we already had an appropriate writer (i.e. maybeRolloverWriterForDay() returns false) from some previous operation - we don't close the previous writer instance before creating a new one. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
Marton Bod created HIVE-25740: - Summary: Handle race condition between compaction txn abort/commit and heartbeater Key: HIVE-25740 URL: https://issues.apache.org/jira/browse/HIVE-25740 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod This issue is the following: once the compaction worker finishes, commitTxn/abortTxn is invoked first, and the heartbeater thread is only interrupted after that. This can lead to race conditions where the txn has already been deleted from the backend DB via commit/abort, but the concurrently running heartbeater thread still attempts to send a last heartbeat after that, but the txn id won't be found in the DB, leading to {{{}NoSuchTxnException{}}}. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams
Marton Bod created HIVE-25727: - Summary: Iceberg hive catalog should create table object with initialised SerdeParams Key: HIVE-25727 URL: https://issues.apache.org/jira/browse/HIVE-25727 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently we leave the serdeInfo.parameters as null when we create the table object to be persisted during commit time in Iceberg hive catalog. We should init the params with an empty map to avoid any NPE possibilities. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25690) Fix column reorder detection for Iceberg schema evolution
Marton Bod created HIVE-25690: - Summary: Fix column reorder detection for Iceberg schema evolution Key: HIVE-25690 URL: https://issues.apache.org/jira/browse/HIVE-25690 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod Current algorithm for detecting schema differences between HMS and Iceberg schema is broken when it comes to column reorders. This patch should fix that up and add more extensive testing. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests
Marton Bod created HIVE-25658: - Summary: Fix regex for masking totalSize table properties in Iceberg q-tests Key: HIVE-25658 URL: https://issues.apache.org/jira/browse/HIVE-25658 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod HIVE-25607 introduced a text replace regex for masking out the totalSize table property values in Iceberg q.out files. The regex however did not cover all of the props in the q.out files, so here is the fix for the regex. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables
Marton Bod created HIVE-25643: - Summary: Disable replace cols and change col commands for migrated Iceberg tables Key: HIVE-25643 URL: https://issues.apache.org/jira/browse/HIVE-25643 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Since the Iceberg table migration will intentionally not rewrite the data files, the migrated table will end up with data files that do not contain the Iceberg field IDs necessary for safe, reliable schema migration. For this purpose, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands for these migrated Iceberg tables. ADD COLUMNS are still permitted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25622) Change storage handler authz URI API to use the HMS table object
Marton Bod created HIVE-25622: - Summary: Change storage handler authz URI API to use the HMS table object Key: HIVE-25622 URL: https://issues.apache.org/jira/browse/HIVE-25622 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod * Moving the {{getURIForAuth}} method into {{HiveStorageHandler}} with a default implementation * Changing its signature to accept the HMS table object instead, as it provides implementations with more flexibility around constructing the URIs * Deleting the StorageAuthorizationHandler interface * Cleaning up code parts where {{getURIForAuth}} is invoked -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25607) Mask totalSize table property in Iceberg q-tests
Marton Bod created HIVE-25607: - Summary: Mask totalSize table property in Iceberg q-tests Key: HIVE-25607 URL: https://issues.apache.org/jira/browse/HIVE-25607 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod The totalSize tbl prop can change whenever the file format version changes, therefore potentially causing the q tests to be flaky when issuing describe formatted commands. We should mask this and not test against the exact value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25604) Iceberg should implement the authorization storage handler
Marton Bod created HIVE-25604: - Summary: Iceberg should implement the authorization storage handler Key: HIVE-25604 URL: https://issues.apache.org/jira/browse/HIVE-25604 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Iceberg's StorageHandler should implement the HiveStorageAuthorizationHandler interface for authorization purposes. We'll use the iceberg table root location as the basis for permission handling. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25587) Disable Iceberg table migration for unsupported source file formats
Marton Bod created HIVE-25587: - Summary: Disable Iceberg table migration for unsupported source file formats Key: HIVE-25587 URL: https://issues.apache.org/jira/browse/HIVE-25587 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently, we only support migrating ORC, Parquet and Avro tables to Iceberg. However, there is no check in the code to fail early for other formats (e.g. text, json, rcfile), which can lead to wasted effort at best, and leaving the source table unusable at worst. Therefore, we should check the source format early and shortcircuit for unsupported types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25581) Iceberg storage handler should set common projection pruning config
Marton Bod created HIVE-25581: - Summary: Iceberg storage handler should set common projection pruning config Key: HIVE-25581 URL: https://issues.apache.org/jira/browse/HIVE-25581 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently the value for the config "tez.mrreader.config.update.properties" is not set for Iceberg jobs, when in fact it needs to be part of the jobConf for all Iceberg queries. This change should ensure it's set by the Iceberg storage handler by default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files
Marton Bod created HIVE-25529: - Summary: Add tests for reading/writing Iceberg V2 tables with delete files Key: HIVE-25529 URL: https://issues.apache.org/jira/browse/HIVE-25529 Project: Hive Issue Type: Task Reporter: Marton Bod Assignee: Marton Bod Since Iceberg V2 tables are now official, we can start testing out whether V2 tables can be created/read/written by Hive. While Hive has no delete statement yet on Iceberg tables, we can nonetheless use the Iceberg API to create delete files manually and then check if Hive honors those deletes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25486) Upgrade to I
Marton Bod created HIVE-25486: - Summary: Upgrade to I Key: HIVE-25486 URL: https://issues.apache.org/jira/browse/HIVE-25486 Project: Hive Issue Type: Improvement Reporter: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25354) Handle unsupported queries for Iceberg tables
Marton Bod created HIVE-25354: - Summary: Handle unsupported queries for Iceberg tables Key: HIVE-25354 URL: https://issues.apache.org/jira/browse/HIVE-25354 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod In Iceberg, there will be several Hive commands that will be unsupported either temporarily or else. For example, right now all commands containing the PARTITION keyword would fail the Semantic Analysis given that the HMS table is always unpartitioned for Iceberg. The resulting error message for the user would be confusing. We should provide a common, unified error handling for these queries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables
Marton Bod created HIVE-25328: - Summary: Limit scope of REPLACE COLUMNS for Iceberg tables Key: HIVE-25328 URL: https://issues.apache.org/jira/browse/HIVE-25328 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Replace columns is a rather wildcard operation which can do heavy-weight schema changes. We would only want to allow this operation for dropping columns for Iceberg tables. For other changes (adding cols, renaming, type promotion etc.), we should use the CHANGE COLUMN command. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25308) Use new Tez API to get JobID for Iceberg commits
Marton Bod created HIVE-25308: - Summary: Use new Tez API to get JobID for Iceberg commits Key: HIVE-25308 URL: https://issues.apache.org/jira/browse/HIVE-25308 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod When committing Iceberg writes, currently we only have the JobID without the vertexID, therefore we have to list the folder {{/temp}} first, and parse out the full JobIDs (incl. vertexID) from the resulting folder names. With Tez 0.10.1 released, now we have a new API we can call to acquire the full JobID, making the file listing unnecessary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25256) Support ALTER TABLE CHANGE COLUMN for Iceberg
Marton Bod created HIVE-25256: - Summary: Support ALTER TABLE CHANGE COLUMN for Iceberg Key: HIVE-25256 URL: https://issues.apache.org/jira/browse/HIVE-25256 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod In order to provide support for renaming/changing the data type of a single column, we should add alter table change column support for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25255) Support ALTER TABLE REPLACE COLUMNS for Iceberg
Marton Bod created HIVE-25255: - Summary: Support ALTER TABLE REPLACE COLUMNS for Iceberg Key: HIVE-25255 URL: https://issues.apache.org/jira/browse/HIVE-25255 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25222) Fix reading Iceberg tables with a comma in column names
Marton Bod created HIVE-25222: - Summary: Fix reading Iceberg tables with a comma in column names Key: HIVE-25222 URL: https://issues.apache.org/jira/browse/HIVE-25222 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod When using a table with a column name containing a comma (e.g. `employ,ee`), reading an Iceberg table fails because we rely on the property "hive.io.file.readcolumn.names" which encodes the read columns in a comma-separated list, put together by the ColumnProjectionUtils class. Because it's comma-separated in all cases, it will produce a string like: "id,birth_date,employ,ee" which can cause problems for Iceberg readers which use this string list to construct their expected read schema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState
Marton Bod created HIVE-25195: - Summary: Store Iceberg write commit and ctas information in QueryState Key: HIVE-25195 URL: https://issues.apache.org/jira/browse/HIVE-25195 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod We should replace the current method of passing Iceberg write commit-related information (jobID, task num) and CTAS info via the session conf using prefixed keys. We have a new way of doing that more cleanly, using the QueryState object. This should make the code easier to maintain and guard against accidental session conf pollution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25161) Implement CTAS for partitioned Iceberg tables
Marton Bod created HIVE-25161: - Summary: Implement CTAS for partitioned Iceberg tables Key: HIVE-25161 URL: https://issues.apache.org/jira/browse/HIVE-25161 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25106) Do not exclude avatica and protobuf for Iceberg
Marton Bod created HIVE-25106: - Summary: Do not exclude avatica and protobuf for Iceberg Key: HIVE-25106 URL: https://issues.apache.org/jira/browse/HIVE-25106 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod When running tests from the IDE, the current dependency exclusions in the hive-iceberg pom can result in: {code:java} Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3 at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 62 more {code} and {code:java} Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.calcite.avatica.ConnectionPropertiesImpl at org.apache.calcite.avatica.MetaImpl.(MetaImpl.java:72) at org.apache.calcite.jdbc.CalciteMetaImpl.(CalciteMetaImpl.java:85) at org.apache.calcite.jdbc.Driver.createMeta(Driver.java:169) at org.apache.calcite.avatica.AvaticaConnection.(AvaticaConnection.java:121){code} {{}} when running {{testCBOWithSelectedColumnsOverlapJoin}} and {{testCBOWithSelectedColumnsNonOverlapJoin}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits
Marton Bod created HIVE-25076: - Summary: Get number of write tasks from jobConf for Iceberg commits Key: HIVE-25076 URL: https://issues.apache.org/jira/browse/HIVE-25076 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod When writing empty data into Iceberg tables, we can end up with 0 succeeded task count number. With the current logic, we might then erroneously end up taking the number of mapper tasks in the commit logic, which would result in failures. We should instead save the number of succeeded task count into the JobConf under a specified key and retrieve it from there. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25052) Writing to Iceberg tables can fail when inserting empty result set
Marton Bod created HIVE-25052: - Summary: Writing to Iceberg tables can fail when inserting empty result set Key: HIVE-25052 URL: https://issues.apache.org/jira/browse/HIVE-25052 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25038) Increase Iceberg test timeout and remove mr tests
Marton Bod created HIVE-25038: - Summary: Increase Iceberg test timeout and remove mr tests Key: HIVE-25038 URL: https://issues.apache.org/jira/browse/HIVE-25038 Project: Hive Issue Type: Task Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25034) Implement CTAS for Iceberg
Marton Bod created HIVE-25034: - Summary: Implement CTAS for Iceberg Key: HIVE-25034 URL: https://issues.apache.org/jira/browse/HIVE-25034 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25007) Implement insert overwrite for Iceberg tables
Marton Bod created HIVE-25007: - Summary: Implement insert overwrite for Iceberg tables Key: HIVE-25007 URL: https://issues.apache.org/jira/browse/HIVE-25007 Project: Hive Issue Type: New Feature Reporter: Marton Bod Assignee: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM
Marton Bod created HIVE-25006: - Summary: Commit Iceberg writes in HiveMetaHook instead of TezAM Key: HIVE-25006 URL: https://issues.apache.org/jira/browse/HIVE-25006 Project: Hive Issue Type: Task Reporter: Marton Bod Assignee: Marton Bod Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. This will enable us to implement insert overwrites for iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24958) Create Iceberg catalog module in Hive
Marton Bod created HIVE-24958: - Summary: Create Iceberg catalog module in Hive Key: HIVE-24958 URL: https://issues.apache.org/jira/browse/HIVE-24958 Project: Hive Issue Type: Task Reporter: Marton Bod Assignee: Marton Bod * Create a new iceberg-catalog module in Hive, with the code currently contained in Iceberg's iceberg-hive-metastore module * Make sure all tests pass (including static analysis and checkstyle) * Make iceberg-handler depend on this module instead of iceberg-hive-metastore -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into
Marton Bod created HIVE-24947: - Summary: Casting exception when reading vectorized parquet file for insert into Key: HIVE-24947 URL: https://issues.apache.org/jira/browse/HIVE-24947 Project: Hive Issue Type: Bug Reporter: Marton Bod We have two parquet tables (target and source). Upon running the query: {code:java} set hive.vectorized.execution.enabled=true; insert into target2 partition(tlmtc_fl_gnrtd_yr_nb, tlmtc_fl_gnrtd_mnth_nb) select * from source;{code} The following exception is thrown: {code:java} Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to [B at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) ... 24 more {code} The same runs without problems when vectorization is turned off. Attaching the show create table statements and the sample parquet file so it can reproduced. cc [~nareshpr] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24867) Create iceberg-handler module in Hive
Marton Bod created HIVE-24867: - Summary: Create iceberg-handler module in Hive Key: HIVE-24867 URL: https://issues.apache.org/jira/browse/HIVE-24867 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod * Create a new iceberg-handler module in Hive * Copy the code from the Iceberg/iceberg-mr module into this new Hive module * Make necessary changes so it compiles with Hive 4.0.0 dependencies (iceberg-mr code was based on Hive 3.1) * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24857) Trigger Tez output commit after close operation
Marton Bod created HIVE-24857: - Summary: Trigger Tez output commit after close operation Key: HIVE-24857 URL: https://issues.apache.org/jira/browse/HIVE-24857 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently Tez triggers the OutputCommitter.commit() operation between the proc.run() and proc.close() operations in TezProcessor. However, when writing out data, calling the proc.close() operation may still produce some extra records, which would be missed by the output committer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface
Marton Bod created HIVE-24665: - Summary: Add commitAlterTable method to the HiveMetaHook interface Key: HIVE-24665 URL: https://issues.apache.org/jira/browse/HIVE-24665 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently we have pre and post hooks for create table and drop table commands, but only a pre hook for alter table commands. We should add a post hook as well (with a default implementation). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24629) Invoke optional output committer in TezProcessor
Marton Bod created HIVE-24629: - Summary: Invoke optional output committer in TezProcessor Key: HIVE-24629 URL: https://issues.apache.org/jira/browse/HIVE-24629 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod In order to enable Hive to write to Iceberg tables, we need to use an output committer which will fire at the end of each Tez task execution (commitTask) and the after the execution of each vertex (commitOutput/commitJob). This output committer will issue a commit containing the written-out data files to the Iceberg table, replacing its previous snapshot pointer with a new one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24541) Add config to set a default storage handler class
Marton Bod created HIVE-24541: - Summary: Add config to set a default storage handler class Key: HIVE-24541 URL: https://issues.apache.org/jira/browse/HIVE-24541 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Add a config param "hive.default.storage.handler.class" so we can set a default storage handler class that can be used for all create table statements. By default it would be an empty string, taking no effect. This would allow existing user queries to be reused for a new table format for example, such as Iceberg. For example, after setting in the config: hive.default.storage.handler.class= org.apache.iceberg.mr.hive.HiveIcebergStorageHandler The query: CREATE TABLE abc (a int, b string) LOCATION ... would be equivalent to: CREATE TABLE abc (a int, b string) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION ... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24540) Add test hive shell for simpler execution tests and debugging
Marton Bod created HIVE-24540: - Summary: Add test hive shell for simpler execution tests and debugging Key: HIVE-24540 URL: https://issues.apache.org/jira/browse/HIVE-24540 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod In the Apache Iceberg project, we've been using a TestHiveShell/TestHiveMetastore class for running query execution unit tests, which made our life much easier both in terms of writing test and debugging the code from an IDE. It would have value bringing it to the Apache Hive project as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24498) Package facebook thrift classes into hive-exec jar
Marton Bod created HIVE-24498: - Summary: Package facebook thrift classes into hive-exec jar Key: HIVE-24498 URL: https://issues.apache.org/jira/browse/HIVE-24498 Project: Hive Issue Type: Task Reporter: Marton Bod Assignee: Marton Bod When using Iceberg tables, the Tez AM needs to communicate with the HMS to fetch table information in the SerDe and InputFormat operations. The libfb303 classes are currently missing from the hive-exec jar leading to Tez missing these classes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24133) Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check
Marton Bod created HIVE-24133: - Summary: Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check Key: HIVE-24133 URL: https://issues.apache.org/jira/browse/HIVE-24133 Project: Hive Issue Type: Bug Reporter: Marton Bod It has been observed that when using Hbase storage handler and the table contains null values, Hive can give back wrong query results, depending on what columns we select for and whether the where clause predicate contains any null checks. For example: create 'default:hive_test', 'cf' put 'default:hive_test', '1', 'cf:col1', 'val1' put 'default:hive_test', '1', 'cf:col2', 'val2' put 'default:hive_test', '2', 'cf:col1', 'val1_2' put 'default:hive_test', '2', 'cf:col2', 'val2_2' put 'default:hive_test', '3', 'cf:col1', 'val1_3' put 'default:hive_test', '3', 'cf:col2', 'val2_3' put 'default:hive_test', '3', 'cf:col3', 'val3_3' put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2" put 'default:hive_test', '4', 'cf:col1', 'val1_4' put 'default:hive_test', '4', 'cf:col2', 'val2_4' scan 'default:hive_test' = HIVE CREATE EXTERNAL TABLE hbase_hive_test ( rowkey string, col1 string, col2 string, col3 string ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3" ) TBLPROPERTIES("hbase.table.name" = "default:hive_test"); query: select * from hbase_hive_test where col3 is null; result: Total MapReduce CPU Time Spent: 10 seconds 980 msec OK 1 val1 val2 NULL 2 val1_2 val2_2 NULL 4 val1_4 val2_4 NULL query: select rowkey from hbase_hive_test where col3 is null; This does not produce any records. However, select rowkey, col2 from hbase_hive_test where col3 is null; This gives back the correct results again. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23759) Refactor CommitTxnRequest field order
Marton Bod created HIVE-23759: - Summary: Refactor CommitTxnRequest field order Key: HIVE-23759 URL: https://issues.apache.org/jira/browse/HIVE-23759 Project: Hive Issue Type: Task Reporter: Marton Bod Assignee: Marton Bod Refactor CommitTxnRequest field order (keyValue and replLastIdInfo). This should be a safe change as neither of these fields have been part of any official Hive release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23625) HS2 Web UI displays query drill-down results in plain text, not html
Marton Bod created HIVE-23625: - Summary: HS2 Web UI displays query drill-down results in plain text, not html Key: HIVE-23625 URL: https://issues.apache.org/jira/browse/HIVE-23625 Project: Hive Issue Type: Bug Reporter: Marton Bod Assignee: Marton Bod Opening a drilldown link on the HS2 Web UI, you are directed to the following URL: /query_page?operationId= Since the path /query_page contains no file extensions, Jetty cannot determine the mimetype and therefore the Hive HttpServer returns response header Content-Type: text/plain;charset=utf-8, and the information does not render as html in the browser. This should be corrected to return text/html;charset=utf-8. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds
> On May 14, 2020, 3:07 p.m., Denys Kuzmenko wrote: > > LGTM, some minor comments Thanks Denys, I've address your comments - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72481/#review220758 --- On May 14, 2020, 3:38 p.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72481/ > --- > > (Updated May 14, 2020, 3:38 p.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > Removed global mutex on writeId allocation, which means write ids can now be > allocated concurrently for different tables without blocking each other, > speeding up execution (perf test results below). Concurrent > allocateTableWriteIds() operations targeting the same table are still mutexed > by an S4U if the table is already present in next_write_id, otherwise a race > condition to insert the table into next_write_id is solved by retrying after > catching the duplicate key exception (the thread which commits later will be > the one to retry). > > The situation is similar when allocateTableWriteIds() and > replTableWriteIdState() are running concurrently - if they target different > tables, they won't block each other anymore. If they target the same table, > and the table is already inserted into next_write_id, replTableWriteIdState() > returns early and allocateTableWriteIds() updates the next id. If the table > is not yet in next_write_id, they might attempt to insert the same row > concurrently, in which case who commits later will get a duplicate key > exception and retry the operation, just as above. > > > Diffs > - > > ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java > 868da0c7a0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java > d59f863b11 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > cf41ef8aaf > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java > 1e177f4a7b > > > Diff: https://reviews.apache.org/r/72481/diff/2/ > > > Testing > --- > > Unit test in TestTxnHandler > + Perf tests: > dbTypesameTable variant ms/op error > MYSQL FALSE original 46.93 3.041 > MYSQL FALSE patched 19.283 1.311 > MYSQL TRUE original 50.185 3.595 > MYSQL TRUE patched 32.254 2.164 > ORACLEFALSE original 57.609 4.461 > ORACLEFALSE patched 25.721 2.551 > ORACLETRUE original 59.668 3.172 > ORACLETRUE patched 39.061 2.548 > POSTGRES FALSE original 39.364 2.94 > POSTGRES FALSE patched 18.518 1.038 > POSTGRES TRUE original 39.868 2.679 > POSTGRES TRUE patched 28.874 1.768 > SQLSERVER FALSE original 45.252 1.643 > SQLSERVER FALSE patched 24.583 1.529 > SQLSERVER TRUE original 49.149 3.45 > SQLSERVER TRUE patched 32.918 1.654 > (sameTable=true means that all threads were trying to allocate ids for the > same db.table, > false means they all targeted different tables) > > > Thanks, > > Marton Bod > >
Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72481/ --- (Updated May 14, 2020, 3:38 p.m.) Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- Removed global mutex on writeId allocation, which means write ids can now be allocated concurrently for different tables without blocking each other, speeding up execution (perf test results below). Concurrent allocateTableWriteIds() operations targeting the same table are still mutexed by an S4U if the table is already present in next_write_id, otherwise a race condition to insert the table into next_write_id is solved by retrying after catching the duplicate key exception (the thread which commits later will be the one to retry). The situation is similar when allocateTableWriteIds() and replTableWriteIdState() are running concurrently - if they target different tables, they won't block each other anymore. If they target the same table, and the table is already inserted into next_write_id, replTableWriteIdState() returns early and allocateTableWriteIds() updates the next id. If the table is not yet in next_write_id, they might attempt to insert the same row concurrently, in which case who commits later will get a duplicate key exception and retry the operation, just as above. Diffs (updated) - ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 868da0c7a0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java d59f863b11 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java cf41ef8aaf standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java 1e177f4a7b Diff: https://reviews.apache.org/r/72481/diff/2/ Changes: https://reviews.apache.org/r/72481/diff/1-2/ Testing --- Unit test in TestTxnHandler + Perf tests: dbTypesameTable variant ms/op error MYSQL FALSE original 46.93 3.041 MYSQL FALSE patched 19.283 1.311 MYSQL TRUE original 50.185 3.595 MYSQL TRUE patched 32.254 2.164 ORACLEFALSE original 57.609 4.461 ORACLEFALSE patched 25.721 2.551 ORACLETRUE original 59.668 3.172 ORACLETRUE patched 39.061 2.548 POSTGRES FALSE original 39.364 2.94 POSTGRES FALSE patched 18.518 1.038 POSTGRES TRUE original 39.868 2.679 POSTGRES TRUE patched 28.874 1.768 SQLSERVER FALSE original 45.252 1.643 SQLSERVER FALSE patched 24.583 1.529 SQLSERVER TRUE original 49.149 3.45 SQLSERVER TRUE patched 32.918 1.654 (sameTable=true means that all threads were trying to allocate ids for the same db.table, false means they all targeted different tables) Thanks, Marton Bod
Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds
> On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > Thanks Marci, > > Few querstions below - probably I just do not understand this part of the > > code enough. > > > > Another question for the perf test: How many threads are you using? > > > > Thanks, > > Peter Thanks Peti for the review. See my answers below, they are related to minor housekeeping changes that are not core to the optimization story. The perf tests were run with 8 concurrent threads. > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java > > Line 1102 (original) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230753#file2230753line1102> > > > > Why did we remove this? checkRetryable never throws MetaException (and hence the code never enters this catch block), so I removed it from its throws clause > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java > > Line 1137 (original) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230753#file2230753line1137> > > > > why did we remove this? same as above > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 1762-1765 (original), 1759-1762 (patched) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line1764> > > > > Is this a functionality or performance change? neither really, just some readability refactor > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Line 2021 (original), 2013 (patched) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2025> > > > > Why is this change required? just seems counterintuitive to use string concat if we're already using a stringbuilder anyway > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Line 2067 (original), 2057 (patched) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2071> > > > > Why is this change? this was causing a checkstyle issue (line lenght too long) > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Line 2079 (original), 2070 (patched) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2084> > > > > Why is this change? unnecessary call to Long.toString > On May 8, 2020, 10:23 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Line 2090 (original), 2081 (patched) > > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2095> > > > > Why is this change? same as above - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72481/#review220689 --- On May 7, 2020, 3:55 p.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72481/ > --- > > (Updated May 7, 2020, 3:55 p.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > Removed global mutex on writeId allocation, which means write ids can now be > allocated concurrently for different tables without blocking each other, > speeding up execution (perf test results below). Concurrent > allocateTableWriteIds() operations targeting the same table are still mutexed > by an S4U if the table is already present in next_write_id, otherwise a race > condition to insert the table into next_write_id is solved by retrying after > catching the duplicate key exception (the thread which commits later will be > the one to retry). > > The situation is similar when allocateTableWriteIds() and > replTableWriteIdState() are running concurrently - if they target different > tables, they won't block each other anymore. If they target the same table, > and
Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72481/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- Removed global mutex on writeId allocation, which means write ids can now be allocated concurrently for different tables without blocking each other, speeding up execution (perf test results below). Concurrent allocateTableWriteIds() operations targeting the same table are still mutexed by an S4U if the table is already present in next_write_id, otherwise a race condition to insert the table into next_write_id is solved by retrying after catching the duplicate key exception (the thread which commits later will be the one to retry). The situation is similar when allocateTableWriteIds() and replTableWriteIdState() are running concurrently - if they target different tables, they won't block each other anymore. If they target the same table, and the table is already inserted into next_write_id, replTableWriteIdState() returns early and allocateTableWriteIds() updates the next id. If the table is not yet in next_write_id, they might attempt to insert the same row concurrently, in which case who commits later will get a duplicate key exception and retry the operation, just as above. Diffs - ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 868da0c7a0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java d59f863b11 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java cf41ef8aaf standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java 1e177f4a7b Diff: https://reviews.apache.org/r/72481/diff/1/ Testing --- Unit test in TestTxnHandler + Perf tests: dbTypesameTable variant ms/op error MYSQL FALSE original 46.93 3.041 MYSQL FALSE patched 19.283 1.311 MYSQL TRUE original 50.185 3.595 MYSQL TRUE patched 32.254 2.164 ORACLEFALSE original 57.609 4.461 ORACLEFALSE patched 25.721 2.551 ORACLETRUE original 59.668 3.172 ORACLETRUE patched 39.061 2.548 POSTGRES FALSE original 39.364 2.94 POSTGRES FALSE patched 18.518 1.038 POSTGRES TRUE original 39.868 2.679 POSTGRES TRUE patched 28.874 1.768 SQLSERVER FALSE original 45.252 1.643 SQLSERVER FALSE patched 24.583 1.529 SQLSERVER TRUE original 49.149 3.45 SQLSERVER TRUE patched 32.918 1.654 (sameTable=true means that all threads were trying to allocate ids for the same db.table, false means they all targeted different tables) Thanks, Marton Bod
Re: Review Request 72472: HIVE-23318: TxnHandler need not delete from MATERIALIZATION_REBUILD_LOCKS on need basis
> On May 6, 2020, 11:18 a.m., Denys Kuzmenko wrote: > > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java > > Line 3158 (original) > > <https://reviews.apache.org/r/72472/diff/2/?file=2230577#file2230577line3159> > > > > nit: new line added it back > On May 6, 2020, 11:18 a.m., Denys Kuzmenko wrote: > > ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java > > Line 104 (original), 108 (patched) > > <https://reviews.apache.org/r/72472/diff/2/?file=2230578#file2230578line108> > > > > nit: I would just use ternary operator in-place, it's easier to > > understand why the fallback replaced with ternary - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72472/#review220652 ------- On May 5, 2020, 8:58 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72472/ > --- > > (Updated May 5, 2020, 8:58 a.m.) > > > Review request for hive, Denys Kuzmenko, Jesús Camacho Rodríguez, and Peter > Vary. > > > Repository: hive-git > > > Description > --- > > Introduces a new TxnType for materialization view rebuilds, and only issues > rebuild lock delete calls during commit for that specific type > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d > ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java e91a7ed760 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TxnType.java > 349f37f914 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 302c340198 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 28c971f5bf > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > cdf97fc53c > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > c78aeb46d9 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 8fded608d0 > > > Diff: https://reviews.apache.org/r/72472/diff/3/ > > > Testing > --- > > > Thanks, > > Marton Bod > >
Re: Review Request 72472: HIVE-23318: TxnHandler need not delete from MATERIALIZATION_REBUILD_LOCKS on need basis
> On May 6, 2020, 4:54 a.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java > > Line 3153 (original), 3153 (patched) > > <https://reviews.apache.org/r/72472/diff/1/?file=2230251#file2230251line3153> > > > > Unrelated to this patch but related to compilation latency. I am not > > sure if it is actually possible but would be great to find a way to avoid > > the AST traversal (assuming this method is executed for every incoming > > query), since it adds to compilation latency and can become noticible for > > large queries. Makes sense. I've removed the traversal for mat view rebuild. As for the read-only check, we can create a separate story to figure out if we can optimize it. > On May 6, 2020, 4:54 a.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java > > Lines 3165 (patched) > > <https://reviews.apache.org/r/72472/diff/1/?file=2230251#file2230251line3166> > > > > LGTM. Have you confirmed that you need the traversal and that > > TOK_ALTER_MATERIALIZED_VIEW_REBUILD is not the root (I am not sure about > > this)? > > > > Also wanted to mention. MV rebuild is translated internally into INSERT > > OVERWRITE, INSERT, or MERGE statements, depending on the MV semantics. > > Setting this TxnType to MATER_VIEW_REBUILD will not have any kind of side > > effect on those operations. Is that correct? Good point, I've rechecked and we can use the root token instead of a traversal - updated the diff. This should not cause any side effects, as behaviour is only branched off for read-only txn types. - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72472/#review220647 --- On May 5, 2020, 8:58 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72472/ > --- > > (Updated May 5, 2020, 8:58 a.m.) > > > Review request for hive, Denys Kuzmenko, Jesús Camacho Rodríguez, and Peter > Vary. > > > Repository: hive-git > > > Description > --- > > Introduces a new TxnType for materialization view rebuilds, and only issues > rebuild lock delete calls during commit for that specific type > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d > ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java e91a7ed760 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TxnType.java > 349f37f914 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > e4b0bc726c > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 1a0fee319b > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > e6224ec16f > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 8462b3d7a3 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 8fded608d0 > > > Diff: https://reviews.apache.org/r/72472/diff/2/ > > > Testing > --- > > > Thanks, > > Marton Bod > >
Re: Review Request 72470: ACID: Concurrent MERGE INSERT operations produce duplicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72470/#review220632 --- Ship it! Ship It! - Marton Bod On May 5, 2020, 8:33 a.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72470/ > --- > > (Updated May 5, 2020, 8:33 a.m.) > > > Review request for hive, Marton Bod, Peter Varga, and Peter Vary. > > > Bugs: HIVE-23349 > https://issues.apache.org/jira/browse/HIVE-23349 > > > Repository: hive-git > > > Description > --- > > 2 concurrent MERGE INSERT operations generate duplicates due to lack of > locking. > MERGE INSERT is treated as regular INSERT, it acquires SHARED_READ lock that > doesn't prevent other INSERTs. We should use EXCLUSIVE lock here or > EXCL_WRITE if hive.txn.write.xlock=false; > > create table target (a int, b int) stored as orc TBLPROPERTIES > ('transactional'='true')"); > insert into target values (1,2), (3,4) > create table source (a int, b int) > execute in parallel: > > insert into source values (5,6), (7,8) > > PS: > > Current patch doesn cover following scenario: > 1) T1 (merge-insert) opens txns & records snapshot; > 2) T2 (insert/merge-insert) opens txns, records snapshot & locks it; > 3) T2 commits it's changes and unlocks T1; > 4) T1 locks snapshot and validates txn list, however only txns with txnId > lower than T1's is beeing considered, T2 changes are ignored -> duplicates > are generated. > > > merge-insert/merge-insert scenario could be fixed by leveraging write-write > conflict check mechanism. We just need to set operation type for merge-insert > to update. > However it won't solve issue with merge-insert/insert. > > We should consider moving locking before snapshot generation, current > snapshot re-check mechanism doesn't handle described scenarios. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Context.java 9f59d4cea3 > ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java c1f94d165b > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java deaab89c1f > ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java > 3ffdcec528 > > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/DbTxnManagerEndToEndTestBase.java > b435e79c3c > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 1687425bcb > ql/src/test/results/clientpositive/llap/explain_locks.q.out d62f6ccafd > > > Diff: https://reviews.apache.org/r/72470/diff/3/ > > > Testing > --- > > Manual, added number of merge related test scenarios into TestDbTxnManager2, > modified explain_locks.q > > > Thanks, > > Denys Kuzmenko > >
Review Request 72472: HIVE-23318: TxnHandler need not delete from MATERIALIZATION_REBUILD_LOCKS on need basis
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72472/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- Introduces a new TxnType for materialization view rebuilds, and only issues rebuild lock delete calls during commit for that specific type Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java e91a7ed760 standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TxnType.java 349f37f914 standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php e4b0bc726c standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py 1a0fee319b standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb e6224ec16f standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 8462b3d7a3 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 8fded608d0 Diff: https://reviews.apache.org/r/72472/diff/1/ Testing --- Thanks, Marton Bod
Re: Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler
> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java > > Line 31 (original), 32 (patched) > > <https://reviews.apache.org/r/72469/diff/2/?file=2230151#file2230151line32> > > > > What does empty txn mean? Is it obsolete/abandoned/timed-out txns? > > Marton Bod wrote: > It refers to aborted/committed TXNs that do not have any entries in > TXN_COMPONENTS (hence the word empty). > > Denys Kuzmenko wrote: > could we name it TxnCleanerServic? EmptyTxn is a bit confusing. Yeah, I was also a bit on the fence about the naming. Changed it. - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72469/#review220594 ------- On May 5, 2020, 7:42 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72469/ > --- > > (Updated May 5, 2020, 7:42 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > Consolidate different metastore housekeeper threads into one, move cleaner > methods out of compactor initiator. > Create separate txn cleaner service which will need to run more frequently. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > 23512e252e > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java > eac2c6307f > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 1687425bcb > ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java > e4ff14a140 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > 842b7fe53d > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 7bba8d6ee6 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java > d35c9602a6 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java > e96a7ba289 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java > c4a488bac0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java > 2ad5a89f03 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidWriteSetService.java > 5ec513dfd4 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 8fded608d0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java > e8ac71ae55 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/conf/TestMetastoreConf.java > 9905a14983 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/txn/TestAcidTxnCleanerService.java > PRE-CREATION > > > Diff: https://reviews.apache.org/r/72469/diff/3/ > > > Testing > --- > > > Thanks, > > Marton Bod > >
Re: Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72469/ --- (Updated May 5, 2020, 7:42 a.m.) Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- Consolidate different metastore housekeeper threads into one, move cleaner methods out of compactor initiator. Create separate txn cleaner service which will need to run more frequently. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 23512e252e ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java eac2c6307f ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 1687425bcb ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java e4ff14a140 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 842b7fe53d standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 7bba8d6ee6 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java d35c9602a6 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java e96a7ba289 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java c4a488bac0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java 2ad5a89f03 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidWriteSetService.java 5ec513dfd4 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 8fded608d0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e8ac71ae55 standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/conf/TestMetastoreConf.java 9905a14983 standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/txn/TestAcidTxnCleanerService.java PRE-CREATION Diff: https://reviews.apache.org/r/72469/diff/3/ Changes: https://reviews.apache.org/r/72469/diff/2-3/ Testing --- Thanks, Marton Bod
Re: Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler
> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote: > > LGTM , just few questions Thanks for the review! > On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote: > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > > Line 153 (original) > > <https://reviews.apache.org/r/72469/diff/2/?file=2230142#file2230142line155> > > > > Cleanup functionality was moved out of the Initiator and is covered by > > HousekeeperThreadnow. Is it right? Yes, that's correct. > On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote: > > ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java > > Line 205 (original) > > <https://reviews.apache.org/r/72469/diff/2/?file=2230147#file2230147line205> > > > > Do we have similar test for Housekeeper? Yes. The various TxnStore operations (e.g. writeset pruning) that AcidHouseKeeper executes in sequence are already covered by tests in various places (e.g. TestTxnHandler, TestDbTxnManager2). > On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java > > Line 31 (original), 32 (patched) > > <https://reviews.apache.org/r/72469/diff/2/?file=2230151#file2230151line32> > > > > What does empty txn mean? Is it obsolete/abandoned/timed-out txns? It refers to aborted/committed TXNs that do not have any entries in TXN_COMPONENTS (hence the word empty). > On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java > > Lines 84 (patched) > > <https://reviews.apache.org/r/72469/diff/2/?file=2230152#file2230152line84> > > > > Old AcidCompactionHistoryService was doing same stuff > > (txnHandler.purgeCompactionHistory()). Do we need than > > AcidEmptyTxnCleanerService and if yes can we embed it here? Those operations that can run with the about same frequency (timedout txn cleaning, writeset cleaning, compaction history pruning) have been merged and consolidated into a single HouseKeeper for simplicity (along with the cleaning ops Initiator used to do). However, AcidEmptyTxnCleaner needs to be treated separately it has to run significantly more frequently than AcidHouseKeeper due to how the new open txn mechanism works. - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72469/#review220594 --- On May 4, 2020, 12:58 p.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72469/ > --- > > (Updated May 4, 2020, 12:58 p.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > Consolidate different metastore housekeeper threads into one, move cleaner > methods out of compactor initiator. > Create separate txn cleaner service which will need to run more frequently. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > 23512e252e > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java > eac2c6307f > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 1687425bcb > ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java > e4ff14a140 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > 842b7fe53d > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 7bba8d6ee6 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java > d35c9602a6 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java > e96a7ba289 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java > c4a488bac0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java &g
Re: Review Request 72470: ACID: Concurrent MERGE INSERT operations produce duplicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72470/#review220595 --- Looks good to me, just a couple of questions. ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Lines 2226 (patched) <https://reviews.apache.org/r/72470/#comment309077> nit: the .getLockid() part is a leftover? ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Lines 2231 (patched) <https://reviews.apache.org/r/72470/#comment309080> nit: this appears in a few places, might make sense to extract ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Lines 2241 (patched) <https://reviews.apache.org/r/72470/#comment309079> maybe worth running the same tests with 'false' too? - Marton Bod On May 4, 2020, 1:32 p.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72470/ > --- > > (Updated May 4, 2020, 1:32 p.m.) > > > Review request for hive, Marton Bod, Peter Varga, and Peter Vary. > > > Bugs: HIVE-23349 > https://issues.apache.org/jira/browse/HIVE-23349 > > > Repository: hive-git > > > Description > --- > > 2 concurrent MERGE INSERT operations generate duplicates due to lack of > locking. > MERGE INSERT is treated as regular INSERT, it acquires SHARED_READ lock that > doesn't prevent other INSERTs. We should use EXCLUSIVE lock here or > EXCL_WRITE if hive.txn.write.xlock=false; > > create table target (a int, b int) stored as orc TBLPROPERTIES > ('transactional'='true')"); > insert into target values (1,2), (3,4) > create table source (a int, b int) > execute in parallel: > > insert into source values (5,6), (7,8) > > PS: > > Current patch doesn cover following scenario: > 1) T1 (merge-insert) opens txns & records snapshot; > 2) T2 (insert/merge-insert) opens txns, records snapshot & locks it; > 3) T2 commits it's changes and unlocks T1; > 4) T1 locks snapshot and validates txn list, however only txns with txnId > lower than T1's is beeing considered, T2 changes are ignored -> duplicates > are generated. > > > merge-insert/merge-insert scenario could be fixed by leveraging write-write > conflict check mechanism. We just need to set operation type for merge-insert > to update. > However it won't solve issue with merge-insert/insert. > > We should consider moving locking before snapshot generation, current > snapshot re-check mechanism doesn't handle described scenarios. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Context.java 9f59d4cea3 > ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java c1f94d165b > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java deaab89c1f > ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java > 3ffdcec528 > > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/DbTxnManagerEndToEndTestBase.java > b435e79c3c > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 1687425bcb > ql/src/test/results/clientpositive/llap/explain_locks.q.out d62f6ccafd > > > Diff: https://reviews.apache.org/r/72470/diff/1/ > > > Testing > --- > > Manual, added number of merge related test scenarios into TestDbTxnManager2, > modified explain_locks.q > > > Thanks, > > Denys Kuzmenko > >
Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72469/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- Consolidate different metastore housekeeper threads into one, move cleaner methods out of compactor initiator. Create separate txn cleaner service which will need to run more frequently. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 23512e252e ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java eac2c6307f ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 1687425bcb ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java e4ff14a140 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 842b7fe53d standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 7bba8d6ee6 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java d35c9602a6 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java e96a7ba289 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java c4a488bac0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java 2ad5a89f03 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidWriteSetService.java 5ec513dfd4 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 8fded608d0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e8ac71ae55 standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/conf/TestMetastoreConf.java 9905a14983 standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/txn/TestAcidEmptyTxnCleanerService.java PRE-CREATION Diff: https://reviews.apache.org/r/72469/diff/1/ Testing --- Thanks, Marton Bod
[jira] [Created] (HIVE-23325) Clean up cleanup tasks for TxnHandler/CompactionTxnHandler
Marton Bod created HIVE-23325: - Summary: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler Key: HIVE-23325 URL: https://issues.apache.org/jira/browse/HIVE-23325 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently there are multiple ways to clean up ACID related stuff: * AcidHouseKeeperService * AcidWriteSetService * AcidCompactionHistoryService * Initiator * Cleaner * etc. We should consolidate them where possible and improve logging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23324) Par\
Marton Bod created HIVE-23324: - Summary: Par\ Key: HIVE-23324 URL: https://issues.apache.org/jira/browse/HIVE-23324 Project: Hive Issue Type: Improvement Reporter: Marton Bod -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 72436: Locks: Implement zero-wait readers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72436/#review220540 --- Ship it! Ship It! - Marton Bod On April 29, 2020, 10:18 a.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72436/ > --- > > (Updated April 29, 2020, 10:18 a.m.) > > > Review request for hive, Marton Bod and Peter Vary. > > > Bugs: HIVE-23293 > https://issues.apache.org/jira/browse/HIVE-23293 > > > Repository: hive-git > > > Description > --- > > With a new lock type (EXCL_WRITE) for INSERT_OVERWRITE, SHARED_READ does not > have to wait for any lock - it can fails fast for a pending EXCLUSIVE, > because even if there is an EXCL_WRITE or SHARED_WRITE pending, there's no > semantic reason to wait for them. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8e643fe844 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java fb5a306ac4 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java b4dac4346e > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > f90396b2a3 > ql/src/test/results/clientnegative/insert_into1.q.out 066f04d1cc > ql/src/test/results/clientnegative/insert_into2.q.out b839efeb7d > ql/src/test/results/clientnegative/insert_into3.q.out 91aeb5b9a1 > ql/src/test/results/clientnegative/insert_into4.q.out 303fb6aa2e > ql/src/test/results/clientnegative/lockneg_try_drop_locked_db.q.out > e66965e693 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java > 7402fb30eb > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockResponse.java > fdaab4b394 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 9fb7ff011a > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 4f317b3453 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > e64ae0ead2 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockRequestBuilder.java > 93da0f60ec > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 1e3d6e9b8b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > fe39b0b36e > > > Diff: https://reviews.apache.org/r/72436/diff/6/ > > > Testing > --- > > DbTxnManager tests > > > Thanks, > > Denys Kuzmenko > >
Re: Review Request 72436: Locks: Implement zero-wait readers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72436/#review220538 --- ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java Lines 145 (patched) <https://reviews.apache.org/r/72436/#comment309021> do we want to handle the case here when lockstate was NOT_ACQUIRED? e.g. log out the errorMsg on the client side ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Lines 2632 (patched) <https://reviews.apache.org/r/72436/#comment309022> this hardcoded part might break if we ever change the internal representation of LockResponse or LockInfo. Would it make sense to instead create those objects here and use their toString()? - Marton Bod On April 28, 2020, 5:59 p.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72436/ > --- > > (Updated April 28, 2020, 5:59 p.m.) > > > Review request for hive, Marton Bod and Peter Vary. > > > Bugs: HIVE-23293 > https://issues.apache.org/jira/browse/HIVE-23293 > > > Repository: hive-git > > > Description > --- > > With a new lock type (EXCL_WRITE) for INSERT_OVERWRITE, SHARED_READ does not > have to wait for any lock - it can fails fast for a pending EXCLUSIVE, > because even if there is an EXCL_WRITE or SHARED_WRITE pending, there's no > semantic reason to wait for them. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8e643fe844 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java fb5a306ac4 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java b4dac4346e > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > f90396b2a3 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java > 7402fb30eb > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockResponse.java > fdaab4b394 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 9fb7ff011a > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 4f317b3453 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > e64ae0ead2 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockRequestBuilder.java > 93da0f60ec > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 1e3d6e9b8b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > fe39b0b36e > > > Diff: https://reviews.apache.org/r/72436/diff/5/ > > > Testing > --- > > DbTxnManager tests > > > Thanks, > > Denys Kuzmenko > >
Re: Review Request 72436: Locks: Implement zero-wait readers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72436/#review220497 --- LGMT, just few questions ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Lines 2553 (patched) <https://reviews.apache.org/r/72436/#comment308950> nit: white space plus explanation mentions "wrong code" but it's the whole ErrorMsg that's being compared ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Lines 2555 (patched) <https://reviews.apache.org/r/72436/#comment308949> perhaps move this errorMsg into a constant since it's being referenced more than once (or into ErrorMsg) ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java Line 2559 (original), 2588 (patched) <https://reviews.apache.org/r/72436/#comment308955> it might make sense to extract this setup/query running part into a common method, and put the different assertions into testFairness and testFairnessZeroWaitRead, that might make the assertion part easier to follow. Same above standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java Lines 59 (patched) <https://reviews.apache.org/r/72436/#comment308951> quick question: does this need to be part of the API, or can we just derive it from the conf vars (TXN_WRITE_X_LOCK, TXN_OVERWRITE_X_LOCK) as we do in the tests? standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java Lines 4389 (patched) <https://reviews.apache.org/r/72436/#comment308952> do we need this delete? if the txn is aborted, wouldn't all the locks be removed in abortTxn? standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java Lines 4391 (patched) <https://reviews.apache.org/r/72436/#comment308953> shouldn't we check blockedBy for certainty that it was an exclusive lock that blocked us? (and log the anomaly if not) - Marton Bod On April 27, 2020, 9:01 a.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72436/ > --- > > (Updated April 27, 2020, 9:01 a.m.) > > > Review request for hive, Marton Bod and Peter Vary. > > > Bugs: HIVE-23293 > https://issues.apache.org/jira/browse/HIVE-23293 > > > Repository: hive-git > > > Description > --- > > With a new lock type (EXCL_WRITE) for INSERT_OVERWRITE, SHARED_READ does not > have to wait for any lock - it can fails fast for a pending EXCLUSIVE, > because even if there is an EXCL_WRITE or SHARED_WRITE pending, there's no > semantic reason to wait for them. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java b4dac4346e > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 497cedd61f > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java > 7402fb30eb > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 9fb7ff011a > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 4f317b3453 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > e64ae0ead2 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockRequestBuilder.java > 93da0f60ec > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 1e3d6e9b8b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 580786832e > > > Diff: https://reviews.apache.org/r/72436/diff/1/ > > > Testing > --- > > DbTxnManager tests > > > Thanks, > > Denys Kuzmenko > >
[jira] [Created] (HIVE-23297) Precompile statements where needed across TxnHandler
Marton Bod created HIVE-23297: - Summary: Precompile statements where needed across TxnHandler Key: HIVE-23297 URL: https://issues.apache.org/jira/browse/HIVE-23297 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod There are multiple places in TxnHandler that could benefit from pre-compiling sql queries using prepared statements. Some queries are complex in structure (e.g. checkLock) and we use string concats to recreate them every time the query is to run, in addition to re-parsing the query structure as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23283) Generate random temp ID for lock enqueue and commitTxn
Marton Bod created HIVE-23283: - Summary: Generate random temp ID for lock enqueue and commitTxn Key: HIVE-23283 URL: https://issues.apache.org/jira/browse/HIVE-23283 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod In order to optimize the S4U scope of enqueue lock and commitTxn, currently a hardcoded constant (-1) is used to first insert all the lock and ws entries with a temporary lockID/commitID. However, in a concurrent environment this seems to cause some performance degradation (and deadlock issues with some rdbms) as multiple concurrent transactions are trying to insert rows with the same primary key (e.g. (-1, 1), (-1, 2), (-1, 3), .. etc. for (extID/intID) in HIVE_LOCKS). The proposed solution is to replace the constant with a random generated negative number, which seems to resolve this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 72378: HIVE-23201: Improve logging in locking
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72378/ --- (Updated April 23, 2020, 8:56 a.m.) Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- HIVE-23201: Improve logging in locking Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java e7910c1c5d standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java b3a1f826bb Diff: https://reviews.apache.org/r/72378/diff/5/ Changes: https://reviews.apache.org/r/72378/diff/4-5/ Testing --- Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/ Thanks, Marton Bod
Re: Review Request 72378: HIVE-23201: Improve logging in locking
> On April 22, 2020, 11:46 a.m., Denys Kuzmenko wrote: > > Looks good, however I don't really like change with > > REQ_LOCKS_ALIASED_FIELDS as it generats much bigger query (keep in mind > > there 3 of them joined by union, will be 4 for excl_write) + we are > > transfering more data that we are not very interested in. To get more > > information about the lock we can simply run show locks. Thanks Denys. As discussed, there was a misunderstanding related to logging out the blocked lock in more detail. I've reverted that part and updated the diff. - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72378/#review220419 --- On April 17, 2020, 11:30 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72378/ > --- > > (Updated April 17, 2020, 11:30 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23201: Improve logging in locking > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > e7910c1c5d > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java > b3a1f826bb > > > Diff: https://reviews.apache.org/r/72378/diff/4/ > > > Testing > --- > > Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/ > > > Thanks, > > Marton Bod > >
Re: Review Request 72378: HIVE-23201: Improve logging in locking
> On April 21, 2020, 10:38 a.m., Denys Kuzmenko wrote: > > LGTM, just number of comments Thanks for the review Denys! As per your comments, I have elevated the log levels, removed the less useful log lines and changed the original statements to prepared statements. - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72378/#review220389 --- On April 17, 2020, 11:30 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72378/ > --- > > (Updated April 17, 2020, 11:30 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23201: Improve logging in locking > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > d080df417b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java > b3a1f826bb > > > Diff: https://reviews.apache.org/r/72378/diff/2/ > > > Testing > --- > > Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/ > > > Thanks, > > Marton Bod > >
Re: Review Request 72392: HIVE-23103 Oracle statement batching
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72392/#review220401 --- Ship it! Ship It! - Marton Bod On April 21, 2020, 12:41 p.m., Peter Vary wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72392/ > --- > > (Updated April 21, 2020, 12:41 p.m.) > > > Review request for hive, Denys Kuzmenko and Marton Bod. > > > Bugs: HIVE-23103 > https://issues.apache.org/jira/browse/HIVE-23103 > > > Repository: hive-git > > > Description > --- > > Examine how to really get better performance for oracle statement batches. > > Oracle JDBC doc describes: > > The Oracle implementation of standard update batching does not implement true > batching for generic statements and callable statements. Even though Oracle > JDBC supports the use of standard batching for Statement and > CallableStatement objects, you are unlikely to see performance improvement. > > I would look for connection properties to set, so it is handled anyway, or if > not, then use: > > begin > query1; > query2; > query3; > end; > to we will have only a single roundtrip for the db. > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java > bb29410e7d > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > d080df417b > > > Diff: https://reviews.apache.org/r/72392/diff/2/ > > > Testing > --- > > Baseline: > Benchmark(dbProduct) (txnType) Mode Cnt Score > Error Units > TxnHandlerBenchRunner.commitTxn ORACLEDEFAULTss 100 42.988 ± > 4.569 ms/op > TxnHandlerBenchRunner.commitTxn ORACLE READ_ONLYss 100 45.029 ± > 4.686 ms/op > > After patch: > Benchmark(dbProduct) (txnType) Mode Cnt Score > Error Units > TxnHandlerBenchRunner.commitTxn ORACLEDEFAULTss 100 36.208 ± > 3.869 ms/op > TxnHandlerBenchRunner.commitTxn ORACLE READ_ONLYss 100 37.038 ± > 3.746 ms/op > > > Thanks, > > Peter Vary > >
Re: Review Request 72392: HIVE-23103 Oracle statement batching
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72392/#review220387 --- standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java Lines 662 (patched) <https://reviews.apache.org/r/72392/#comment308749> Once this part executes, we wouldn't have 'begin' for the next batch, no? Also, the sb would need to be cleared I think - Marton Bod On April 20, 2020, 2:53 p.m., Peter Vary wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72392/ > --- > > (Updated April 20, 2020, 2:53 p.m.) > > > Review request for hive, Denys Kuzmenko and Marton Bod. > > > Bugs: HIVE-23103 > https://issues.apache.org/jira/browse/HIVE-23103 > > > Repository: hive-git > > > Description > --- > > Examine how to really get better performance for oracle statement batches. > > Oracle JDBC doc describes: > > The Oracle implementation of standard update batching does not implement true > batching for generic statements and callable statements. Even though Oracle > JDBC supports the use of standard batching for Statement and > CallableStatement objects, you are unlikely to see performance improvement. > > I would look for connection properties to set, so it is handled anyway, or if > not, then use: > > begin > query1; > query2; > query3; > end; > to we will have only a single roundtrip for the db. > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java > bb29410e7d > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > d080df417b > > > Diff: https://reviews.apache.org/r/72392/diff/1/ > > > Testing > --- > > Baseline: > Benchmark(dbProduct) (txnType) Mode Cnt Score > Error Units > TxnHandlerBenchRunner.commitTxn ORACLEDEFAULTss 100 42.988 ± > 4.569 ms/op > TxnHandlerBenchRunner.commitTxn ORACLE READ_ONLYss 100 45.029 ± > 4.686 ms/op > > After patch: > Benchmark(dbProduct) (txnType) Mode Cnt Score > Error Units > TxnHandlerBenchRunner.commitTxn ORACLEDEFAULTss 100 36.208 ± > 3.869 ms/op > TxnHandlerBenchRunner.commitTxn ORACLE READ_ONLYss 100 37.038 ± > 3.746 ms/op > > > Thanks, > > Peter Vary > >
Re: Review Request 72378: HIVE-23201: Improve logging in locking
> On April 17, 2020, 1:46 p.m., Peter Vary wrote: > > Mostly agree, few comments. > > I would like to ask you to go through them, and if there are any places > > where the problem should not happen use at least info level. > > > > Thanks, > > Peter Thanks for the review Peter! I've made changes based on your comments - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72378/#review220347 --- On April 17, 2020, 11:30 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72378/ > --- > > (Updated April 17, 2020, 11:30 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23201: Improve logging in locking > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > d080df417b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java > b3a1f826bb > > > Diff: https://reviews.apache.org/r/72378/diff/2/ > > > Testing > --- > > Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/ > > > Thanks, > > Marton Bod > >
[jira] [Created] (HIVE-23236) Remove the global lock from acquireLock
Marton Bod created HIVE-23236: - Summary: Remove the global lock from acquireLock Key: HIVE-23236 URL: https://issues.apache.org/jira/browse/HIVE-23236 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently we have a global lock (NEXT_LOCK_ID) when running enqueueLock, because the algorithm in checkLock requires the locks to have a well defined order, and also requires that every lock component is already stored in the RDBMS before checking the locks. Proposed approach: * Enqueue lock without a global lock, using a sequence for getting the next lock ID * Insert the locks as usual, but with an additional timestamp (enqueued_at) * Check lock should check all enqueued locks for the given db/table/partition but order should be determined based on the enqueued_at time instead of the lockID * If there is no blocking lock, update the state to acquired as usual -- This message was sent by Atlassian Jira (v8.3.4#803005)
Review Request 72378: HIVE-23201: Improve logging in locking
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72378/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- HIVE-23201: Improve logging in locking Diffs - ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 962a63d418 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java b3a1f826bb Diff: https://reviews.apache.org/r/72378/diff/1/ Testing --- Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/ Thanks, Marton Bod
[jira] [Created] (HIVE-23234) Optimize TxnHandler::allocateTableWriteIds
Marton Bod created HIVE-23234: - Summary: Optimize TxnHandler::allocateTableWriteIds Key: HIVE-23234 URL: https://issues.apache.org/jira/browse/HIVE-23234 Project: Hive Issue Type: Improvement Reporter: Marton Bod Table write id allocation should be examined and optimized. One low hanging fruit is batching all the PreparedStatement inserts, but there might be other opportunities as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 72360: HIVE-23093: Create new metastore config value for jdbc max batch size
> On April 16, 2020, 9:24 a.m., Peter Vary wrote: > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > > Lines 650-651 (patched) > > <https://reviews.apache.org/r/72360/diff/1/?file=2217548#file2217548line650> > > > > The text describing the config value is for administrators and not for > > developers. So this part should be a java comment instead of being part of > > the description moved into a comment > On April 16, 2020, 9:24 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 2520-2525 (patched) > > <https://reviews.apache.org/r/72360/diff/1/?file=2217550#file2217550line2520> > > > > I would do it in setConf, like we did with the numOpenTxns and friends makes perfect sense, updated the diff - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72360/#review220331 ----------- On April 14, 2020, 2:25 p.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72360/ > --- > > (Updated April 14, 2020, 2:25 p.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23093: Create new metastore config value for jdbc max batch size > > > Diffs > - > > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > 3bfb0e69cb > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java > a66e16973f > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 331fd4cc8d > > > Diff: https://reviews.apache.org/r/72360/diff/2/ > > > Testing > --- > > > Thanks, > > Marton Bod > >
[jira] [Created] (HIVE-23201) Improve logging in locking
Marton Bod created HIVE-23201: - Summary: Improve logging in locking Key: HIVE-23201 URL: https://issues.apache.org/jira/browse/HIVE-23201 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod Currently it can be quite difficult to troubleshoot issues related to locking. To understand why a particular txn gave up after a while on acquiring a lock, you have to connect directly to the backend DB, since we are not logging right now which exact locks the txn is waiting for. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Review Request 72360: HIVE-23093: Create new metastore config value for jdbc max batch size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72360/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- HIVE-23093: Create new metastore config value for jdbc max batch size Diffs - standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 3bfb0e69cb standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 620c77e589 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 7d0db0c3a0 Diff: https://reviews.apache.org/r/72360/diff/1/ Testing --- Thanks, Marton Bod
Re: Review Request 72359: HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn
> On April 14, 2020, 11:22 a.m., Peter Vary wrote: > > Thanks for the patch Marton! > > Some questions, ideas. > > > > Thanks, > > Peter Thanks for the review Peter! See my answers below and the updated diff > On April 14, 2020, 11:22 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Line 1248 (original), 1228 (patched) > > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1267> > > > > Is this 'rs' not reused later? Maybe use a local scoped rs here? No, it's not reused anymore. I've made it locally-scoped. > On April 14, 2020, 11:22 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 1325 (patched) > > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1388> > > > > Not really important since this is a private method, but why not return > > boolean instead? Would be nice, but we need the ResultSet to log out the details of what exact conflict has been found. > On April 14, 2020, 11:22 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 1351 (patched) > > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1414> > > > > Maybe log the query in debug level? Sure > On April 14, 2020, 11:22 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 1375 (patched) > > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1438> > > > > Might worth to initialize with a size Done - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72359/#review220308 --- On April 14, 2020, 8:57 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72359/ > --- > > (Updated April 14, 2020, 8:57 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 7d0db0c3a0 > > > Diff: https://reviews.apache.org/r/72359/diff/1/ > > > Testing > --- > > Green build: > https://builds.apache.org/job/PreCommit-HIVE-Build/21539/testReport > Benchmark results attached to ticket: > https://issues.apache.org/jira/browse/HIVE-23104 > > > Thanks, > > Marton Bod > >
Review Request 72359: HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72359/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn Diffs - standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 7d0db0c3a0 Diff: https://reviews.apache.org/r/72359/diff/1/ Testing --- Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21539/testReport Benchmark results attached to ticket: https://issues.apache.org/jira/browse/HIVE-23104 Thanks, Marton Bod
Re: Review Request 72324: HIVE-22750: Consolidate LockType naming
> On April 8, 2020, 3:23 p.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java > > Line 60 (original), 60 (patched) > > <https://reviews.apache.org/r/72324/diff/3/?file=2217221#file2217221line60> > > > > add builder for ExclWrite added > On April 8, 2020, 3:23 p.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java > > Lines 14 (patched) > > <https://reviews.apache.org/r/72324/diff/3/?file=2217225#file2217225line14> > > > > why not enum? That was my initial approach too, but in the end, having a second enum required lots of mapping between the two enums, and both having the same value set caused quite a bit of duplication too. I think by using this util class, we can more cleanly extend the LockType enum with any necessary state and behaviour, without introducing duplication. > On April 8, 2020, 3:23 p.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java > > Lines 34 (patched) > > <https://reviews.apache.org/r/72324/diff/3/?file=2217225#file2217225line34> > > > > you can move inverse into static and do no calc every time good spot, fixed - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72324/#review220258 --- On April 8, 2020, 3:09 p.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72324/ > --- > > (Updated April 8, 2020, 3:09 p.m.) > > > Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan. > > > Repository: hive-git > > > Description > --- > > HIVE-22750: Consolidate LockType naming > > > Diffs > - > > > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java > e249b7775e > > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java > 52eb6133e7 > metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d > metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 > ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java > 72f095d264 > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 80fb1aff78 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java > 8ae4351129 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > db4cfb996a > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > cf3137928f > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > 849970eb56 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java > c739d4d196 > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 098ddec5dc > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 7d0db0c3a0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java > da38a6bbd3 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java > PRE-CREATION > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java > 1dfc105958 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/TransactionBatch.java > d44065018f > > > Diff: https://reviews.apache.org/r/72324/diff/3/ > > > Testing > --- > > > Thanks, > > Marton Bod > >
Re: Review Request 72324: HIVE-22750: Consolidate LockType naming
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72324/ --- (Updated April 8, 2020, 3:09 p.m.) Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan. Repository: hive-git Description --- HIVE-22750: Consolidate LockType naming Diffs (updated) - hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java e249b7775e hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java 52eb6133e7 metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 72f095d264 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 80fb1aff78 standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java 8ae4351129 standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php db4cfb996a standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py cf3137928f standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb 849970eb56 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java c739d4d196 standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 098ddec5dc standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 7d0db0c3a0 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java da38a6bbd3 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java PRE-CREATION standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java 1dfc105958 standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java PRE-CREATION streaming/src/java/org/apache/hive/streaming/TransactionBatch.java d44065018f Diff: https://reviews.apache.org/r/72324/diff/3/ Changes: https://reviews.apache.org/r/72324/diff/2-3/ Testing --- Thanks, Marton Bod
Review Request 72324: HIVE-22750: Consolidate LockType naming
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72324/ --- Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan. Repository: hive-git Description --- HIVE-22750: Consolidate LockType naming Diffs - hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java e249b7775e hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java 52eb6133e7 metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 72f095d264 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 80fb1aff78 standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java 8ae4351129 standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php db4cfb996a standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py cf3137928f standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb 849970eb56 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java c739d4d196 standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 098ddec5dc standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java e8a988cca8 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java da38a6bbd3 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java PRE-CREATION standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java 1dfc105958 standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java PRE-CREATION streaming/src/java/org/apache/hive/streaming/TransactionBatch.java d44065018f Diff: https://reviews.apache.org/r/72324/diff/1/ Testing --- Thanks, Marton Bod
Re: Review Request 72290: HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns
> On April 1, 2020, 9:02 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 1379 (patched) > > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line1410> > > > > Will this issue unnecessary queries for read only queries? For Oracle > > this could increase exexution time > > > > Also why not use executeQueryiesInBatch for this? As discussed, in the current state, these delete queries are executed every time during commitTxn (not during abortTxn). Regarding reusing executeQueryiesInBatch, that's a good point, will do. - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72290/#review220167 ----------- On April 1, 2020, 6:53 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72290/ > --- > > (Updated April 1, 2020, 6:53 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 74ef88545e > > > Diff: https://reviews.apache.org/r/72290/diff/1/ > > > Testing > --- > > Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21347/ > > > Thanks, > > Marton Bod > >
Re: Review Request 72290: HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns
> On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote: > > LGTM, just a few comments. Thanks for the review! > On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Line 1300 (original), 1305 (patched) > > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line1306> > > > > Use setLong, setObject would require type inference sure > On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 1390 (patched) > > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line1421> > > > > Not 100% sure, I think Laci P was working on removal of > > MIN_HISTORY_LEVEL table (HIVE-23107), please check with him. yes he is, I will rebase onto his change once it's committed > On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 4295 (patched) > > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line4335> > > > > Please extract this into > > org.apache.hadoop.hive.metastore.tools.SQLGenerator as discussed, let's move it to TxnDbUtil - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72290/#review220164 --- On April 1, 2020, 6:53 a.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72290/ > --- > > (Updated April 1, 2020, 6:53 a.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 74ef88545e > > > Diff: https://reviews.apache.org/r/72290/diff/1/ > > > Testing > --- > > Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21347/ > > > Thanks, > > Marton Bod > >
Review Request 72290: HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72290/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns Diffs - standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 74ef88545e Diff: https://reviews.apache.org/r/72290/diff/1/ Testing --- Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21347/ Thanks, Marton Bod
Re: Review Request 72264: HIVE-23052: Optimize lock enqueueing in TxnHandler
> On March 31, 2020, 9:35 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 199 (patched) > > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line199> > > > > %d could be problematic with different locales I will change to be on the safe side > On March 31, 2020, 9:35 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 2412 (patched) > > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line2416> > > > > nit: unnecessary move, might make backporting harder sure, makes sense > On March 31, 2020, 9:35 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 2545 (patched) > > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line2550> > > > > Is it possible to collect the all of the data before starting the RDBMS > > query, and get all of the data for a enqueueLockRequest in one go? As discussed, good idea, but it would overcomplicate the code and the query structure, whereas usually only 1-2 dbName-tblName combos are locked in one go > On March 31, 2020, 9:35 a.m., Peter Vary wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > > Lines 2599 (patched) > > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line2652> > > > > Do we need this? We do not commit temp lock id, so we can have just a > > single value which definitely will not appear in the table makes sense, will change to a cheaper placeholder value - Marton --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72264/#review220146 --- On March 24, 2020, 2:24 p.m., Marton Bod wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72264/ > --- > > (Updated March 24, 2020, 2:24 p.m.) > > > Review request for hive, Denys Kuzmenko and Peter Vary. > > > Repository: hive-git > > > Description > --- > > HIVE-23052: Optimize lock enqueueing in TxnHandler > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java > f92ce7325e > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java > 85719fdf84 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java > 76bbf3bc1e > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 74ef88545e > > > Diff: https://reviews.apache.org/r/72264/diff/2/ > > > Testing > --- > > Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21249/ > Custom benchmark results attached to ticket: > https://issues.apache.org/jira/browse/HIVE-23052 > > > Thanks, > > Marton Bod > >
[jira] [Created] (HIVE-23104) Minimize critical paths of TxnHandler::commitTxn and abortTxn
Marton Bod created HIVE-23104: - Summary: Minimize critical paths of TxnHandler::commitTxn and abortTxn Key: HIVE-23104 URL: https://issues.apache.org/jira/browse/HIVE-23104 Project: Hive Issue Type: Improvement Reporter: Marton Bod Investigate whether any code sections in TxnHandler::commitTxn and abortTxn can be lifted out/executed async in order to reduce the overall execution time of these methods. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23093) Create new metastore config value for jdbc max batch size
Marton Bod created HIVE-23093: - Summary: Create new metastore config value for jdbc max batch size Key: HIVE-23093 URL: https://issues.apache.org/jira/browse/HIVE-23093 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod In order to reduce the number of network calls to the metastore DB, there is an effort to batch acid/locking DB calls where it makes sense. Existing metastore config params do not provide a suitable, generic option to control the jdbc statement max batch size. Solution is to create a new config param, and use that in the appropriate batching calls in TxnHandler. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Review Request 72264: HIVE-23052: Optimize lock enqueueing in TxnHandler
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72264/ --- Review request for hive, Denys Kuzmenko and Peter Vary. Repository: hive-git Description --- HIVE-23052: Optimize lock enqueueing in TxnHandler Diffs - standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 06defdb910 Diff: https://reviews.apache.org/r/72264/diff/1/ Testing --- Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21249/ Custom benchmark tests with local and remote DB Thanks, Marton Bod
[jira] [Created] (HIVE-23067) Use batch inserts in TxnHandler
Marton Bod created HIVE-23067: - Summary: Use batch inserts in TxnHandler Key: HIVE-23067 URL: https://issues.apache.org/jira/browse/HIVE-23067 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod To reduce the number of database calls and network roundtrips, we could use more batching in TxnHandler, where currently in many places we call insert commands in loops sequentially. Some examples: * openTxns (TXNS, REPL_TXN_MAP * commitTxn (COMPLETED_TXN_COMPONENTS) * replTableWriteIdState (TXN_TO_WRITE_ID) * allocateTableWriteIds (TXN_TO_WRITE_ID) * -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23052) Optimize lock enqueueing in TxnHandler
Marton Bod created HIVE-23052: - Summary: Optimize lock enqueueing in TxnHandler Key: HIVE-23052 URL: https://issues.apache.org/jira/browse/HIVE-23052 Project: Hive Issue Type: Improvement Reporter: Marton Bod Assignee: Marton Bod TxnHandler::enqueueLockWithRetry uses select for update (S4U) to retrieve the next lock id from the metastore. This locks the NEXT_LOCK_ID table for concurrent lock requests until the transaction has been committed. Since this creates contention, we should reduce the scope of this S4U, thereby locking the table for only as long as necessary. Currently, within the same txn, after the initial S4U, we also insert the transaction components (TXN_COMPONENTS table) before committing, even though they are independent and do not make use of the lock id. So factoring out these txn_component inserts should decrease the time the next_lock_id table is locked and thus increase concurrency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Review Request 72181: HIVE-22832: Parallelise direct insert directory cleaning process
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72181/ --- Review request for hive, Marta Kuczora and Peter Vary. Repository: hive-git Description --- HIVE-22832: Parallelise direct insert directory cleaning process Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e9966e6364 Diff: https://reviews.apache.org/r/72181/diff/1/ Testing --- pre-commit build success: https://builds.apache.org/job/PreCommit-HIVE-Build/20874/ Thanks, Marton Bod
[jira] [Created] (HIVE-22938) Investigate possibility of removing empty bucket file creation mechanism in MR
Marton Bod created HIVE-22938: - Summary: Investigate possibility of removing empty bucket file creation mechanism in MR Key: HIVE-22938 URL: https://issues.apache.org/jira/browse/HIVE-22938 Project: Hive Issue Type: Task Reporter: Marton Bod As a follow-up to [HIVE-22918|https://issues.apache.org/jira/browse/HIVE-22918], this ticket is to investigate whether the empty bucket file creation mechanism can be removed safely from MR. For a bucketed table of N buckets, each insert will generate N bucket files in the delta directory, regardless of how many actual buckets are written to. As an example, if a table has 500 buckets, and we insert a single record, 499 empty bucket files are generated alongside the single bucket that contains the actual data. This makes the operation substantially slower in some cases. This behaviour only seems to happen when using MR as the execution engine. Some components/parts of the code might depend on this behaviour though, so it needs to be verified that removing this logic does not interfere with anything. -- This message was sent by Atlassian Jira (v8.3.4#803005)