[jira] [Created] (HIVE-26156) Iceberg delete writer should handle deleting from old partition specs

2022-04-20 Thread Marton Bod (Jira)
Marton Bod created HIVE-26156:
-

 Summary: Iceberg delete writer should handle deleting from old 
partition specs
 Key: HIVE-26156
 URL: https://issues.apache.org/jira/browse/HIVE-26156
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


While {{HiveIcebergRecordWriter}} always writes data out according to the 
latest spec, the {{HiveIcebergDeleteWriter}} might have to write delete files 
into partitions that correspond to a variety of specs, both old and new. 
Therefore we should pass the {{{}table.specs(){}}}map into the 
{{HiveIcebergWriter}} so that the delete writer can choose the appropriate spec 
on a per-record basis.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26151) Support range-based time travel queries for Iceberg

2022-04-19 Thread Marton Bod (Jira)
Marton Bod created HIVE-26151:
-

 Summary: Support range-based time travel queries for Iceberg
 Key: HIVE-26151
 URL: https://issues.apache.org/jira/browse/HIVE-26151
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod


Allow querying which records have been inserted during a certain time window 
for Iceberg tables. The Iceberg TableScan API provides an implementation for 
that, so most of the work would go into adding syntax support and transporting 
the startTime and endTime parameters to the Iceberg input format.

Proposed new syntax: 

SELECT * FROM table FOR SYSTEM_TIME FROM '' TO ''

SELECT * FROM table FOR SYSTEM_VERSION FROM  TO 

(the TO clause is optional in both cases)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-03-31 Thread Marton Bod (Jira)
Marton Bod created HIVE-26102:
-

 Summary: Implement DELETE statements for Iceberg tables
 Key: HIVE-26102
 URL: https://issues.apache.org/jira/browse/HIVE-26102
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Apache Hive 4.0.0-alpha-1 Release Candidate 1

2022-03-22 Thread Marton Bod
+1 (non-binding)
Tested the checksums, signatures and built it successfully

On Tue, Mar 22, 2022 at 2:26 PM Peter Vary 
wrote:

> Hi Team,
>
> Apache Hive 4.0.0-alpha-1 Release Candidate 1 is available here:
>
> https://people.apache.org/~pvary/apache-hive-4.0.0-alpha-1-rc1/
>
> Maven artifacts are available here:
>
> https://repository.apache.org/content/repositories/orgapachehive-/
>
> The tag 4.0.0-alpha-1-rc1 has been applied to the source for this release
> in github, you can see it at
> https://github.com/apache/hive/tree/release-4.0.0-alpha-1-rc1
>
> Voting will conclude in 72 hours.
>
> All interested parties: Please test.
> Hive PMC Members: Please test and vote.
>
> Thanks.


[jira] [Created] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)
Marton Bod created HIVE-26004:
-

 Summary: Upgrade Iceberg to 0.13.1
 Key: HIVE-26004
 URL: https://issues.apache.org/jira/browse/HIVE-26004
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases

2022-01-24 Thread Marton Bod (Jira)
Marton Bod created HIVE-25891:
-

 Summary: Improve Iceberg error message for unsupported 
vectorization cases
 Key: HIVE-25891
 URL: https://issues.apache.org/jira/browse/HIVE-25891
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently, if you attempt to read a Parquet or Avro Iceberg table with 
vectorization turned on, you will eventually get an error message since it's 
not supported. However, this error message is very misleading and does not 
explain clearly what the problem is and how to work around it. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables

2022-01-24 Thread Marton Bod (Jira)
Marton Bod created HIVE-25890:
-

 Summary: Fix truncate problem with Iceberg CTAS tables
 Key: HIVE-25890
 URL: https://issues.apache.org/jira/browse/HIVE-25890
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


Currently Iceberg CTAS tables cannot be truncated in a subsequent operation. 
This is because we populate the table properties differently on the CTAS 
codepath, and the external.table.purge=true is not populated in this case.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables

2022-01-06 Thread Marton Bod (Jira)
Marton Bod created HIVE-25849:
-

 Summary: Disable insert overwrite for bucket partitioned Iceberg 
tables
 Key: HIVE-25849
 URL: https://issues.apache.org/jira/browse/HIVE-25849
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Insert overwrite should be disabled where the target Iceberg table is a bucket 
partitioned table, since which existing partitions will be overwritten is very 
hard to predict from a user's POV, as it depends on the bucket hash values 
calculated for the new dataset's rows. It's better to be on the safe side and 
disable this operation to avoid unwanted data loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-04 Thread Marton Bod (Jira)
Marton Bod created HIVE-25843:
-

 Summary: Add flag to disable Iceberg FileIO config serialization
 Key: HIVE-25843
 URL: https://issues.apache.org/jira/browse/HIVE-25843
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Hive serializes the Iceberg table object into each individual split. Since the 
FileIO is part of the Iceberg table and it has its own hadoop configuration, 
this configuration will be the dominant factor determining the size of the 
serialized split. In our tests we have found that due to this serialized 
config, iceberg splits are 15-20x larger than normal Hive splits (which led to 
OOM in some of our perf tests).

This PR proposes to introduce a config which can turn off this config 
serialization, and let the deserializer-side fill out the config values instead 
(which works for Hive executors, since they have all the config values in 
hand). This can reduce the Iceberg split size by ~20x based on local tests.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25815) Add flag to skip maven validate phase for Iceberg modules

2021-12-16 Thread Marton Bod (Jira)
Marton Bod created HIVE-25815:
-

 Summary: Add flag to skip maven validate phase for Iceberg modules
 Key: HIVE-25815
 URL: https://issues.apache.org/jira/browse/HIVE-25815
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


The Iceberg checkstyle and spotless plugins which run during the validate phase 
are quite strict in terms of enforcing proper code style. This is great when 
checking in code to master, but it can be inconvenient when doing quick dev 
iterations locally. This PR introduces a maven {{validate.skip}} flag to be 
able to skip checkstyle and spotless checks on demand for the Iceberg modules.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25788) Iceberg CTAS should honor location clause and have correct table properties

2021-12-08 Thread Marton Bod (Jira)
Marton Bod created HIVE-25788:
-

 Summary: Iceberg CTAS should honor location clause and have 
correct table properties
 Key: HIVE-25788
 URL: https://issues.apache.org/jira/browse/HIVE-25788
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


Currently Iceberg CTAS does not take the LOCATION clause into consideration. 
Also, these tables end up with some unintended table properties coming from the 
SerDe, such as partition.columns or partition.columns.comments, etc.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-01 Thread Marton Bod (Jira)
Marton Bod created HIVE-25754:
-

 Summary: Fix column projection for union all queries with multiple 
aliases
 Key: HIVE-25754
 URL: https://issues.apache.org/jira/browse/HIVE-25754
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


Given two tables:

{{create table source1 (dt string, d1 int, d2 int) stored as orc;
create table source2 (dt string, d1 int, d2 int) stored as orc;
insert into source1 values ('20211107', 1, 2);
insert into source2 values ('20211108', 11, 22);}}
If you run this query with UNION ALL, the {{key}} column will be missing from 
the output:

{{select explode(map('D219', D219
,'D220', D220)) as (key, value) from (}}
   {{select '20211107' as date_key
,1 as D219
,2 as D220
) t}}
{{union all}}
{{select explode(map('D221', D221
,'D222', D222)) as (key, value)
from (}}
  {{select '20211107' as date_key
,1 as D221
,2 as D222
) t}}
Result:

{{1}}
{{2}}
{{11}}
{{22}}
 
Correct result should be:

{{D219  1}}
{{D220  2}}
{{D221  11}}
{{D222  22}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread Marton Bod (Jira)
Marton Bod created HIVE-25741:
-

 Summary: HiveProtoLoggingHook EventLogger should always close old 
writer
 Key: HIVE-25741
 URL: https://issues.apache.org/jira/browse/HIVE-25741
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
the Hive proto {{EventLogger}} will create a new file for each proto event. 
However, if we already had an appropriate writer (i.e. 
maybeRolloverWriterForDay() returns false) from some previous operation - we 
don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread Marton Bod (Jira)
Marton Bod created HIVE-25740:
-

 Summary: Handle race condition between compaction txn abort/commit 
and heartbeater
 Key: HIVE-25740
 URL: https://issues.apache.org/jira/browse/HIVE-25740
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


This issue is the following: once the compaction worker finishes, 
commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
interrupted after that. This can lead to race conditions where the txn has 
already been deleted from the backend DB via commit/abort, but the concurrently 
running heartbeater thread still attempts to send a last heartbeat after that, 
but the txn id won't be found in the DB, leading to {{{}NoSuchTxnException{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams

2021-11-19 Thread Marton Bod (Jira)
Marton Bod created HIVE-25727:
-

 Summary: Iceberg hive catalog should create table object with 
initialised SerdeParams
 Key: HIVE-25727
 URL: https://issues.apache.org/jira/browse/HIVE-25727
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently we leave the serdeInfo.parameters as null when we create the table 
object to be persisted during commit time in Iceberg hive catalog. We should 
init the params with an empty map to avoid any NPE possibilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25690) Fix column reorder detection for Iceberg schema evolution

2021-11-11 Thread Marton Bod (Jira)
Marton Bod created HIVE-25690:
-

 Summary: Fix column reorder detection for Iceberg schema evolution
 Key: HIVE-25690
 URL: https://issues.apache.org/jira/browse/HIVE-25690
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


Current algorithm for detecting schema differences between HMS and Iceberg 
schema is broken when it comes to column reorders. This patch should fix that 
up and add more extensive testing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)
Marton Bod created HIVE-25658:
-

 Summary: Fix regex for masking totalSize table properties in 
Iceberg q-tests
 Key: HIVE-25658
 URL: https://issues.apache.org/jira/browse/HIVE-25658
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


HIVE-25607 introduced a text replace regex for masking out the totalSize table 
property values in Iceberg q.out files. The regex however did not cover all of 
the props in the q.out files, so here is the fix for the regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-25 Thread Marton Bod (Jira)
Marton Bod created HIVE-25643:
-

 Summary: Disable replace cols and change col commands for migrated 
Iceberg tables
 Key: HIVE-25643
 URL: https://issues.apache.org/jira/browse/HIVE-25643
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Since the Iceberg table migration will intentionally not rewrite the data 
files, the migrated table will end up with data files that do not contain the 
Iceberg field IDs necessary for safe, reliable schema migration. For this 
purpose, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands for 
these migrated Iceberg tables. ADD COLUMNS are still permitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25622) Change storage handler authz URI API to use the HMS table object

2021-10-19 Thread Marton Bod (Jira)
Marton Bod created HIVE-25622:
-

 Summary: Change storage handler authz URI API to use the HMS table 
object
 Key: HIVE-25622
 URL: https://issues.apache.org/jira/browse/HIVE-25622
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


* Moving the {{getURIForAuth}} method into {{HiveStorageHandler}} with a 
default implementation
 * Changing its signature to accept the HMS table object instead, as it 
provides implementations with more flexibility around constructing the URIs
 * Deleting the StorageAuthorizationHandler interface
 * Cleaning up code parts where {{getURIForAuth}} is invoked



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-11 Thread Marton Bod (Jira)
Marton Bod created HIVE-25607:
-

 Summary: Mask totalSize table property in Iceberg q-tests
 Key: HIVE-25607
 URL: https://issues.apache.org/jira/browse/HIVE-25607
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


The totalSize tbl prop can change whenever the file format version changes, 
therefore potentially causing the q tests to be flaky when issuing describe 
formatted commands. We should mask this and not test against the exact value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25604) Iceberg should implement the authorization storage handler

2021-10-08 Thread Marton Bod (Jira)
Marton Bod created HIVE-25604:
-

 Summary: Iceberg should implement the authorization storage handler
 Key: HIVE-25604
 URL: https://issues.apache.org/jira/browse/HIVE-25604
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Iceberg's StorageHandler should implement the HiveStorageAuthorizationHandler 
interface for authorization purposes. We'll use the iceberg table root location 
as the basis for permission handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25587) Disable Iceberg table migration for unsupported source file formats

2021-10-01 Thread Marton Bod (Jira)
Marton Bod created HIVE-25587:
-

 Summary: Disable Iceberg table migration for unsupported source 
file formats
 Key: HIVE-25587
 URL: https://issues.apache.org/jira/browse/HIVE-25587
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently, we only support migrating ORC, Parquet and Avro tables to Iceberg. 
However, there is no check in the code to fail early for other formats (e.g. 
text, json, rcfile), which can lead to wasted effort at best, and leaving the 
source table unusable at worst. Therefore, we should check the source format 
early and shortcircuit for unsupported types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25581) Iceberg storage handler should set common projection pruning config

2021-09-30 Thread Marton Bod (Jira)
Marton Bod created HIVE-25581:
-

 Summary: Iceberg storage handler should set common projection 
pruning config
 Key: HIVE-25581
 URL: https://issues.apache.org/jira/browse/HIVE-25581
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently the value for the config "tez.mrreader.config.update.properties" is 
not set for Iceberg jobs, when in fact it needs to be part of the jobConf for 
all Iceberg queries. This change should ensure it's set by the Iceberg storage 
handler by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files

2021-09-16 Thread Marton Bod (Jira)
Marton Bod created HIVE-25529:
-

 Summary: Add tests for reading/writing Iceberg V2 tables with 
delete files
 Key: HIVE-25529
 URL: https://issues.apache.org/jira/browse/HIVE-25529
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod
Assignee: Marton Bod


Since Iceberg V2 tables are now official, we can start testing out whether V2 
tables can be created/read/written by Hive. While Hive has no delete statement 
yet on Iceberg tables, we can nonetheless use the Iceberg API to create delete 
files manually and then check if Hive honors those deletes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25486) Upgrade to I

2021-08-27 Thread Marton Bod (Jira)
Marton Bod created HIVE-25486:
-

 Summary: Upgrade to I
 Key: HIVE-25486
 URL: https://issues.apache.org/jira/browse/HIVE-25486
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25354) Handle unsupported queries for Iceberg tables

2021-07-20 Thread Marton Bod (Jira)
Marton Bod created HIVE-25354:
-

 Summary: Handle unsupported queries for Iceberg tables
 Key: HIVE-25354
 URL: https://issues.apache.org/jira/browse/HIVE-25354
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


In Iceberg, there will be several Hive commands that will be unsupported either 
temporarily or else. For example, right now all commands containing the 
PARTITION keyword would fail the Semantic Analysis given that the HMS table is 
always unpartitioned for Iceberg. The resulting error message for the user 
would be confusing. We should provide a common, unified error handling for 
these queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-07-14 Thread Marton Bod (Jira)
Marton Bod created HIVE-25328:
-

 Summary: Limit scope of REPLACE COLUMNS for Iceberg tables
 Key: HIVE-25328
 URL: https://issues.apache.org/jira/browse/HIVE-25328
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Replace columns is a rather wildcard operation which can do heavy-weight schema 
changes. We would only want to allow this operation for dropping columns for 
Iceberg tables. For other changes (adding cols, renaming, type promotion etc.), 
we should use the CHANGE COLUMN command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25308) Use new Tez API to get JobID for Iceberg commits

2021-07-06 Thread Marton Bod (Jira)
Marton Bod created HIVE-25308:
-

 Summary: Use new Tez API to get JobID for Iceberg commits
 Key: HIVE-25308
 URL: https://issues.apache.org/jira/browse/HIVE-25308
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


When committing Iceberg writes, currently we only have the JobID without the 
vertexID, therefore we have to list the folder {{/temp}} first, 
and parse out the full JobIDs (incl. vertexID) from the resulting folder names.

With Tez 0.10.1 released, now we have a new API we can call to acquire the full 
JobID, making the file listing unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25256) Support ALTER TABLE CHANGE COLUMN for Iceberg

2021-06-16 Thread Marton Bod (Jira)
Marton Bod created HIVE-25256:
-

 Summary: Support ALTER TABLE CHANGE COLUMN for Iceberg
 Key: HIVE-25256
 URL: https://issues.apache.org/jira/browse/HIVE-25256
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod


In order to provide support for renaming/changing the data type of a single 
column, we should add alter table change column support for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25255) Support ALTER TABLE REPLACE COLUMNS for Iceberg

2021-06-16 Thread Marton Bod (Jira)
Marton Bod created HIVE-25255:
-

 Summary: Support ALTER TABLE REPLACE COLUMNS for Iceberg
 Key: HIVE-25255
 URL: https://issues.apache.org/jira/browse/HIVE-25255
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-09 Thread Marton Bod (Jira)
Marton Bod created HIVE-25222:
-

 Summary: Fix reading Iceberg tables with a comma in column names
 Key: HIVE-25222
 URL: https://issues.apache.org/jira/browse/HIVE-25222
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


When using a table with a column name containing a comma (e.g. `employ,ee`), 
reading an Iceberg table fails because we rely on the property 
"hive.io.file.readcolumn.names" which encodes the read columns in a 
comma-separated list, put together by the ColumnProjectionUtils class.

Because it's comma-separated in all cases, it will produce a string like: 
"id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
use this string list to construct their expected read schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread Marton Bod (Jira)
Marton Bod created HIVE-25195:
-

 Summary: Store Iceberg write commit and ctas information in 
QueryState 
 Key: HIVE-25195
 URL: https://issues.apache.org/jira/browse/HIVE-25195
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


We should replace the current method of passing Iceberg write commit-related 
information (jobID, task num) and CTAS info via the session conf using prefixed 
keys. We have a new way of doing that more cleanly, using the QueryState 
object. This should make the code easier to maintain and guard against 
accidental session conf pollution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-25 Thread Marton Bod (Jira)
Marton Bod created HIVE-25161:
-

 Summary: Implement CTAS for partitioned Iceberg tables
 Key: HIVE-25161
 URL: https://issues.apache.org/jira/browse/HIVE-25161
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25106) Do not exclude avatica and protobuf for Iceberg

2021-05-12 Thread Marton Bod (Jira)
Marton Bod created HIVE-25106:
-

 Summary: Do not exclude avatica and protobuf for Iceberg
 Key: HIVE-25106
 URL: https://issues.apache.org/jira/browse/HIVE-25106
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


When running tests from the IDE, the current dependency exclusions in the 
hive-iceberg pom can result in:
{code:java}
Caused by: java.lang.ClassNotFoundException: 
com.google.protobuf.GeneratedMessageV3 at 
java.net.URLClassLoader.findClass(URLClassLoader.java:382) at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418) at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 62 more
 
{code}
and
{code:java}
 Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.calcite.avatica.ConnectionPropertiesImpl at 
org.apache.calcite.avatica.MetaImpl.(MetaImpl.java:72) at 
org.apache.calcite.jdbc.CalciteMetaImpl.(CalciteMetaImpl.java:85) at 
org.apache.calcite.jdbc.Driver.createMeta(Driver.java:169) at 
org.apache.calcite.avatica.AvaticaConnection.(AvaticaConnection.java:121){code}
{{}}
when running {{testCBOWithSelectedColumnsOverlapJoin}} and 
{{testCBOWithSelectedColumnsNonOverlapJoin}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits

2021-04-29 Thread Marton Bod (Jira)
Marton Bod created HIVE-25076:
-

 Summary: Get number of write tasks from jobConf for Iceberg commits
 Key: HIVE-25076
 URL: https://issues.apache.org/jira/browse/HIVE-25076
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


When writing empty data into Iceberg tables, we can end up with 0 succeeded 
task count number. With the current logic, we might then erroneously end up 
taking the number of mapper tasks in the commit logic, which would result in 
failures. We should instead save the number of succeeded task count into the 
JobConf under a specified key and retrieve it from there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25052) Writing to Iceberg tables can fail when inserting empty result set

2021-04-23 Thread Marton Bod (Jira)
Marton Bod created HIVE-25052:
-

 Summary: Writing to Iceberg tables can fail when inserting empty 
result set
 Key: HIVE-25052
 URL: https://issues.apache.org/jira/browse/HIVE-25052
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25038) Increase Iceberg test timeout and remove mr tests

2021-04-20 Thread Marton Bod (Jira)
Marton Bod created HIVE-25038:
-

 Summary: Increase Iceberg test timeout and remove mr tests
 Key: HIVE-25038
 URL: https://issues.apache.org/jira/browse/HIVE-25038
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25034) Implement CTAS for Iceberg

2021-04-20 Thread Marton Bod (Jira)
Marton Bod created HIVE-25034:
-

 Summary: Implement CTAS for Iceberg
 Key: HIVE-25034
 URL: https://issues.apache.org/jira/browse/HIVE-25034
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25007) Implement insert overwrite for Iceberg tables

2021-04-13 Thread Marton Bod (Jira)
Marton Bod created HIVE-25007:
-

 Summary: Implement insert overwrite for Iceberg tables
 Key: HIVE-25007
 URL: https://issues.apache.org/jira/browse/HIVE-25007
 Project: Hive
  Issue Type: New Feature
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-13 Thread Marton Bod (Jira)
Marton Bod created HIVE-25006:
-

 Summary: Commit Iceberg writes in HiveMetaHook instead of TezAM
 Key: HIVE-25006
 URL: https://issues.apache.org/jira/browse/HIVE-25006
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod
Assignee: Marton Bod


Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24958) Create Iceberg catalog module in Hive

2021-03-30 Thread Marton Bod (Jira)
Marton Bod created HIVE-24958:
-

 Summary: Create Iceberg catalog module in Hive
 Key: HIVE-24958
 URL: https://issues.apache.org/jira/browse/HIVE-24958
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod
Assignee: Marton Bod


* Create a new iceberg-catalog module in Hive, with the code currently 
contained in Iceberg's iceberg-hive-metastore module
 * Make sure all tests pass (including static analysis and checkstyle)
 * Make iceberg-handler depend on this module instead of iceberg-hive-metastore



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into

2021-03-26 Thread Marton Bod (Jira)
Marton Bod created HIVE-24947:
-

 Summary: Casting exception when reading vectorized parquet file 
for insert into
 Key: HIVE-24947
 URL: https://issues.apache.org/jira/browse/HIVE-24947
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod


We have two parquet tables (target and source).

Upon running the query:
{code:java}
set hive.vectorized.execution.enabled=true;
insert into target2 partition(tlmtc_fl_gnrtd_yr_nb, tlmtc_fl_gnrtd_mnth_nb) 
select * from source;{code}
The following exception is thrown:
{code:java}
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to [B
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
... 24 more
{code}
The same runs without problems when vectorization is turned off. 

Attaching the show create table statements and the sample parquet file so it 
can reproduced.

cc [~nareshpr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-10 Thread Marton Bod (Jira)
Marton Bod created HIVE-24867:
-

 Summary: Create iceberg-handler module in Hive
 Key: HIVE-24867
 URL: https://issues.apache.org/jira/browse/HIVE-24867
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


* Create a new iceberg-handler module in Hive
 * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
 * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
(iceberg-mr code was based on Hive 3.1)
 * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24857) Trigger Tez output commit after close operation

2021-03-08 Thread Marton Bod (Jira)
Marton Bod created HIVE-24857:
-

 Summary: Trigger Tez output commit after close operation
 Key: HIVE-24857
 URL: https://issues.apache.org/jira/browse/HIVE-24857
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently Tez triggers the OutputCommitter.commit() operation between the 
proc.run() and proc.close() operations in TezProcessor. However, when writing 
out data, calling the proc.close() operation may still produce some extra 
records, which would be missed by the output committer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface

2021-01-20 Thread Marton Bod (Jira)
Marton Bod created HIVE-24665:
-

 Summary: Add commitAlterTable method to the HiveMetaHook interface
 Key: HIVE-24665
 URL: https://issues.apache.org/jira/browse/HIVE-24665
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently we have pre and post hooks for create table and drop table commands, 
but only a pre hook for alter table commands. We should add a post hook as well 
(with a default implementation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24629) Invoke optional output committer in TezProcessor

2021-01-13 Thread Marton Bod (Jira)
Marton Bod created HIVE-24629:
-

 Summary: Invoke optional output committer in TezProcessor
 Key: HIVE-24629
 URL: https://issues.apache.org/jira/browse/HIVE-24629
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


In order to enable Hive to write to Iceberg tables, we need to use an output 
committer which will fire at the end of each Tez task execution (commitTask) 
and the after the execution of each vertex (commitOutput/commitJob). This 
output committer will issue a commit containing the written-out data files to 
the Iceberg table, replacing its previous snapshot pointer with a new one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24541) Add config to set a default storage handler class

2020-12-15 Thread Marton Bod (Jira)
Marton Bod created HIVE-24541:
-

 Summary: Add config to set a default storage handler class
 Key: HIVE-24541
 URL: https://issues.apache.org/jira/browse/HIVE-24541
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Add a config param "hive.default.storage.handler.class" so we can set a default 
storage handler class that can be used for all create table statements. By 
default it would be an empty string, taking no effect.

This would allow existing user queries to be reused for a new table format for 
example, such as Iceberg. 

For example, after setting in the config: hive.default.storage.handler.class= 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler

The query: CREATE TABLE abc (a int, b string) LOCATION ...

would be equivalent to: CREATE TABLE abc (a int, b string) STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION ...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24540) Add test hive shell for simpler execution tests and debugging

2020-12-15 Thread Marton Bod (Jira)
Marton Bod created HIVE-24540:
-

 Summary: Add test hive shell for simpler execution tests and 
debugging
 Key: HIVE-24540
 URL: https://issues.apache.org/jira/browse/HIVE-24540
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


In the Apache Iceberg project, we've been using a 
TestHiveShell/TestHiveMetastore class for running query execution unit tests, 
which made our life much easier both in terms of writing test and debugging the 
code from an IDE. It would have value bringing it to the Apache Hive project as 
well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24498) Package facebook thrift classes into hive-exec jar

2020-12-07 Thread Marton Bod (Jira)
Marton Bod created HIVE-24498:
-

 Summary: Package facebook thrift classes into hive-exec jar
 Key: HIVE-24498
 URL: https://issues.apache.org/jira/browse/HIVE-24498
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod
Assignee: Marton Bod


When using Iceberg tables, the Tez AM needs to communicate with the HMS to 
fetch table information in the SerDe and InputFormat operations. The libfb303 
classes are currently missing from the hive-exec jar leading to Tez missing 
these classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24133) Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check

2020-09-09 Thread Marton Bod (Jira)
Marton Bod created HIVE-24133:
-

 Summary: Hive query with Hbase storagehandler can give back 
incorrect results when predicate contains null check
 Key: HIVE-24133
 URL: https://issues.apache.org/jira/browse/HIVE-24133
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod


It has been observed that when using Hbase storage handler and the table 
contains null values, Hive can give back wrong query results, depending on what 
columns we select for and whether the where clause predicate contains any null 
checks.

For example:

create 'default:hive_test', 'cf'
put 'default:hive_test', '1', 'cf:col1', 'val1'
put 'default:hive_test', '1', 'cf:col2', 'val2'

put 'default:hive_test', '2', 'cf:col1', 'val1_2'
put 'default:hive_test', '2', 'cf:col2', 'val2_2'

put 'default:hive_test', '3', 'cf:col1', 'val1_3'
put 'default:hive_test', '3', 'cf:col2', 'val2_3'
put 'default:hive_test', '3', 'cf:col3', 'val3_3'
put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"

put 'default:hive_test', '4', 'cf:col1', 'val1_4'
put 'default:hive_test', '4', 'cf:col2', 'val2_4'

scan 'default:hive_test'

= HIVE

CREATE EXTERNAL TABLE hbase_hive_test (
rowkey string,
col1 string,
col2 string,
col3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
)
TBLPROPERTIES("hbase.table.name" = "default:hive_test");

query: select * from hbase_hive_test where col3 is null;

result:
Total MapReduce CPU Time Spent: 10 seconds 980 msec
OK
1 val1 val2 NULL
2 val1_2 val2_2 NULL
4 val1_4 val2_4 NULL

query: select rowkey from hbase_hive_test where col3 is null;

This does not produce any records.

However, select rowkey, col2 from hbase_hive_test where col3 is null;

This gives back the correct results again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23759) Refactor CommitTxnRequest field order

2020-06-24 Thread Marton Bod (Jira)
Marton Bod created HIVE-23759:
-

 Summary: Refactor CommitTxnRequest field order
 Key: HIVE-23759
 URL: https://issues.apache.org/jira/browse/HIVE-23759
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod
Assignee: Marton Bod


Refactor CommitTxnRequest field order (keyValue and replLastIdInfo). This 
should be a safe change as neither of these fields have been part of any 
official Hive release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23625) HS2 Web UI displays query drill-down results in plain text, not html

2020-06-05 Thread Marton Bod (Jira)
Marton Bod created HIVE-23625:
-

 Summary: HS2 Web UI displays query drill-down results in plain 
text, not html
 Key: HIVE-23625
 URL: https://issues.apache.org/jira/browse/HIVE-23625
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


Opening a drilldown link on the HS2 Web UI, you are directed to the following 
URL: /query_page?operationId=

Since the path /query_page contains no file extensions, Jetty cannot determine 
the mimetype and therefore the Hive HttpServer returns response header 
Content-Type: text/plain;charset=utf-8, and the information does not render as 
html in the browser. This should be corrected to return text/html;charset=utf-8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-14 Thread Marton Bod


> On May 14, 2020, 3:07 p.m., Denys Kuzmenko wrote:
> > LGTM, some minor comments

Thanks Denys, I've address your comments


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220758
---


On May 14, 2020, 3:38 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72481/
> ---
> 
> (Updated May 14, 2020, 3:38 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Removed global mutex on writeId allocation, which means write ids can now be 
> allocated concurrently for different tables without blocking each other, 
> speeding up execution (perf test results below). Concurrent 
> allocateTableWriteIds() operations targeting the same table are still mutexed 
> by an S4U if the table is already present in next_write_id, otherwise a race 
> condition to insert the table into next_write_id is solved by retrying after 
> catching the duplicate key exception (the thread which commits later will be 
> the one to retry).
> 
> The situation is similar when allocateTableWriteIds() and 
> replTableWriteIdState() are running concurrently - if they target different 
> tables, they won't block each other anymore. If they target the same table, 
> and the table is already inserted into next_write_id, replTableWriteIdState() 
> returns early and allocateTableWriteIds() updates the next id. If the table 
> is not yet in next_write_id, they might attempt to insert the same row 
> concurrently, in which case who commits later will get a duplicate key 
> exception and retry the operation, just as above.
> 
> 
> Diffs
> -
> 
>   ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
> 868da0c7a0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
>  d59f863b11 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  cf41ef8aaf 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
>  1e177f4a7b 
> 
> 
> Diff: https://reviews.apache.org/r/72481/diff/2/
> 
> 
> Testing
> ---
> 
> Unit test in TestTxnHandler
> + Perf tests:
> dbTypesameTable variant  ms/op  error
> MYSQL FALSE original 46.93  3.041
> MYSQL FALSE patched  19.283 1.311
> MYSQL TRUE  original 50.185 3.595
> MYSQL TRUE  patched  32.254 2.164
> ORACLEFALSE original 57.609 4.461
> ORACLEFALSE patched  25.721 2.551
> ORACLETRUE  original 59.668 3.172
> ORACLETRUE  patched  39.061 2.548
> POSTGRES  FALSE original 39.364 2.94 
> POSTGRES  FALSE patched  18.518 1.038
> POSTGRES  TRUE  original 39.868 2.679
> POSTGRES  TRUE  patched  28.874 1.768
> SQLSERVER FALSE original 45.252 1.643
> SQLSERVER FALSE patched  24.583 1.529
> SQLSERVER TRUE  original 49.149 3.45 
> SQLSERVER TRUE  patched  32.918 1.654
> (sameTable=true means that all threads were trying to allocate ids for the 
> same db.table,
> false means they all targeted different tables)
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-14 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/
---

(Updated May 14, 2020, 3:38 p.m.)


Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Removed global mutex on writeId allocation, which means write ids can now be 
allocated concurrently for different tables without blocking each other, 
speeding up execution (perf test results below). Concurrent 
allocateTableWriteIds() operations targeting the same table are still mutexed 
by an S4U if the table is already present in next_write_id, otherwise a race 
condition to insert the table into next_write_id is solved by retrying after 
catching the duplicate key exception (the thread which commits later will be 
the one to retry).

The situation is similar when allocateTableWriteIds() and 
replTableWriteIdState() are running concurrently - if they target different 
tables, they won't block each other anymore. If they target the same table, and 
the table is already inserted into next_write_id, replTableWriteIdState() 
returns early and allocateTableWriteIds() updates the next id. If the table is 
not yet in next_write_id, they might attempt to insert the same row 
concurrently, in which case who commits later will get a duplicate key 
exception and retry the operation, just as above.


Diffs (updated)
-

  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
868da0c7a0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
 d59f863b11 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 cf41ef8aaf 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
 1e177f4a7b 


Diff: https://reviews.apache.org/r/72481/diff/2/

Changes: https://reviews.apache.org/r/72481/diff/1-2/


Testing
---

Unit test in TestTxnHandler
+ Perf tests:
dbTypesameTable variant  ms/op  error
MYSQL FALSE original 46.93  3.041
MYSQL FALSE patched  19.283 1.311
MYSQL TRUE  original 50.185 3.595
MYSQL TRUE  patched  32.254 2.164
ORACLEFALSE original 57.609 4.461
ORACLEFALSE patched  25.721 2.551
ORACLETRUE  original 59.668 3.172
ORACLETRUE  patched  39.061 2.548
POSTGRES  FALSE original 39.364 2.94 
POSTGRES  FALSE patched  18.518 1.038
POSTGRES  TRUE  original 39.868 2.679
POSTGRES  TRUE  patched  28.874 1.768
SQLSERVER FALSE original 45.252 1.643
SQLSERVER FALSE patched  24.583 1.529
SQLSERVER TRUE  original 49.149 3.45 
SQLSERVER TRUE  patched  32.918 1.654
(sameTable=true means that all threads were trying to allocate ids for the same 
db.table,
false means they all targeted different tables)


Thanks,

Marton Bod



Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-08 Thread Marton Bod


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > Thanks Marci,
> > Few querstions below - probably I just do not understand this part of the 
> > code enough.
> > 
> > Another question for the perf test: How many threads are you using?
> > 
> > Thanks,
> > Peter

Thanks Peti for the review. See my answers below, they are related to minor 
housekeeping changes that are not core to the optimization story.

The perf tests were run with 8 concurrent threads.


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
> > Line 1102 (original)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230753#file2230753line1102>
> >
> > Why did we remove this?

checkRetryable never throws MetaException (and hence the code never enters this 
catch block), so I removed it from its throws clause


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
> > Line 1137 (original)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230753#file2230753line1137>
> >
> > why did we remove this?

same as above


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1762-1765 (original), 1759-1762 (patched)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line1764>
> >
> > Is this a functionality or performance change?

neither really, just some readability refactor


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2021 (original), 2013 (patched)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2025>
> >
> > Why is this change required?

just seems counterintuitive to use string concat if we're already using a 
stringbuilder anyway


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2067 (original), 2057 (patched)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2071>
> >
> > Why is this change?

this was causing a checkstyle issue (line lenght too long)


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2079 (original), 2070 (patched)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2084>
> >
> > Why is this change?

unnecessary call to Long.toString


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2090 (original), 2081 (patched)
> > <https://reviews.apache.org/r/72481/diff/1/?file=2230754#file2230754line2095>
> >
> > Why is this change?

same as above


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220689
---


On May 7, 2020, 3:55 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72481/
> ---
> 
> (Updated May 7, 2020, 3:55 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Removed global mutex on writeId allocation, which means write ids can now be 
> allocated concurrently for different tables without blocking each other, 
> speeding up execution (perf test results below). Concurrent 
> allocateTableWriteIds() operations targeting the same table are still mutexed 
> by an S4U if the table is already present in next_write_id, otherwise a race 
> condition to insert the table into next_write_id is solved by retrying after 
> catching the duplicate key exception (the thread which commits later will be 
> the one to retry).
> 
> The situation is similar when allocateTableWriteIds() and 
> replTableWriteIdState() are running concurrently - if they target different 
> tables, they won't block each other anymore. If they target the same table, 
> and 

Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-07 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Removed global mutex on writeId allocation, which means write ids can now be 
allocated concurrently for different tables without blocking each other, 
speeding up execution (perf test results below). Concurrent 
allocateTableWriteIds() operations targeting the same table are still mutexed 
by an S4U if the table is already present in next_write_id, otherwise a race 
condition to insert the table into next_write_id is solved by retrying after 
catching the duplicate key exception (the thread which commits later will be 
the one to retry).

The situation is similar when allocateTableWriteIds() and 
replTableWriteIdState() are running concurrently - if they target different 
tables, they won't block each other anymore. If they target the same table, and 
the table is already inserted into next_write_id, replTableWriteIdState() 
returns early and allocateTableWriteIds() updates the next id. If the table is 
not yet in next_write_id, they might attempt to insert the same row 
concurrently, in which case who commits later will get a duplicate key 
exception and retry the operation, just as above.


Diffs
-

  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
868da0c7a0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
 d59f863b11 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 cf41ef8aaf 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
 1e177f4a7b 


Diff: https://reviews.apache.org/r/72481/diff/1/


Testing
---

Unit test in TestTxnHandler
+ Perf tests:
dbTypesameTable variant  ms/op  error
MYSQL FALSE original 46.93  3.041
MYSQL FALSE patched  19.283 1.311
MYSQL TRUE  original 50.185 3.595
MYSQL TRUE  patched  32.254 2.164
ORACLEFALSE original 57.609 4.461
ORACLEFALSE patched  25.721 2.551
ORACLETRUE  original 59.668 3.172
ORACLETRUE  patched  39.061 2.548
POSTGRES  FALSE original 39.364 2.94 
POSTGRES  FALSE patched  18.518 1.038
POSTGRES  TRUE  original 39.868 2.679
POSTGRES  TRUE  patched  28.874 1.768
SQLSERVER FALSE original 45.252 1.643
SQLSERVER FALSE patched  24.583 1.529
SQLSERVER TRUE  original 49.149 3.45 
SQLSERVER TRUE  patched  32.918 1.654
(sameTable=true means that all threads were trying to allocate ids for the same 
db.table,
false means they all targeted different tables)


Thanks,

Marton Bod



Re: Review Request 72472: HIVE-23318: TxnHandler need not delete from MATERIALIZATION_REBUILD_LOCKS on need basis

2020-05-06 Thread Marton Bod


> On May 6, 2020, 11:18 a.m., Denys Kuzmenko wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Line 3158 (original)
> > <https://reviews.apache.org/r/72472/diff/2/?file=2230577#file2230577line3159>
> >
> > nit: new line

added it back


> On May 6, 2020, 11:18 a.m., Denys Kuzmenko wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java
> > Line 104 (original), 108 (patched)
> > <https://reviews.apache.org/r/72472/diff/2/?file=2230578#file2230578line108>
> >
> > nit: I would just use ternary operator in-place, it's easier to 
> > understand why the fallback

replaced with ternary


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72472/#review220652
-------


On May 5, 2020, 8:58 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72472/
> ---
> 
> (Updated May 5, 2020, 8:58 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Jesús Camacho Rodríguez, and Peter 
> Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduces a new TxnType for materialization view rebuilds, and only issues 
> rebuild lock delete calls during commit for that specific type
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d 
>   ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java e91a7ed760 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TxnType.java
>  349f37f914 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  302c340198 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  28c971f5bf 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  cdf97fc53c 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> c78aeb46d9 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  8fded608d0 
> 
> 
> Diff: https://reviews.apache.org/r/72472/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72472: HIVE-23318: TxnHandler need not delete from MATERIALIZATION_REBUILD_LOCKS on need basis

2020-05-06 Thread Marton Bod


> On May 6, 2020, 4:54 a.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Line 3153 (original), 3153 (patched)
> > <https://reviews.apache.org/r/72472/diff/1/?file=2230251#file2230251line3153>
> >
> > Unrelated to this patch but related to compilation latency. I am not 
> > sure if it is actually possible but would be great to find a way to avoid 
> > the AST traversal (assuming this method is executed for every incoming 
> > query), since it adds to compilation latency and can become noticible for 
> > large queries.

Makes sense. I've removed the traversal for mat view rebuild. As for the 
read-only check, we can create a separate story to figure out if we can 
optimize it.


> On May 6, 2020, 4:54 a.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Lines 3165 (patched)
> > <https://reviews.apache.org/r/72472/diff/1/?file=2230251#file2230251line3166>
> >
> > LGTM. Have you confirmed that you need the traversal and that 
> > TOK_ALTER_MATERIALIZED_VIEW_REBUILD is not the root (I am not sure about 
> > this)?
> > 
> > Also wanted to mention. MV rebuild is translated internally into INSERT 
> > OVERWRITE, INSERT, or MERGE statements, depending on the MV semantics. 
> > Setting this TxnType to MATER_VIEW_REBUILD will not have any kind of side 
> > effect on those operations. Is that correct?

Good point, I've rechecked and we can use the root token instead of a traversal 
- updated the diff. 
This should not cause any side effects, as behaviour is only branched off for 
read-only txn types.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72472/#review220647
---


On May 5, 2020, 8:58 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72472/
> ---
> 
> (Updated May 5, 2020, 8:58 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Jesús Camacho Rodríguez, and Peter 
> Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduces a new TxnType for materialization view rebuilds, and only issues 
> rebuild lock delete calls during commit for that specific type
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d 
>   ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java e91a7ed760 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TxnType.java
>  349f37f914 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  e4b0bc726c 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  1a0fee319b 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  e6224ec16f 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 8462b3d7a3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  8fded608d0 
> 
> 
> Diff: https://reviews.apache.org/r/72472/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72470: ACID: Concurrent MERGE INSERT operations produce duplicates

2020-05-05 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72470/#review220632
---


Ship it!




Ship It!

- Marton Bod


On May 5, 2020, 8:33 a.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72470/
> ---
> 
> (Updated May 5, 2020, 8:33 a.m.)
> 
> 
> Review request for hive, Marton Bod, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23349
> https://issues.apache.org/jira/browse/HIVE-23349
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 2 concurrent MERGE INSERT operations generate duplicates due to lack of 
> locking.
> MERGE INSERT is treated as regular INSERT, it acquires SHARED_READ lock that 
> doesn't prevent other INSERTs. We should use EXCLUSIVE lock here or 
> EXCL_WRITE if hive.txn.write.xlock=false;
> 
> create table target (a int, b int) stored as orc TBLPROPERTIES 
> ('transactional'='true')");
> insert into target values (1,2), (3,4)
> create table source (a int, b int)
> execute in parallel:
> 
> insert into source values (5,6), (7,8)
> 
> PS:
> 
> Current patch doesn cover following scenario:
> 1) T1 (merge-insert) opens txns & records snapshot;
> 2) T2 (insert/merge-insert) opens txns, records snapshot & locks it;
> 3) T2 commits it's changes and unlocks T1; 
> 4) T1 locks snapshot and validates txn list, however only txns with txnId 
> lower than T1's is beeing considered, T2 changes are ignored -> duplicates 
> are generated.
> 
> 
> merge-insert/merge-insert scenario could be fixed by leveraging write-write 
> conflict check mechanism. We just need to set operation type for merge-insert 
> to update.
> However it won't solve issue with merge-insert/insert. 
> 
> We should consider moving locking before snapshot generation, current 
> snapshot re-check mechanism doesn't handle described scenarios.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 9f59d4cea3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java c1f94d165b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java deaab89c1f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java 
> 3ffdcec528 
>   
> ql/src/test/org/apache/hadoop/hive/ql/lockmgr/DbTxnManagerEndToEndTestBase.java
>  b435e79c3c 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 1687425bcb 
>   ql/src/test/results/clientpositive/llap/explain_locks.q.out d62f6ccafd 
> 
> 
> Diff: https://reviews.apache.org/r/72470/diff/3/
> 
> 
> Testing
> ---
> 
> Manual, added number of merge related test scenarios into TestDbTxnManager2, 
> modified explain_locks.q
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Review Request 72472: HIVE-23318: TxnHandler need not delete from MATERIALIZATION_REBUILD_LOCKS on need basis

2020-05-05 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72472/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Introduces a new TxnType for materialization view rebuilds, and only issues 
rebuild lock delete calls during commit for that specific type


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java e91a7ed760 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/TxnType.java
 349f37f914 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
 e4b0bc726c 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
 1a0fee319b 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
 e6224ec16f 
  standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
8462b3d7a3 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 8fded608d0 


Diff: https://reviews.apache.org/r/72472/diff/1/


Testing
---


Thanks,

Marton Bod



Re: Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler

2020-05-05 Thread Marton Bod


> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java
> > Line 31 (original), 32 (patched)
> > <https://reviews.apache.org/r/72469/diff/2/?file=2230151#file2230151line32>
> >
> > What does empty txn mean? Is it obsolete/abandoned/timed-out txns?
> 
> Marton Bod wrote:
> It refers to aborted/committed TXNs that do not have any entries in 
> TXN_COMPONENTS (hence the word empty).
> 
> Denys Kuzmenko wrote:
> could we name it TxnCleanerServic? EmptyTxn is a bit confusing.

Yeah, I was also a bit on the fence about the naming. Changed it.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72469/#review220594
-------


On May 5, 2020, 7:42 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72469/
> ---
> 
> (Updated May 5, 2020, 7:42 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Consolidate different metastore housekeeper threads into one, move cleaner 
> methods out of compactor initiator. 
> Create separate txn cleaner service which will need to run more frequently.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 23512e252e 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
> eac2c6307f 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 1687425bcb 
>   ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java 
> e4ff14a140 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  842b7fe53d 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  7bba8d6ee6 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java
>  d35c9602a6 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java
>  e96a7ba289 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java
>  c4a488bac0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java
>  2ad5a89f03 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidWriteSetService.java
>  5ec513dfd4 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  8fded608d0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
>  e8ac71ae55 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/conf/TestMetastoreConf.java
>  9905a14983 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/txn/TestAcidTxnCleanerService.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/72469/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler

2020-05-05 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72469/
---

(Updated May 5, 2020, 7:42 a.m.)


Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Consolidate different metastore housekeeper threads into one, move cleaner 
methods out of compactor initiator. 
Create separate txn cleaner service which will need to run more frequently.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 23512e252e 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
eac2c6307f 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
1687425bcb 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java 
e4ff14a140 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
 842b7fe53d 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 7bba8d6ee6 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java
 d35c9602a6 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java
 e96a7ba289 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java
 c4a488bac0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java
 2ad5a89f03 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidWriteSetService.java
 5ec513dfd4 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 8fded608d0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
 e8ac71ae55 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/conf/TestMetastoreConf.java
 9905a14983 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/txn/TestAcidTxnCleanerService.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/72469/diff/3/

Changes: https://reviews.apache.org/r/72469/diff/2-3/


Testing
---


Thanks,

Marton Bod



Re: Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler

2020-05-04 Thread Marton Bod


> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote:
> > LGTM , just few questions

Thanks for the review!


> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
> > Line 153 (original)
> > <https://reviews.apache.org/r/72469/diff/2/?file=2230142#file2230142line155>
> >
> > Cleanup functionality was moved out of the Initiator and is covered by 
> > HousekeeperThreadnow. Is it right?

Yes, that's correct.


> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java
> > Line 205 (original)
> > <https://reviews.apache.org/r/72469/diff/2/?file=2230147#file2230147line205>
> >
> > Do we have similar test for Housekeeper?

Yes. The various TxnStore operations (e.g. writeset pruning) that 
AcidHouseKeeper executes in sequence are already covered by tests in various 
places (e.g. TestTxnHandler, TestDbTxnManager2).


> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java
> > Line 31 (original), 32 (patched)
> > <https://reviews.apache.org/r/72469/diff/2/?file=2230151#file2230151line32>
> >
> > What does empty txn mean? Is it obsolete/abandoned/timed-out txns?

It refers to aborted/committed TXNs that do not have any entries in 
TXN_COMPONENTS (hence the word empty).


> On May 4, 2020, 3:21 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java
> > Lines 84 (patched)
> > <https://reviews.apache.org/r/72469/diff/2/?file=2230152#file2230152line84>
> >
> > Old AcidCompactionHistoryService was doing same stuff 
> > (txnHandler.purgeCompactionHistory()). Do we need than 
> > AcidEmptyTxnCleanerService and if yes can we embed it here?

Those operations that can run with the about same frequency (timedout txn 
cleaning, writeset cleaning, compaction history pruning) have been merged and 
consolidated into a single HouseKeeper for simplicity (along with the cleaning 
ops Initiator used to do). However, AcidEmptyTxnCleaner needs to be treated 
separately it has to run significantly more frequently than AcidHouseKeeper due 
to how the new open txn mechanism works.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72469/#review220594
---


On May 4, 2020, 12:58 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72469/
> ---
> 
> (Updated May 4, 2020, 12:58 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Consolidate different metastore housekeeper threads into one, move cleaner 
> methods out of compactor initiator. 
> Create separate txn cleaner service which will need to run more frequently.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 23512e252e 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
> eac2c6307f 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 1687425bcb 
>   ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java 
> e4ff14a140 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  842b7fe53d 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  7bba8d6ee6 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java
>  d35c9602a6 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java
>  e96a7ba289 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java
>  c4a488bac0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java
&g

Re: Review Request 72470: ACID: Concurrent MERGE INSERT operations produce duplicates

2020-05-04 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72470/#review220595
---



Looks good to me, just a couple of questions.


ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Lines 2226 (patched)
<https://reviews.apache.org/r/72470/#comment309077>

nit: the .getLockid() part is a leftover?



ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Lines 2231 (patched)
<https://reviews.apache.org/r/72470/#comment309080>

nit: this appears in a few places, might make sense to extract



ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Lines 2241 (patched)
<https://reviews.apache.org/r/72470/#comment309079>

maybe worth running the same tests with 'false' too?


- Marton Bod


On May 4, 2020, 1:32 p.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72470/
> ---
> 
> (Updated May 4, 2020, 1:32 p.m.)
> 
> 
> Review request for hive, Marton Bod, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23349
> https://issues.apache.org/jira/browse/HIVE-23349
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 2 concurrent MERGE INSERT operations generate duplicates due to lack of 
> locking.
> MERGE INSERT is treated as regular INSERT, it acquires SHARED_READ lock that 
> doesn't prevent other INSERTs. We should use EXCLUSIVE lock here or 
> EXCL_WRITE if hive.txn.write.xlock=false;
> 
> create table target (a int, b int) stored as orc TBLPROPERTIES 
> ('transactional'='true')");
> insert into target values (1,2), (3,4)
> create table source (a int, b int)
> execute in parallel:
> 
> insert into source values (5,6), (7,8)
> 
> PS:
> 
> Current patch doesn cover following scenario:
> 1) T1 (merge-insert) opens txns & records snapshot;
> 2) T2 (insert/merge-insert) opens txns, records snapshot & locks it;
> 3) T2 commits it's changes and unlocks T1; 
> 4) T1 locks snapshot and validates txn list, however only txns with txnId 
> lower than T1's is beeing considered, T2 changes are ignored -> duplicates 
> are generated.
> 
> 
> merge-insert/merge-insert scenario could be fixed by leveraging write-write 
> conflict check mechanism. We just need to set operation type for merge-insert 
> to update.
> However it won't solve issue with merge-insert/insert. 
> 
> We should consider moving locking before snapshot generation, current 
> snapshot re-check mechanism doesn't handle described scenarios.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 9f59d4cea3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java c1f94d165b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 998c05e37d 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java deaab89c1f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java 
> 3ffdcec528 
>   
> ql/src/test/org/apache/hadoop/hive/ql/lockmgr/DbTxnManagerEndToEndTestBase.java
>  b435e79c3c 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 1687425bcb 
>   ql/src/test/results/clientpositive/llap/explain_locks.q.out d62f6ccafd 
> 
> 
> Diff: https://reviews.apache.org/r/72470/diff/1/
> 
> 
> Testing
> ---
> 
> Manual, added number of merge related test scenarios into TestDbTxnManager2, 
> modified explain_locks.q
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Review Request 72469: HIVE-23325: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler

2020-05-04 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72469/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Consolidate different metastore housekeeper threads into one, move cleaner 
methods out of compactor initiator. 
Create separate txn cleaner service which will need to run more frequently.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 829791e0a9 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 23512e252e 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 48bf8529fa 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 5b8c6701e1 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
eac2c6307f 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
1687425bcb 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java 
e4ff14a140 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
 842b7fe53d 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 7bba8d6ee6 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MaterializationsRebuildLockCleanerTask.java
 d35c9602a6 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidCompactionHistoryService.java
 e96a7ba289 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidHouseKeeperService.java
 c4a488bac0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidOpenTxnsCounterService.java
 2ad5a89f03 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidWriteSetService.java
 5ec513dfd4 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 8fded608d0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
 e8ac71ae55 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/conf/TestMetastoreConf.java
 9905a14983 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/txn/TestAcidEmptyTxnCleanerService.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/72469/diff/1/


Testing
---


Thanks,

Marton Bod



[jira] [Created] (HIVE-23325) Clean up cleanup tasks for TxnHandler/CompactionTxnHandler

2020-04-29 Thread Marton Bod (Jira)
Marton Bod created HIVE-23325:
-

 Summary: Clean up cleanup tasks for TxnHandler/CompactionTxnHandler
 Key: HIVE-23325
 URL: https://issues.apache.org/jira/browse/HIVE-23325
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently there are multiple ways to clean up ACID related stuff:
 * AcidHouseKeeperService
 * AcidWriteSetService
 * AcidCompactionHistoryService
 * Initiator
 * Cleaner
 * etc.

We should consolidate them where possible and improve logging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23324) Par\

2020-04-29 Thread Marton Bod (Jira)
Marton Bod created HIVE-23324:
-

 Summary: Par\
 Key: HIVE-23324
 URL: https://issues.apache.org/jira/browse/HIVE-23324
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72436: Locks: Implement zero-wait readers

2020-04-29 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72436/#review220540
---


Ship it!




Ship It!

- Marton Bod


On April 29, 2020, 10:18 a.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72436/
> ---
> 
> (Updated April 29, 2020, 10:18 a.m.)
> 
> 
> Review request for hive, Marton Bod and Peter Vary.
> 
> 
> Bugs: HIVE-23293
> https://issues.apache.org/jira/browse/HIVE-23293
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With a new lock type (EXCL_WRITE) for INSERT_OVERWRITE, SHARED_READ does not 
> have to wait for any lock - it can fails fast for a pending EXCLUSIVE, 
> because even if there is an EXCL_WRITE or SHARED_WRITE pending, there's no 
> semantic reason to wait for them.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8e643fe844 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java fb5a306ac4 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java b4dac4346e 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> f90396b2a3 
>   ql/src/test/results/clientnegative/insert_into1.q.out 066f04d1cc 
>   ql/src/test/results/clientnegative/insert_into2.q.out b839efeb7d 
>   ql/src/test/results/clientnegative/insert_into3.q.out 91aeb5b9a1 
>   ql/src/test/results/clientnegative/insert_into4.q.out 303fb6aa2e 
>   ql/src/test/results/clientnegative/lockneg_try_drop_locked_db.q.out 
> e66965e693 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java
>  7402fb30eb 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockResponse.java
>  fdaab4b394 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  9fb7ff011a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  4f317b3453 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  e64ae0ead2 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockRequestBuilder.java
>  93da0f60ec 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 1e3d6e9b8b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  fe39b0b36e 
> 
> 
> Diff: https://reviews.apache.org/r/72436/diff/6/
> 
> 
> Testing
> ---
> 
> DbTxnManager tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Re: Review Request 72436: Locks: Implement zero-wait readers

2020-04-29 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72436/#review220538
---




ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java
Lines 145 (patched)
<https://reviews.apache.org/r/72436/#comment309021>

do we want to handle the case here when lockstate was NOT_ACQUIRED? e.g. 
log out the errorMsg on the client side



ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Lines 2632 (patched)
<https://reviews.apache.org/r/72436/#comment309022>

this hardcoded part might break if we ever change the internal 
representation of LockResponse or LockInfo. Would it make sense to instead 
create those objects here and use their toString()?


- Marton Bod


On April 28, 2020, 5:59 p.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72436/
> ---
> 
> (Updated April 28, 2020, 5:59 p.m.)
> 
> 
> Review request for hive, Marton Bod and Peter Vary.
> 
> 
> Bugs: HIVE-23293
> https://issues.apache.org/jira/browse/HIVE-23293
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With a new lock type (EXCL_WRITE) for INSERT_OVERWRITE, SHARED_READ does not 
> have to wait for any lock - it can fails fast for a pending EXCLUSIVE, 
> because even if there is an EXCL_WRITE or SHARED_WRITE pending, there's no 
> semantic reason to wait for them.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8e643fe844 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java fb5a306ac4 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java b4dac4346e 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> f90396b2a3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java
>  7402fb30eb 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockResponse.java
>  fdaab4b394 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  9fb7ff011a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  4f317b3453 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  e64ae0ead2 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockRequestBuilder.java
>  93da0f60ec 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 1e3d6e9b8b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  fe39b0b36e 
> 
> 
> Diff: https://reviews.apache.org/r/72436/diff/5/
> 
> 
> Testing
> ---
> 
> DbTxnManager tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Re: Review Request 72436: Locks: Implement zero-wait readers

2020-04-27 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72436/#review220497
---



LGMT, just few questions


ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Lines 2553 (patched)
<https://reviews.apache.org/r/72436/#comment308950>

nit: white space plus explanation mentions "wrong code" but it's the whole 
ErrorMsg that's being compared



ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Lines 2555 (patched)
<https://reviews.apache.org/r/72436/#comment308949>

perhaps move this errorMsg into a constant since it's being referenced more 
than once (or into ErrorMsg)



ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
Line 2559 (original), 2588 (patched)
<https://reviews.apache.org/r/72436/#comment308955>

it might make sense to extract this setup/query running part into a common 
method, and put the different assertions into testFairness and 
testFairnessZeroWaitRead, that might make the assertion part easier to follow. 
Same above



standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java
Lines 59 (patched)
<https://reviews.apache.org/r/72436/#comment308951>

quick question: does this need to be part of the API, or can we just derive 
it from the conf vars (TXN_WRITE_X_LOCK, TXN_OVERWRITE_X_LOCK) as we do in the 
tests?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4389 (patched)
<https://reviews.apache.org/r/72436/#comment308952>

do we need this delete? if the txn is aborted, wouldn't all the locks be 
removed in abortTxn?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4391 (patched)
<https://reviews.apache.org/r/72436/#comment308953>

shouldn't we check blockedBy for certainty that it was an exclusive lock 
that blocked us? (and log the anomaly if not)


- Marton Bod


On April 27, 2020, 9:01 a.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72436/
> ---
> 
> (Updated April 27, 2020, 9:01 a.m.)
> 
> 
> Review request for hive, Marton Bod and Peter Vary.
> 
> 
> Bugs: HIVE-23293
> https://issues.apache.org/jira/browse/HIVE-23293
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With a new lock type (EXCL_WRITE) for INSERT_OVERWRITE, SHARED_READ does not 
> have to wait for any lock - it can fails fast for a pending EXCLUSIVE, 
> because even if there is an EXCL_WRITE or SHARED_WRITE pending, there's no 
> semantic reason to wait for them.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java b4dac4346e 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 497cedd61f 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockRequest.java
>  7402fb30eb 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  9fb7ff011a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  4f317b3453 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  e64ae0ead2 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockRequestBuilder.java
>  93da0f60ec 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 1e3d6e9b8b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  580786832e 
> 
> 
> Diff: https://reviews.apache.org/r/72436/diff/1/
> 
> 
> Testing
> ---
> 
> DbTxnManager tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



[jira] [Created] (HIVE-23297) Precompile statements where needed across TxnHandler

2020-04-24 Thread Marton Bod (Jira)
Marton Bod created HIVE-23297:
-

 Summary: Precompile statements where needed across TxnHandler
 Key: HIVE-23297
 URL: https://issues.apache.org/jira/browse/HIVE-23297
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


There are multiple places in TxnHandler that could benefit from pre-compiling 
sql queries using prepared statements. Some queries are complex in structure 
(e.g. checkLock) and we use string concats to recreate them every time the 
query is to run, in addition to re-parsing the query structure as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23283) Generate random temp ID for lock enqueue and commitTxn

2020-04-23 Thread Marton Bod (Jira)
Marton Bod created HIVE-23283:
-

 Summary: Generate random temp ID for lock enqueue and commitTxn
 Key: HIVE-23283
 URL: https://issues.apache.org/jira/browse/HIVE-23283
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


In order to optimize the S4U scope of enqueue lock and commitTxn, currently a 
hardcoded constant (-1) is used to first insert all the lock and ws entries 
with a temporary lockID/commitID. However, in a concurrent environment this 
seems to cause some performance degradation (and deadlock issues with some 
rdbms) as multiple concurrent transactions are trying to insert rows with the 
same primary key (e.g. (-1, 1), (-1, 2), (-1, 3), .. etc. for (extID/intID) in 
HIVE_LOCKS). The proposed solution is to replace the constant with a random 
generated negative number, which seems to resolve this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72378: HIVE-23201: Improve logging in locking

2020-04-23 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72378/
---

(Updated April 23, 2020, 8:56 a.m.)


Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

HIVE-23201: Improve logging in locking


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 e7910c1c5d 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
 b3a1f826bb 


Diff: https://reviews.apache.org/r/72378/diff/5/

Changes: https://reviews.apache.org/r/72378/diff/4-5/


Testing
---

Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/


Thanks,

Marton Bod



Re: Review Request 72378: HIVE-23201: Improve logging in locking

2020-04-23 Thread Marton Bod


> On April 22, 2020, 11:46 a.m., Denys Kuzmenko wrote:
> > Looks good, however I don't really like change with 
> > REQ_LOCKS_ALIASED_FIELDS as it generats much bigger query (keep in mind 
> > there 3 of them joined by union, will be 4 for excl_write) + we are 
> > transfering more data that we are not very interested in. To get more 
> > information about the lock we can simply run show locks.

Thanks Denys. As discussed, there was a misunderstanding related to logging out 
the blocked lock in more detail. 
I've reverted that part and updated the diff.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72378/#review220419
---


On April 17, 2020, 11:30 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72378/
> ---
> 
> (Updated April 17, 2020, 11:30 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23201: Improve logging in locking
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  e7910c1c5d 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  b3a1f826bb 
> 
> 
> Diff: https://reviews.apache.org/r/72378/diff/4/
> 
> 
> Testing
> ---
> 
> Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72378: HIVE-23201: Improve logging in locking

2020-04-22 Thread Marton Bod


> On April 21, 2020, 10:38 a.m., Denys Kuzmenko wrote:
> > LGTM, just number of comments

Thanks for the review Denys! 
As per your comments, I have elevated the log levels, removed the less useful 
log lines and changed the original statements to prepared statements.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72378/#review220389
---


On April 17, 2020, 11:30 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72378/
> ---
> 
> (Updated April 17, 2020, 11:30 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23201: Improve logging in locking
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  d080df417b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  b3a1f826bb 
> 
> 
> Diff: https://reviews.apache.org/r/72378/diff/2/
> 
> 
> Testing
> ---
> 
> Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72392: HIVE-23103 Oracle statement batching

2020-04-21 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72392/#review220401
---


Ship it!




Ship It!

- Marton Bod


On April 21, 2020, 12:41 p.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72392/
> ---
> 
> (Updated April 21, 2020, 12:41 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Marton Bod.
> 
> 
> Bugs: HIVE-23103
> https://issues.apache.org/jira/browse/HIVE-23103
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Examine how to really get better performance for oracle statement batches.
> 
> Oracle JDBC doc describes:
> 
> The Oracle implementation of standard update batching does not implement true 
> batching for generic statements and callable statements. Even though Oracle 
> JDBC supports the use of standard batching for Statement and 
> CallableStatement objects, you are unlikely to see performance improvement.
> 
> I would look for connection properties to set, so it is handled anyway, or if 
> not, then use:
> 
> begin
>   query1;
>   query2;
>   query3;
> end;
> to we will have only a single roundtrip for the db.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java
>  bb29410e7d 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  d080df417b 
> 
> 
> Diff: https://reviews.apache.org/r/72392/diff/2/
> 
> 
> Testing
> ---
> 
> Baseline:
> Benchmark(dbProduct)  (txnType)  Mode  Cnt   Score   
> Error  Units
> TxnHandlerBenchRunner.commitTxn   ORACLEDEFAULTss  100  42.988 ± 
> 4.569  ms/op
> TxnHandlerBenchRunner.commitTxn   ORACLE  READ_ONLYss  100  45.029 ± 
> 4.686  ms/op
> 
> After patch:
> Benchmark(dbProduct)  (txnType)  Mode  Cnt   Score   
> Error  Units
> TxnHandlerBenchRunner.commitTxn   ORACLEDEFAULTss  100  36.208 ± 
> 3.869  ms/op
> TxnHandlerBenchRunner.commitTxn   ORACLE  READ_ONLYss  100  37.038 ± 
> 3.746  ms/op
> 
> 
> Thanks,
> 
> Peter Vary
> 
>



Re: Review Request 72392: HIVE-23103 Oracle statement batching

2020-04-21 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72392/#review220387
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java
Lines 662 (patched)
<https://reviews.apache.org/r/72392/#comment308749>

Once this part executes, we wouldn't have 'begin' for the next batch, no? 
Also, the sb would need to be cleared I think


- Marton Bod


On April 20, 2020, 2:53 p.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72392/
> ---
> 
> (Updated April 20, 2020, 2:53 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Marton Bod.
> 
> 
> Bugs: HIVE-23103
> https://issues.apache.org/jira/browse/HIVE-23103
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Examine how to really get better performance for oracle statement batches.
> 
> Oracle JDBC doc describes:
> 
> The Oracle implementation of standard update batching does not implement true 
> batching for generic statements and callable statements. Even though Oracle 
> JDBC supports the use of standard batching for Statement and 
> CallableStatement objects, you are unlikely to see performance improvement.
> 
> I would look for connection properties to set, so it is handled anyway, or if 
> not, then use:
> 
> begin
>   query1;
>   query2;
>   query3;
> end;
> to we will have only a single roundtrip for the db.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java
>  bb29410e7d 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  d080df417b 
> 
> 
> Diff: https://reviews.apache.org/r/72392/diff/1/
> 
> 
> Testing
> ---
> 
> Baseline:
> Benchmark(dbProduct)  (txnType)  Mode  Cnt   Score   
> Error  Units
> TxnHandlerBenchRunner.commitTxn   ORACLEDEFAULTss  100  42.988 ± 
> 4.569  ms/op
> TxnHandlerBenchRunner.commitTxn   ORACLE  READ_ONLYss  100  45.029 ± 
> 4.686  ms/op
> 
> After patch:
> Benchmark(dbProduct)  (txnType)  Mode  Cnt   Score   
> Error  Units
> TxnHandlerBenchRunner.commitTxn   ORACLEDEFAULTss  100  36.208 ± 
> 3.869  ms/op
> TxnHandlerBenchRunner.commitTxn   ORACLE  READ_ONLYss  100  37.038 ± 
> 3.746  ms/op
> 
> 
> Thanks,
> 
> Peter Vary
> 
>



Re: Review Request 72378: HIVE-23201: Improve logging in locking

2020-04-20 Thread Marton Bod


> On April 17, 2020, 1:46 p.m., Peter Vary wrote:
> > Mostly agree, few comments.
> > I would like to ask you to go through them, and if there are any places 
> > where the problem should not happen use at least info level.
> > 
> > Thanks,
> > Peter

Thanks for the review Peter! I've made changes based on your comments


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72378/#review220347
---


On April 17, 2020, 11:30 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72378/
> ---
> 
> (Updated April 17, 2020, 11:30 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23201: Improve logging in locking
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  d080df417b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  b3a1f826bb 
> 
> 
> Diff: https://reviews.apache.org/r/72378/diff/2/
> 
> 
> Testing
> ---
> 
> Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



[jira] [Created] (HIVE-23236) Remove the global lock from acquireLock

2020-04-17 Thread Marton Bod (Jira)
Marton Bod created HIVE-23236:
-

 Summary: Remove the global lock from acquireLock
 Key: HIVE-23236
 URL: https://issues.apache.org/jira/browse/HIVE-23236
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently we have a global lock (NEXT_LOCK_ID) when running enqueueLock, 
because the algorithm in checkLock requires the locks to have a well defined 
order, and also requires that every lock component is already stored in the 
RDBMS before checking the locks.

Proposed approach:
 * Enqueue lock without a global lock, using a sequence for getting the next 
lock ID
 * Insert the locks as usual, but with an additional timestamp (enqueued_at)
 * Check lock should check all enqueued locks for the given db/table/partition 
but order should be determined based on the enqueued_at time instead of the 
lockID
 * If there is no blocking lock, update the state to acquired as usual



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 72378: HIVE-23201: Improve logging in locking

2020-04-17 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72378/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

HIVE-23201: Improve logging in locking


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 4b6bc3e1e3 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 962a63d418 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
 b3a1f826bb 


Diff: https://reviews.apache.org/r/72378/diff/1/


Testing
---

Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21717/


Thanks,

Marton Bod



[jira] [Created] (HIVE-23234) Optimize TxnHandler::allocateTableWriteIds

2020-04-17 Thread Marton Bod (Jira)
Marton Bod created HIVE-23234:
-

 Summary: Optimize TxnHandler::allocateTableWriteIds
 Key: HIVE-23234
 URL: https://issues.apache.org/jira/browse/HIVE-23234
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod


Table write id allocation should be examined and optimized. One low hanging 
fruit is batching all the PreparedStatement inserts, but there might be other 
opportunities as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72360: HIVE-23093: Create new metastore config value for jdbc max batch size

2020-04-16 Thread Marton Bod


> On April 16, 2020, 9:24 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
> > Lines 650-651 (patched)
> > <https://reviews.apache.org/r/72360/diff/1/?file=2217548#file2217548line650>
> >
> > The text describing the config value is for administrators and not for 
> > developers. So this part should be a java comment instead of being part of 
> > the description

moved into a comment


> On April 16, 2020, 9:24 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 2520-2525 (patched)
> > <https://reviews.apache.org/r/72360/diff/1/?file=2217550#file2217550line2520>
> >
> > I would do it in setConf, like we did with the numOpenTxns and friends

makes perfect sense, updated the diff


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72360/#review220331
-----------


On April 14, 2020, 2:25 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72360/
> ---
> 
> (Updated April 14, 2020, 2:25 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23093: Create new metastore config value for jdbc max batch size
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  3bfb0e69cb 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java
>  a66e16973f 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  331fd4cc8d 
> 
> 
> Diff: https://reviews.apache.org/r/72360/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



[jira] [Created] (HIVE-23201) Improve logging in locking

2020-04-14 Thread Marton Bod (Jira)
Marton Bod created HIVE-23201:
-

 Summary: Improve logging in locking
 Key: HIVE-23201
 URL: https://issues.apache.org/jira/browse/HIVE-23201
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently it can be quite difficult to troubleshoot issues related to locking. 
To understand why a particular txn gave up after a while on acquiring a lock, 
you have to connect directly to the backend DB, since we are not logging right 
now which exact locks the txn is waiting for.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 72360: HIVE-23093: Create new metastore config value for jdbc max batch size

2020-04-14 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72360/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

HIVE-23093: Create new metastore config value for jdbc max batch size


Diffs
-

  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
 3bfb0e69cb 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java
 620c77e589 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 7d0db0c3a0 


Diff: https://reviews.apache.org/r/72360/diff/1/


Testing
---


Thanks,

Marton Bod



Re: Review Request 72359: HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn

2020-04-14 Thread Marton Bod


> On April 14, 2020, 11:22 a.m., Peter Vary wrote:
> > Thanks for the patch Marton!
> > Some questions, ideas.
> > 
> > Thanks,
> > Peter

Thanks for the review Peter! 
See my answers below and the updated diff


> On April 14, 2020, 11:22 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 1248 (original), 1228 (patched)
> > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1267>
> >
> > Is this 'rs' not reused later? Maybe use a local scoped rs here?

No, it's not reused anymore. I've made it locally-scoped.


> On April 14, 2020, 11:22 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1325 (patched)
> > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1388>
> >
> > Not really important since this is a private method, but why not return 
> > boolean instead?

Would be nice, but we need the ResultSet to log out the details of what exact 
conflict has been found.


> On April 14, 2020, 11:22 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1351 (patched)
> > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1414>
> >
> > Maybe log the query in debug level?

Sure


> On April 14, 2020, 11:22 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1375 (patched)
> > <https://reviews.apache.org/r/72359/diff/1/?file=2217523#file2217523line1438>
> >
> > Might worth to initialize with a size

Done


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72359/#review220308
---


On April 14, 2020, 8:57 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72359/
> ---
> 
> (Updated April 14, 2020, 8:57 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  7d0db0c3a0 
> 
> 
> Diff: https://reviews.apache.org/r/72359/diff/1/
> 
> 
> Testing
> ---
> 
> Green build: 
> https://builds.apache.org/job/PreCommit-HIVE-Build/21539/testReport
> Benchmark results attached to ticket: 
> https://issues.apache.org/jira/browse/HIVE-23104
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Review Request 72359: HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn

2020-04-14 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72359/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

HIVE-23104: Minimize critical paths of TxnHandler::commitTxn and abortTxn


Diffs
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 7d0db0c3a0 


Diff: https://reviews.apache.org/r/72359/diff/1/


Testing
---

Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21539/testReport
Benchmark results attached to ticket: 
https://issues.apache.org/jira/browse/HIVE-23104


Thanks,

Marton Bod



Re: Review Request 72324: HIVE-22750: Consolidate LockType naming

2020-04-08 Thread Marton Bod


> On April 8, 2020, 3:23 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java
> > Line 60 (original), 60 (patched)
> > <https://reviews.apache.org/r/72324/diff/3/?file=2217221#file2217221line60>
> >
> > add builder for ExclWrite

added


> On April 8, 2020, 3:23 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java
> > Lines 14 (patched)
> > <https://reviews.apache.org/r/72324/diff/3/?file=2217225#file2217225line14>
> >
> > why not enum?

That was my initial approach too, but in the end, having a second enum required 
lots of mapping between the two enums, and both having the same value set 
caused quite a bit of duplication too. I think by using this util class, we can 
more cleanly extend the LockType enum with any necessary state and behaviour, 
without introducing duplication.


> On April 8, 2020, 3:23 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java
> > Lines 34 (patched)
> > <https://reviews.apache.org/r/72324/diff/3/?file=2217225#file2217225line34>
> >
> > you can move inverse into static and do no calc every time

good spot, fixed


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72324/#review220258
---


On April 8, 2020, 3:09 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72324/
> ---
> 
> (Updated April 8, 2020, 3:09 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22750: Consolidate LockType naming
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java
>  e249b7775e 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java
>  52eb6133e7 
>   metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d 
>   metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 
>   ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
> 72f095d264 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 80fb1aff78 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java
>  8ae4351129 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  db4cfb996a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  cf3137928f 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  849970eb56 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java
>  c739d4d196 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 098ddec5dc 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  7d0db0c3a0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  da38a6bbd3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
>  1dfc105958 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java
>  PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 
> d44065018f 
> 
> 
> Diff: https://reviews.apache.org/r/72324/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72324: HIVE-22750: Consolidate LockType naming

2020-04-08 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72324/
---

(Updated April 8, 2020, 3:09 p.m.)


Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

HIVE-22750: Consolidate LockType naming


Diffs (updated)
-

  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java
 e249b7775e 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java
 52eb6133e7 
  metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d 
  metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 
  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
72f095d264 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
80fb1aff78 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java
 8ae4351129 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
 db4cfb996a 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
 cf3137928f 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
 849970eb56 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java
 c739d4d196 
  standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
098ddec5dc 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 7d0db0c3a0 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
 da38a6bbd3 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java
 PRE-CREATION 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
 1dfc105958 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java
 PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/TransactionBatch.java d44065018f 


Diff: https://reviews.apache.org/r/72324/diff/3/

Changes: https://reviews.apache.org/r/72324/diff/2-3/


Testing
---


Thanks,

Marton Bod



Review Request 72324: HIVE-22750: Consolidate LockType naming

2020-04-06 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72324/
---

Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

HIVE-22750: Consolidate LockType naming


Diffs
-

  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java
 e249b7775e 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java
 52eb6133e7 
  metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d 
  metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 
  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
72f095d264 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
80fb1aff78 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java
 8ae4351129 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
 db4cfb996a 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
 cf3137928f 
  
standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
 849970eb56 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java
 c739d4d196 
  standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
098ddec5dc 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 e8a988cca8 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
 da38a6bbd3 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java
 PRE-CREATION 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
 1dfc105958 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java
 PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/TransactionBatch.java d44065018f 


Diff: https://reviews.apache.org/r/72324/diff/1/


Testing
---


Thanks,

Marton Bod



Re: Review Request 72290: HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns

2020-04-01 Thread Marton Bod


> On April 1, 2020, 9:02 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1379 (patched)
> > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line1410>
> >
> > Will this issue unnecessary queries for read only queries? For Oracle 
> > this could increase exexution time
> > 
> > Also why not use executeQueryiesInBatch for this?

As discussed, in the current state, these delete queries are executed every 
time during commitTxn (not during abortTxn). Regarding reusing 
executeQueryiesInBatch, that's a good point, will do.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72290/#review220167
-----------


On April 1, 2020, 6:53 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72290/
> ---
> 
> (Updated April 1, 2020, 6:53 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  74ef88545e 
> 
> 
> Diff: https://reviews.apache.org/r/72290/diff/1/
> 
> 
> Testing
> ---
> 
> Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21347/
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72290: HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns

2020-04-01 Thread Marton Bod


> On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote:
> > LGTM, just a few comments.

Thanks for the review!


> On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 1300 (original), 1305 (patched)
> > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line1306>
> >
> > Use setLong, setObject would require type inference

sure


> On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1390 (patched)
> > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line1421>
> >
> > Not 100% sure, I think Laci P was working on removal of 
> > MIN_HISTORY_LEVEL table (HIVE-23107), please check with him.

yes he is, I will rebase onto his change once it's committed


> On April 1, 2020, 7:38 a.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4295 (patched)
> > <https://reviews.apache.org/r/72290/diff/1/?file=2216627#file2216627line4335>
> >
> > Please extract this into 
> > org.apache.hadoop.hive.metastore.tools.SQLGenerator

as discussed, let's move it to TxnDbUtil


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72290/#review220164
---


On April 1, 2020, 6:53 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72290/
> ---
> 
> (Updated April 1, 2020, 6:53 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  74ef88545e 
> 
> 
> Diff: https://reviews.apache.org/r/72290/diff/1/
> 
> 
> Testing
> ---
> 
> Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21347/
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Review Request 72290: HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns

2020-04-01 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72290/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

HIVE-23067: Use batch DB calls in TxnHandler for commitTxn and abortTxns


Diffs
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 74ef88545e 


Diff: https://reviews.apache.org/r/72290/diff/1/


Testing
---

Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21347/


Thanks,

Marton Bod



Re: Review Request 72264: HIVE-23052: Optimize lock enqueueing in TxnHandler

2020-03-31 Thread Marton Bod


> On March 31, 2020, 9:35 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 199 (patched)
> > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line199>
> >
> > %d could be problematic with different locales

I will change to be on the safe side


> On March 31, 2020, 9:35 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 2412 (patched)
> > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line2416>
> >
> > nit: unnecessary move, might make backporting harder

sure, makes sense


> On March 31, 2020, 9:35 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 2545 (patched)
> > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line2550>
> >
> > Is it possible to collect the all of the data before starting the RDBMS 
> > query, and get all of the data for a enqueueLockRequest in one go?

As discussed, good idea, but it would overcomplicate the code and the query 
structure, whereas usually only 1-2 dbName-tblName combos are locked in one go


> On March 31, 2020, 9:35 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 2599 (patched)
> > <https://reviews.apache.org/r/72264/diff/2/?file=2216464#file2216464line2652>
> >
> > Do we need this? We do not commit temp lock id, so we can have just a 
> > single value which definitely will not appear in the table

makes sense, will change to a cheaper placeholder value


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72264/#review220146
---


On March 24, 2020, 2:24 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72264/
> ---
> 
> (Updated March 24, 2020, 2:24 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-23052: Optimize lock enqueueing in TxnHandler
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java
>  f92ce7325e 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java
>  85719fdf84 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java
>  76bbf3bc1e 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  74ef88545e 
> 
> 
> Diff: https://reviews.apache.org/r/72264/diff/2/
> 
> 
> Testing
> ---
> 
> Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21249/
> Custom benchmark results attached to ticket: 
> https://issues.apache.org/jira/browse/HIVE-23052
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



[jira] [Created] (HIVE-23104) Minimize critical paths of TxnHandler::commitTxn and abortTxn

2020-03-30 Thread Marton Bod (Jira)
Marton Bod created HIVE-23104:
-

 Summary: Minimize critical paths of TxnHandler::commitTxn and 
abortTxn
 Key: HIVE-23104
 URL: https://issues.apache.org/jira/browse/HIVE-23104
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod


Investigate whether any code sections in TxnHandler::commitTxn and abortTxn can 
be lifted out/executed async in order to reduce the overall execution time of 
these methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23093) Create new metastore config value for jdbc max batch size

2020-03-27 Thread Marton Bod (Jira)
Marton Bod created HIVE-23093:
-

 Summary: Create new metastore config value for jdbc max batch size
 Key: HIVE-23093
 URL: https://issues.apache.org/jira/browse/HIVE-23093
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


In order to reduce the number of network calls to the metastore DB, there is an 
effort to batch acid/locking DB calls where it makes sense. Existing metastore 
config params do not provide a suitable, generic option to control the jdbc 
statement max batch size. Solution is to create a new config param, and use 
that in the appropriate batching calls in TxnHandler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 72264: HIVE-23052: Optimize lock enqueueing in TxnHandler

2020-03-24 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72264/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

HIVE-23052: Optimize lock enqueueing in TxnHandler


Diffs
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 06defdb910 


Diff: https://reviews.apache.org/r/72264/diff/1/


Testing
---

Green build: https://builds.apache.org/job/PreCommit-HIVE-Build/21249/
Custom benchmark tests with local and remote DB


Thanks,

Marton Bod



[jira] [Created] (HIVE-23067) Use batch inserts in TxnHandler

2020-03-23 Thread Marton Bod (Jira)
Marton Bod created HIVE-23067:
-

 Summary: Use batch inserts in TxnHandler
 Key: HIVE-23067
 URL: https://issues.apache.org/jira/browse/HIVE-23067
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


To reduce the number of database calls and network roundtrips, we could use 
more batching in TxnHandler, where currently in many places we call insert 
commands in loops sequentially.

Some examples:
 * openTxns (TXNS, REPL_TXN_MAP
 * commitTxn (COMPLETED_TXN_COMPONENTS)
 * replTableWriteIdState (TXN_TO_WRITE_ID)
 * allocateTableWriteIds (TXN_TO_WRITE_ID)
 * 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23052) Optimize lock enqueueing in TxnHandler

2020-03-19 Thread Marton Bod (Jira)
Marton Bod created HIVE-23052:
-

 Summary: Optimize lock enqueueing in TxnHandler
 Key: HIVE-23052
 URL: https://issues.apache.org/jira/browse/HIVE-23052
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


TxnHandler::enqueueLockWithRetry uses select for update (S4U) to retrieve the 
next lock id from the metastore. This locks the NEXT_LOCK_ID table for 
concurrent lock requests until the transaction has been committed.

Since this creates contention, we should reduce the scope of this S4U, thereby 
locking the table for only as long as necessary. Currently, within the same 
txn, after the initial S4U, we also insert the transaction components 
(TXN_COMPONENTS table) before committing, even though they are independent and 
do not make use of the lock id. So factoring out these txn_component inserts 
should decrease the time the next_lock_id table is locked and thus increase 
concurrency.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 72181: HIVE-22832: Parallelise direct insert directory cleaning process

2020-03-02 Thread Marton Bod

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72181/
---

Review request for hive, Marta Kuczora and Peter Vary.


Repository: hive-git


Description
---

HIVE-22832: Parallelise direct insert directory cleaning process


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e9966e6364 


Diff: https://reviews.apache.org/r/72181/diff/1/


Testing
---

pre-commit build success: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20874/


Thanks,

Marton Bod



[jira] [Created] (HIVE-22938) Investigate possibility of removing empty bucket file creation mechanism in MR

2020-02-27 Thread Marton Bod (Jira)
Marton Bod created HIVE-22938:
-

 Summary: Investigate possibility of removing empty bucket file 
creation mechanism in MR
 Key: HIVE-22938
 URL: https://issues.apache.org/jira/browse/HIVE-22938
 Project: Hive
  Issue Type: Task
Reporter: Marton Bod


As a follow-up to 
[HIVE-22918|https://issues.apache.org/jira/browse/HIVE-22918], this ticket is 
to investigate whether the empty bucket file creation mechanism can be removed 
safely from MR. 

For a bucketed table of N buckets, each insert will generate N bucket files in 
the delta directory, regardless of how many actual buckets are written to. As 
an example, if a table has 500 buckets, and we insert a single record, 499 
empty bucket files are generated alongside the single bucket that contains the 
actual data. This makes the operation substantially slower in some cases. This 
behaviour only seems to happen when using MR as the execution engine.

Some components/parts of the code might depend on this behaviour though, so it 
needs to be verified that removing this logic does not interfere with anything.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >