Re: Review Request 71091: ACID: getAcidState() should cache a recursive dir listing locally
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71091/ --- (Updated July 22, 2019, 5:35 p.m.) Review request for hive, Gopal V and Vineet Garg. Bugs: HIVE-21225 https://issues.apache.org/jira/browse/HIVE-21225 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-21225 Diffs (updated) - hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java 4dc04f46fd hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java 78cae7263b itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java 9d5ba3d310 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cff7e04b9a ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 707e38c321 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java b1ede0556f ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 15f1f945ce ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9d631ed43d ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 57eb506996 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 67a5e6de46 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 6168fc0f79 ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d4abf4277b ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java ea31557741 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java b5958fa9cc ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java 8451462023 streaming/src/test/org/apache/hive/streaming/TestStreaming.java c6d7e7f27c Diff: https://reviews.apache.org/r/71091/diff/3/ Changes: https://reviews.apache.org/r/71091/diff/2-3/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 71091: ACID: getAcidState() should cache a recursive dir listing locally
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71091/ --- (Updated July 18, 2019, 10:21 p.m.) Review request for hive, Gopal V and Vineet Garg. Bugs: HIVE-21225 https://issues.apache.org/jira/browse/HIVE-21225 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-21225 Diffs (updated) - hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java 4dc04f46fd hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java 78cae7263b itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java 9d5ba3d310 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cff7e04b9a ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 707e38c321 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java b1ede0556f ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 15f1f945ce ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9d631ed43d ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 57eb506996 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 67a5e6de46 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 6168fc0f79 ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d4abf4277b ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java c5faec5e95 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java b5958fa9cc ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java 8451462023 streaming/src/test/org/apache/hive/streaming/TestStreaming.java c6d7e7f27c Diff: https://reviews.apache.org/r/71091/diff/2/ Changes: https://reviews.apache.org/r/71091/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Review Request 71091: ACID: getAcidState() should cache a recursive dir listing locally
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71091/ --- Review request for hive, Gopal V and Vineet Garg. Bugs: HIVE-21225 https://issues.apache.org/jira/browse/HIVE-21225 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-21225 Diffs - hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java 4dc04f46fd hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java 78cae7263b itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java 9d5ba3d310 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cff7e04b9a ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 707e38c321 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java b1ede0556f ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 15f1f945ce ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9d631ed43d ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 57eb506996 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 67a5e6de46 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 6168fc0f79 ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d4abf4277b ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java c5faec5e95 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java b5958fa9cc ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java 8451462023 streaming/src/test/org/apache/hive/streaming/TestStreaming.java c6d7e7f27c Diff: https://reviews.apache.org/r/71091/diff/1/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 71044: ACID: explore how we can avoid a move step during inserts/compaction
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71044/ --- (Updated July 12, 2019, 9:40 p.m.) Review request for hive and Gopal V. Bugs: HIVE-21164 https://issues.apache.org/jira/browse/HIVE-21164 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-21164 Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 5fd0ef9161 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java bb89f803d5 ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 437266355a ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c4c56f8477 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 3fa61d3560 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 691f3ee2e9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5d6143d6a4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d47457857c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 757cb7af4d ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 61ea28a5f5 ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 78f25856a4 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java a75103d60d ql/src/test/results/clientpositive/acid_subquery.q.out 1dc1775557 ql/src/test/results/clientpositive/create_transactional_full_acid.q.out e324d5ec43 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out 61b0057adb 
ql/src/test/results/clientpositive/llap/acid_no_buckets.q.out ae1d97fa21 ql/src/test/results/clientpositive/llap/insert_overwrite.q.out fbc3326b39 ql/src/test/results/clientpositive/llap/mm_all.q.out 6cb34e2c79 ql/src/test/results/clientpositive/mm_all.q.out 2c0247a539 Diff: https://reviews.apache.org/r/71044/diff/4/ Changes: https://reviews.apache.org/r/71044/diff/3-4/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-21990) ACID: remove any difference between an mm table insert and full acid table insert
Vaibhav Gumashta created HIVE-21990: --- Summary: ACID: remove any difference between an mm table insert and full acid table insert Key: HIVE-21990 URL: https://issues.apache.org/jira/browse/HIVE-21990 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Vaibhav Gumashta HIVE-21164 makes acid insert work like mm-table insert by writing directly to the destination and using manifest files to track committed files in a task and the job. After that, while there should be no difference in the insert code paths, there may be some cases where the difference remains (e.g. HIVE-17695). This jira will investigate any such issues and fix them. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
Re: Review Request 71044: ACID: explore how we can avoid a move step during inserts/compaction
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71044/ --- (Updated July 12, 2019, 10:58 a.m.) Review request for hive and Gopal V. Bugs: HIVE-21164 https://issues.apache.org/jira/browse/HIVE-21164 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-21164 Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 5fd0ef9161 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java bb89f803d5 ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 437266355a ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c4c56f8477 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 3fa61d3560 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 691f3ee2e9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5d6143d6a4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d47457857c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 757cb7af4d ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 61ea28a5f5 ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 78f25856a4 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java a75103d60d Diff: https://reviews.apache.org/r/71044/diff/3/ Changes: https://reviews.apache.org/r/71044/diff/2-3/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 71044: ACID: explore how we can avoid a move step during inserts/compaction
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71044/ --- (Updated July 11, 2019, 5:02 a.m.) Review request for hive and Gopal V. Bugs: HIVE-21164 https://issues.apache.org/jira/browse/HIVE-21164 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-21164 Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 5fd0ef9161 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java bb89f803d5 ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 437266355a ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c4c56f8477 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 3fa61d3560 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 691f3ee2e9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5d6143d6a4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d47457857c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 757cb7af4d ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 61ea28a5f5 ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 78f25856a4 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java a75103d60d Diff: https://reviews.apache.org/r/71044/diff/2/ Changes: https://reviews.apache.org/r/71044/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Review Request 71044: ACID: explore how we can avoid a move step during inserts/compaction
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71044/ --- Review request for hive and Gopal V. Bugs: HIVE-21164 https://issues.apache.org/jira/browse/HIVE-21164 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-21164 Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 5fd0ef9161 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java d59cfe51e9 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java bb89f803d5 ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1346bed5a7 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c4c56f8477 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 3fa61d3560 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 691f3ee2e9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5d6143d6a4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 7c58072413 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 757cb7af4d ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 61ea28a5f5 ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 78f25856a4 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java a75103d60d Diff: https://reviews.apache.org/r/71044/diff/1/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-21757) ACID: use a new write id for compaction's output instead of the visibility id
Vaibhav Gumashta created HIVE-21757: --- Summary: ACID: use a new write id for compaction's output instead of the visibility id Key: HIVE-21757 URL: https://issues.apache.org/jira/browse/HIVE-21757 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Vaibhav Gumashta HIVE-20823 added support for running compaction within a transaction. To control the visibility of the output directory, it uses base_writeId_visibilityId, where visibilityId is the transaction id of the transaction that the compactor ran in. Perhaps we can keep using the base_writeId format, by allocating a new writeId for the compactor and creating the new base/delta with that.
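To make the two naming schemes in HIVE-21757 concrete, here is a minimal Java sketch that parses a base directory name with and without a visibility suffix. The exact on-disk pattern used here (`base_<writeId>`, optionally followed by `_v<visibilityTxnId>`), the sample directory names, and the class `BaseDirName` are assumptions for illustration only; this is not Hive's AcidUtils code.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Hive. */
final class BaseDirName {
    // Assumed layout: base_<writeId> with an optional _v<visibilityTxnId> suffix.
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?");

    final long writeId;
    final OptionalLong visibilityTxnId;

    private BaseDirName(long writeId, OptionalLong visibilityTxnId) {
        this.writeId = writeId;
        this.visibilityTxnId = visibilityTxnId;
    }

    /** Parses a directory name, rejecting anything that is not a base directory. */
    static BaseDirName parse(String dirName) {
        Matcher m = BASE.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a base directory: " + dirName);
        }
        long writeId = Long.parseLong(m.group(1));
        OptionalLong visibility = m.group(2) == null
            ? OptionalLong.empty()
            : OptionalLong.of(Long.parseLong(m.group(2)));
        return new BaseDirName(writeId, visibility);
    }

    public static void main(String[] args) {
        // Current scheme: visibility is carried in the directory name.
        BaseDirName current = parse("base_0000005_v0000010");
        // Proposed scheme: the compactor gets its own write id, no suffix needed.
        BaseDirName proposed = parse("base_0000006");
        System.out.println(current.writeId + " " + current.visibilityTxnId.isPresent());
        System.out.println(proposed.writeId + " " + proposed.visibilityTxnId.isPresent());
    }
}
```

Under the proposal, readers would only ever need the first form, so visibility would follow from the ordinary write id list rather than from a second id embedded in the name.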
[jira] [Created] (HIVE-21749) ACID: Provide an option to run Cleaner thread from Hive client
Vaibhav Gumashta created HIVE-21749: --- Summary: ACID: Provide an option to run Cleaner thread from Hive client Key: HIVE-21749 URL: https://issues.apache.org/jira/browse/HIVE-21749 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Vaibhav Gumashta In some cases, it could be useful to trigger the cleaner thread manually. We should provide an option for that.
[jira] [Created] (HIVE-21513) ACID: Running merge concurrently with minor compaction causes a later select * to throw exception
Vaibhav Gumashta created HIVE-21513: --- Summary: ACID: Running merge concurrently with minor compaction causes a later select * to throw exception Key: HIVE-21513 URL: https://issues.apache.org/jira/browse/HIVE-21513 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta
Repro steps:
- Create table
- Load some data
- Run merge so records get updated and delete_delta dirs are created
- Manually initiate minor compaction: ALTER TABLE ... COMPACT 'minor';
- While the compaction is running, keep executing the merge statement
- After some time, try to do a simple select *;
[jira] [Created] (HIVE-21470) ACID: Optimize RecordReader creation when SearchArgument is provided
Vaibhav Gumashta created HIVE-21470: --- Summary: ACID: Optimize RecordReader creation when SearchArgument is provided Key: HIVE-21470 URL: https://issues.apache.org/jira/browse/HIVE-21470 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.3.4, 3.1.1 Reporter: Vaibhav Gumashta
Consider the following query:
{code}
select col1 from tbl1 where year_partition=2019;
{code}
If the table has a lot of columns, we currently end up creating a TreeReader for each column, even when it won't pass the SearchArgument:
{code}
TreeReaderFactory.createTreeReader(TypeDescription, TreeReaderFactory$Context) line: 2339
TreeReaderFactory$StructTreeReader.<init>(int, TypeDescription, TreeReaderFactory$Context) line: 1974
TreeReaderFactory.createTreeReader(TypeDescription, TreeReaderFactory$Context) line: 2390
RecordReaderImpl(RecordReaderImpl).<init>(ReaderImpl, Reader$Options) line: 267
RecordReaderImpl.<init>(ReaderImpl, Reader$Options, Configuration) line: 67
ReaderImpl.rowsOptions(Reader$Options, Configuration) line: 83
OrcRawRecordMerger$OriginalReaderPairToRead.<init>(OrcRawRecordMerger$ReaderKey, Reader, int, RecordIdentifier, RecordIdentifier, Reader$Options, OrcRawRecordMerger$Options, Configuration, ValidWriteIdList, int) line: 446
OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1057
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
{code}
If the table has 1000 columns and spans N splits, we will end up creating 1000*N TreeReader objects when we might need only N (one per split).
[jira] [Created] (HIVE-21460) ACID: Load data followed by a select * query results in incorrect results
Vaibhav Gumashta created HIVE-21460: --- Summary: ACID: Load data followed by a select * query results in incorrect results Key: HIVE-21460 URL: https://issues.apache.org/jira/browse/HIVE-21460 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta
This affects current master as well. I created an ORC file that spans multiple stripes, ran a simple select *, and got incorrect row counts (when comparing with select count(*)). The problem seems to be that after split generation and creating min/max rowId for each row, Hive is only reading the last split. (Note that since the loaded file is not written by Hive ACID, it does not have ROW__ID in the file; the ROW__ID is applied on read by discovering min/max bounds, which are used for calculating ROW__ID.rowId for each row of a split.)
[jira] [Created] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader
Vaibhav Gumashta created HIVE-21458: --- Summary: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader Key: HIVE-21458 URL: https://issues.apache.org/jira/browse/HIVE-21458 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta
In the transactional subsystems, in several places we check to see if a data file has ROW__ID fields or not. Every time we do that (even within the context of the same query), we open a Reader for that file/split. We could optimize this by caching. Also, perhaps we don't need to do this for every split. An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code}
Here, for each split we'll make that check.
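The caching idea in HIVE-21458 can be sketched as a small memoizing wrapper keyed by file path, so that repeated checks for splits of the same file reuse one answer. Everything here is hypothetical: the class `RawFormatCache`, the string path key, the sample path, and the stand-in predicate that replaces opening an ORC reader. How such a cache would be scoped and invalidated inside Hive is not addressed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical per-query cache for illustration; not Hive code. */
final class RawFormatCache {
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    private final Predicate<String> expensiveCheck;

    /** expensiveCheck stands in for opening a reader and inspecting the file. */
    RawFormatCache(Predicate<String> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    /** Runs the expensive check at most once per distinct path. */
    boolean isRawFormat(String filePath) {
        return cache.computeIfAbsent(filePath, expensiveCheck::test);
    }

    public static void main(String[] args) {
        AtomicInteger readerOpens = new AtomicInteger();
        // Arbitrary stand-in logic: count each call as one reader open.
        RawFormatCache cache = new RawFormatCache(path -> {
            readerOpens.incrementAndGet();
            return !path.contains("delta_");
        });
        // Five splits of the same (hypothetical) file ask the same question.
        for (int split = 0; split < 5; split++) {
            cache.isRawFormat("/warehouse/t/000000_0");
        }
        System.out.println("reader opens: " + readerOpens.get());
    }
}
```

With the cache in place, the five lookups trigger a single invocation of the expensive check instead of one per split.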
[jira] [Created] (HIVE-21451) ACID: Avoid using hive.acid.key.index to determine if the file is original or not
Vaibhav Gumashta created HIVE-21451: --- Summary: ACID: Avoid using hive.acid.key.index to determine if the file is original or not Key: HIVE-21451 URL: https://issues.apache.org/jira/browse/HIVE-21451 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta
Transactional files written by Hive have each row decorated with a ROW__ID column. However, files brought into transactional tables with the LOAD DATA... command do not have these metadata columns (in Hive ACID parlance, these are called original files). Original files are decorated with an inferred ROW__ID generated at read time. After they are compacted, however, the ROW__ID metadata column becomes part of the file itself. To determine whether a file is original, we currently check for the presence of hive.acid.key.index. For query-based compaction, we currently do not write hive.acid.key.index (HIVE-21165). This means that even after compaction, such files may get treated as original files. Irrespective of HIVE-21165, we should avoid using hive.acid.key.index to decide whether a file is original, and instead look for the presence of ROW__ID. hive.acid.key.index should be treated as a performance optimization, as it was seemingly meant to be.
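As a rough illustration of the schema-based check proposed in HIVE-21451, the sketch below decides whether a file is original by looking only at its top-level field names. The field list (operation, originalTransaction, bucket, rowId, currentTransaction, row) matches what ORC dumps of ACID files show; treating an exact match on that list as the test, and the class and method names, are assumptions for illustration rather than Hive's implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical check for illustration; not Hive's implementation. */
final class AcidSchemaCheck {
    // Top-level fields of an ACID event row: the ROW__ID metadata plus the user row.
    private static final List<String> ACID_FIELDS = Arrays.asList(
        "operation", "originalTransaction", "bucket", "rowId", "currentTransaction", "row");

    /** Returns true when the top-level schema does not carry the ROW__ID metadata fields. */
    static boolean isOriginal(List<String> topLevelFieldNames) {
        return !topLevelFieldNames.equals(ACID_FIELDS);
    }

    public static void main(String[] args) {
        // A file brought in with LOAD DATA: only user columns at the top level.
        System.out.println(isOriginal(Arrays.asList("a", "b", "c")));
        // A file written (or compacted) by Hive ACID: metadata fields wrap the user row.
        System.out.println(isOriginal(ACID_FIELDS));
    }
}
```

A check like this depends only on the file schema, so it gives the same answer whether or not hive.acid.key.index was written.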
Re: Null pointer exception on running compaction against an MM table
The approach is similar, but it is not identical. Let me go over the query-based compaction codepath to see if I spot this bug there.

Thanks,
--Vaibhav

From: Aditya Shah
Date: Saturday, February 16, 2019 at 3:44 AM
To: Vaibhav Gumashta
Cc: "dev@hive.apache.org", Eugene Koifman, Gopal Vijayaraghavan
Subject: Re: Null pointer exception on running compaction against an MM table

Hi,

Thanks for the reply. I have opened a JIRA (HIVE-21280) for this and will upload a patch soon. I also have a question about the new query-based compactor for full CRUD tables that went into master in HIVE-20699. Does major compaction there use a query-based compactor similar to the one for MM tables? If so, I expect the same problem to exist there as well.

Aditya

On Sat, Feb 16, 2019 at 2:34 AM Vaibhav Gumashta <vgumas...@hortonworks.com> wrote:

Aditya,

Thanks for reporting this. Would you like to create a jira for this (https://issues.apache.org/jira/projects/HIVE)? Additionally, if you would like to work on a fix, I’m happy to help in reviewing.

--Vaibhav

From: Aditya Shah <adityashah3...@gmail.com>
Date: Friday, February 15, 2019 at 2:05 AM
To: "dev@hive.apache.org" <dev@hive.apache.org>
Cc: Eugene Koifman <ekoif...@hortonworks.com>, Vaibhav Gumashta <vgumas...@hortonworks.com>, Gopal Vijayaraghavan <go...@hortonworks.com>
Subject: Null pointer exception on running compaction against an MM table

Hi,

I was trying to run compaction on an MM table but got a null pointer exception while getting the HDFS session path. It seemed to me that the session state was not started for these queries. Am I missing something here? I do think session state needs to be started for each of the queries (insert into temp table etc.) running for compaction (I also have doubts about the statsupdater thread's queries) on HMS.
Some details are as follows: Env./Versions: Using Hive-3.1.1 (rel/release-3.1.1) Steps to reproduce: 1) Using beeline with HS2 and HMS 2) create an MM table 3) Insert a few values in the table 4) alter table mm_table compact 'major' and wait; Stack trace on HMS: compactor.Worker: Caught exception while trying to compact id:8,dbname:default,tableName:acid_mm_orc,partName:null,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestWriteId:0. Marking failed to avoid repeated failures, java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH SERDEPROPERTIES ( 'serialization.format'='1')STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base' TBLPROPERTIES ('transactional'='false') at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:373) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:241) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:174) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH SERDEPROPERTIES ( 'serialization.format'='1')STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base' TBLPROPERTIES ('transactional'='false') at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:525) at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:365) ... 2 more Caused by: java.lang.NullPointerException: Non-local session path expected to be non-null at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:228) at org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:815) at org.apache.hadoop.hive.ql.Context.<init>(Context.java:309) at org.apache.hadoop.hive.ql.Context.<init>(Context.java:295) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:591) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1684) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1807) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1567) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1556) at org.apache.hadoop.hive
Re: Null pointer exception on running compaction against an MM table
Aditya,

Thanks for reporting this. Would you like to create a jira for this (https://issues.apache.org/jira/projects/HIVE)? Additionally, if you would like to work on a fix, I’m happy to help in reviewing.

--Vaibhav

From: Aditya Shah
Date: Friday, February 15, 2019 at 2:05 AM
To: "dev@hive.apache.org"
Cc: Eugene Koifman, Vaibhav Gumashta, Gopal Vijayaraghavan
Subject: Null pointer exception on running compaction against an MM table

Hi,

I was trying to run compaction on an MM table but got a null pointer exception while getting the HDFS session path. It seemed to me that the session state was not started for these queries. Am I missing something here? I do think session state needs to be started for each of the queries (insert into temp table etc.) running for compaction (I also have doubts about the statsupdater thread's queries) on HMS.

Some details are as follows:

Env./Versions: Using Hive-3.1.1 (rel/release-3.1.1)

Steps to reproduce:
1) Using beeline with HS2 and HMS
2) create an MM table
3) Insert a few values in the table
4) alter table mm_table compact 'major' and wait;

Stack trace on HMS:
compactor.Worker: Caught exception while trying to compact id:8,dbname:default,tableName:acid_mm_orc,partName:null,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestWriteId:0.
Marking failed to avoid repeated failures, java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH SERDEPROPERTIES ( 'serialization.format'='1')STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base' TBLPROPERTIES ('transactional'='false') at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:373) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:241) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:174) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH SERDEPROPERTIES ( 'serialization.format'='1')STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base' TBLPROPERTIES ('transactional'='false') at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:525) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:365) ... 
2 more Caused by: java.lang.NullPointerException: Non-local session path expected to be non-null at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:228) at org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:815) at org.apache.hadoop.hive.ql.Context.<init>(Context.java:309) at org.apache.hadoop.hive.ql.Context.<init>(Context.java:295) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:591) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1684) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1807) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1567) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1556) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:522) ... 3 more

Observations:
1) SessionState.start() initializes paths, hivehist etc.
2) SessionState is not started in setupSessionState() in runMmCompaction(). (There is also a comment by Sergey in the code about this.)
3) Even after making it start the session state, it fails further on while running a TezTask for the insert overwrite on the temp table with the contents of the original table.
4) The cause of 3) is that the Tez session state cannot initialize, because an IllegalArgumentException is thrown while setting up the caller context in the Tez task, due to the caller id being empty.
5) The reason for 4) is that the queryid is an empty string for such queries.
6) A possible solution for 5): build the querystate with a queryid in runOnDriver() in DriverUtils.java.

Do let me know if you need more information.

Thanks and Regards,
Aditya Shah
5th Year M.Sc.(Hons.) Mathematics & B.E.(Hons.) Computer Science and Engineering
Birla Institute of Technology & Science, Pilani
Vidhya Vihar
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
> On Jan. 29, 2019, 2:04 a.m., Eugene Koifman wrote: > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java > > Lines 299 (patched) > > <https://reviews.apache.org/r/69367/diff/7-9/?file=2121174#file2121174line328> > > > > testMoreBucketsThanReducers/testMoreBucketsThanReducers2 in > > TestTxnCommands force a specific number of reducers I used conf.setIntVar(HiveConf.ConfVars.HADOOPNUMREDUCERS, 2) on an unbucketed table. However, the inserts created 2 different buckets. For example: vgumashta:hive vgumashta$ ../orc-git/build/_CPack_Packages/Darwin/TGZ/ORC-1.5.4-Darwin/bin/orc-contents /Users/vgumashta/Documents/workspace/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez-1549010487358_1258013156/warehouse/testcompactionwithschemaevolutionnobucketsmultiplereducers/ds\=today/delta_003_003_/bucket_0 {"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 0, "currentTransaction": 3, "row": {"a": 3, "b": 3, "c": 1001}} {"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 1, "currentTransaction": 3, "row": {"a": 4, "b": 2, "c": 1003}} {"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 2, "currentTransaction": 3, "row": {"a": 4, "b": 4, "c": 1005}} vgumashta:hive vgumashta$ ../orc-git/build/_CPack_Packages/Darwin/TGZ/ORC-1.5.4-Darwin/bin/orc-contents /Users/vgumashta/Documents/workspace/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez-1549010487358_1258013156/warehouse/testcompactionwithschemaevolutionnobucketsmultiplereducers/ds\=yesterday/delta_003_003_/bucket_1 {"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 0, "currentTransaction": 3, "row": {"a": 3, "b": 2, "c": 1000}} {"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 1, "currentTransaction": 3, "row": {"a": 3, "b": 4, "c": 1002}} {"operation": 0, "originalTransaction": 3, 
"bucket": 536936448, "rowId": 2, "currentTransaction": 3, "row": {"a": 4, "b": 3, "c": 1004}} - Vaibhav --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/#review212399 --- On Jan. 28, 2019, 7:49 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69367/ > --- > > (Updated Jan. 28, 2019, 7:49 p.m.) > > > Review request for hive and Eugene Koifman. > > > Bugs: HIVE-20699 > https://issues.apache.org/jira/browse/HIVE-20699 > > > Repository: hive-git > > > Description > --- > > https://jira.apache.org/jira/browse/HIVE-20699 > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > d6a41919bf > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java > 15c14c9be5 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java > fbb931cbcd > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java > 6d4578e7a0 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 > ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java > db3b427adc > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > dc05e1990e > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > a0df82cb20 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java > PRE-CREATION > ql/src/test/results/clientpositive/show_functions.q.out c9716e904c > > > Diff: https://reviews.apache.org/r/69367/diff/9/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 28, 2019, 7:49 p.m.) Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out c9716e904c Diff: https://reviews.apache.org/r/69367/diff/9/ Changes: https://reviews.apache.org/r/69367/diff/8-9/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 26, 2019, 12:32 a.m.) Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out c9716e904c Diff: https://reviews.apache.org/r/69367/diff/8/ Changes: https://reviews.apache.org/r/69367/diff/7-8/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
.org/r/69367/diff/7/?file=2121179#file2121179line638>
> >
> > is there a followup Jira for this?

https://jira.apache.org/jira/browse/HIVE-21165


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 248 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121182#file2121182line248>
> >
> > What does this do for MM table?

H, bug - fixed it.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
> > Lines 91 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121184#file2121184line91>
> >
> > when is it ok for 2 consecutive ROW_IDs to be equal?

Throwing an exception now if the comparison returns 0.


- Vaibhav

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212198
---

On Jan. 22, 2019, 7:04 a.m., Vaibhav Gumashta wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
>
> (Updated Jan. 22, 2019, 7:04 a.m.)
>
>
> Review request for hive and Eugene Koifman.
> > > Bugs: HIVE-20699 > https://issues.apache.org/jira/browse/HIVE-20699 > > > Repository: hive-git > > > Description > --- > > https://jira.apache.org/jira/browse/HIVE-20699 > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > d6a41919bf > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java > 15c14c9be5 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java > fbb931cbcd > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java > 6d4578e7a0 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 > ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java > 0e5b3e5473 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > dc05e1990e > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > a0df82cb20 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java > PRE-CREATION > ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f > > > Diff: https://reviews.apache.org/r/69367/diff/7/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
[jira] [Created] (HIVE-21167) Bucketing: Bucketing version 1 is incorrectly partitioning data
Vaibhav Gumashta created HIVE-21167:
---
Summary: Bucketing: Bucketing version 1 is incorrectly partitioning data
Key: HIVE-21167
URL: https://issues.apache.org/jira/browse/HIVE-21167
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Vaibhav Gumashta

Using murmur hash for bucketing columns was introduced in HIVE-18910, following which {{'bucketing_version'='1'}} stands for the old behaviour (where, for example, integer columns were partitioned based on mod values). It looks like we now have a bug in the old bucketing scheme. I could reproduce it by modifying the existing schema with an alter table add column and then inserting new data.

Repro:
{code}
0: jdbc:hive2://localhost:10010> create transactional table acid_ptn_bucket1 (a int, b int) partitioned by(ds string) clustered by (a) into 2 buckets stored as ORC TBLPROPERTIES('bucketing_version'='1', 'transactional'='true', 'transactional_properties'='default');
No rows affected (0.418 seconds)
0: jdbc:hive2://localhost:10010> insert into acid_ptn_bucket1 partition (ds) values(1,2,'today'),(1,3,'today'),(1,4,'yesterday'),(2,2,'yesterday'),(2,3,'today'),(2,4,'today');
6 rows affected (3.695 seconds)
{code}

Data from ORC file (data as expected):
{code}
/apps/hive/warehouse/acid_ptn_bucket1/ds=today/delta_001_001_/bucket_0
{"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"a": 2, "b": 4}}
{"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 1, "currentTransaction": 1, "row": {"a": 2, "b": 3}}

/apps/hive/warehouse/acid_ptn_bucket1/ds=today/delta_001_001_/bucket_1
{"operation": 0, "originalTransaction": 1, "bucket": 536936448, "rowId": 0, "currentTransaction": 1, "row": {"a": 1, "b": 3}}
{"operation": 0, "originalTransaction": 1, "bucket": 536936448, "rowId": 1, "currentTransaction": 1, "row": {"a": 1, "b": 2}}
{code}

Modifying table schema and inserting new data:
{code}
0: jdbc:hive2://localhost:10010> alter table acid_ptn_bucket1 add columns(c int);
No rows affected (0.541 seconds)
0: jdbc:hive2://localhost:10010> insert into acid_ptn_bucket1 partition (ds) values(3,2,1000,'yesterday'),(3,3,1001,'today'),(3,4,1002,'yesterday'),(4,2,1003,'today'), (4,3,1004,'yesterday'),(4,4,1005,'today');
6 rows affected (3.699 seconds)
{code}

Data from ORC file (wrong partitioning):
{code}
/apps/hive/warehouse/acid_ptn_bucket1/ds=today/delta_003_003_/bucket_0
{"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 0, "currentTransaction": 3, "row": {"a": 3, "b": 3, "c": 1001}}

/apps/hive/warehouse/acid_ptn_bucket1/ds=today/delta_003_003_/bucket_1
{"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 0, "currentTransaction": 3, "row": {"a": 4, "b": 4, "c": 1005}}
{"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 1, "currentTransaction": 3, "row": {"a": 4, "b": 2, "c": 1003}}
{code}

As seen above, the expected behaviour is that new rows with column 'a' equal to 3 go to bucket1 and rows with column 'a' equal to 4 go to bucket0, but the observed partitioning is the reverse.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
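The expectation stated in this report can be checked by hand with a small sketch. It is not Hive's code: it assumes that under {{'bucketing_version'='1'}} an int column hashes to its own value and that the bucket is that hash (sign bit masked off) modulo the bucket count, and it assumes the "bucket" field in the ORC dump keeps a codec version in its top 3 bits and the bucket id in the 12 bits starting at bit 16. The class and method names are made up for illustration.

```java
public class BucketingV1Sketch {

    // Assumed version-1 rule for an int bucketing column:
    // the hash is the value itself, masked to be non-negative, modulo the bucket count.
    static int expectedBucketV1(int value, int numBuckets) {
        return (value & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumed layout of the "bucket" field: the top 3 bits carry a codec version.
    static int codecVersion(int encodedBucket) {
        return encodedBucket >>> 29;
    }

    // Assumed layout of the "bucket" field: 12 bits of bucket id starting at bit 16.
    static int bucketId(int encodedBucket) {
        return (encodedBucket >>> 16) & 0xFFF;
    }

    public static void main(String[] args) {
        // Values of column 'a' used in the repro, with 2 buckets.
        for (int a = 1; a <= 4; a++) {
            System.out.println("a=" + a + " -> expected bucket " + expectedBucketV1(a, 2));
        }
        // The two "bucket" values that appear in the ORC dumps.
        for (int encoded : new int[] {536870912, 536936448}) {
            System.out.println(encoded + " -> version " + codecVersion(encoded)
                + ", bucket id " + bucketId(encoded));
        }
    }
}
```

Under these assumptions the first insert matches the files it landed in (a=2 in bucket_0, a=1 in bucket_1), while the rows written after the alter table (a=3 in bucket_0, a=4 in bucket_1) do not.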
[jira] [Created] (HIVE-21165) ACID: pass query hint to the writers to write hive.acid.key.index
Vaibhav Gumashta created HIVE-21165: --- Summary: ACID: pass query hint to the writers to write hive.acid.key.index Key: HIVE-21165 URL: https://issues.apache.org/jira/browse/HIVE-21165 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta For the query based compactor from HIVE-20699, the compaction runs as a sql query. However, this mechanism skips over writing hive.acid.key.index for each stripe, which is used to skip over stripes that are not supposed to be read. We need a way to pass a query hint to the writer so that it can write this index data, when invoked from a sql query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21164) ACID: explore how we can avoid a move step during compaction
Vaibhav Gumashta created HIVE-21164: --- Summary: ACID: explore how we can avoid a move step during compaction Key: HIVE-21164 URL: https://issues.apache.org/jira/browse/HIVE-21164 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta Currently, we write compacted data to a temporary location and then move the files to a final location, which is an expensive operation on some cloud file systems. Since HIVE-20823 is already in, it can control the visibility of compacted data for the readers. Therefore, we can perhaps avoid writing data to a temporary location and directly write compacted data to the intended final path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 22, 2019, 7:04 a.m.) Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f Diff: https://reviews.apache.org/r/69367/diff/7/ Changes: https://reviews.apache.org/r/69367/diff/6-7/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-21137) JDBC: HiveDatabaseMetaData.getTables does not adhere to jdbc spec
Vaibhav Gumashta created HIVE-21137:
---
Summary: JDBC: HiveDatabaseMetaData.getTables does not adhere to jdbc spec
Key: HIVE-21137
URL: https://issues.apache.org/jira/browse/HIVE-21137
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 2.3.4, 3.1.1
Reporter: Vaibhav Gumashta
Attachments: HiveJdbcClient.java

The {{types}} parameter in {{HiveDatabaseMetaData.getTables(String catalog, String schemaPattern, String tableNamePattern, String[] types)}} is supposed to honor only the return values from {{HiveDatabaseMetaData.getTableTypes}}. However, the following is the output from the attached test JDBC program:
{code}
*** Using dbMetadata.getTables ***
*** With only EXTERNAL TABLE ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** With only TABLE ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** With only EXTERNAL_TABLE ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** With empty array ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** With VIEW ***
*** With INDEX_TABLE ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** With VIEW, INDEX_TABLE ***
*** With EXTERNAL_TABLE, VIEW, INDEX_TABLE ***
*** With TABLE, VIEW, INDEX_TABLE ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** With a random string ***
Table: test1
Table: test_2
Table: names_text
Table: names_text_1
*** getTableTypes ***
Table: TABLE
Table: TABLE
Table: VIEW
Table: MATERIALIZED_VIEW
{code}
We should fix the API so that clients see the expected behaviour.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
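For reference, the JDBC contract for {{DatabaseMetaData.getTables}} treats {{types}} as a filter over the values returned by {{getTableTypes()}}, with {{null}} meaning all types. The sketch below only illustrates that filtering rule on an in-memory map; it is not the Hive driver's code, the class name and sample tables are hypothetical, and treating an empty array as "match nothing" is an assumption of this sketch rather than something the report states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TableTypeFilterSketch {

    // Returns the names of the tables whose type is listed in 'types'.
    // A null 'types' array means "all types"; unknown type strings match nothing.
    static List<String> filterByTypes(Map<String, String> tableToType, String[] types) {
        if (types == null) {
            return new ArrayList<>(tableToType.keySet());
        }
        Set<String> wanted = new HashSet<>(Arrays.asList(types));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : tableToType.entrySet()) {
            if (wanted.contains(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical catalog contents, not taken from the report.
        Map<String, String> catalog = new LinkedHashMap<>();
        catalog.put("orders", "TABLE");
        catalog.put("orders_view", "VIEW");

        System.out.println(filterByTypes(catalog, new String[] {"TABLE"}));
        System.out.println(filterByTypes(catalog, new String[] {"VIEW"}));
        System.out.println(filterByTypes(catalog, new String[] {"a random string"}));
        System.out.println(filterByTypes(catalog, null));
    }
}
```

With a filter like this, a request for only VIEW or for a random string would not return plain tables, which is the behaviour the output above shows to be missing in several cases.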
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 18, 2019, 10:41 p.m.) Review request for hive and Eugene Koifman. Changes --- Rebased on master Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f Diff: https://reviews.apache.org/r/69367/diff/6/ Changes: https://reviews.apache.org/r/69367/diff/5-6/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 18, 2019, 9:40 p.m.) Review request for hive and Eugene Koifman. Changes --- Rebased on master Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f Diff: https://reviews.apache.org/r/69367/diff/5/ Changes: https://reviews.apache.org/r/69367/diff/4-5/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 10, 2019, 7:23 p.m.) Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d7f069eaa7 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java b477480e0a ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 7d5ee4a59f ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION Diff: https://reviews.apache.org/r/69367/diff/4/ Changes: https://reviews.apache.org/r/69367/diff/3-4/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-21105) LLAP: provide an option to configure #retries and time between retries for LLAP-ZK SecretManager
Vaibhav Gumashta created HIVE-21105: --- Summary: LLAP: provide an option to configure #retries and time between retries for LLAP-ZK SecretManager Key: HIVE-21105 URL: https://issues.apache.org/jira/browse/HIVE-21105 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.3.4, 3.1.1 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Jan. 8, 2019, 7:18 p.m.) Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorOnTez.java PRE-CREATION pom.xml 26b662e4c3 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 8cabf960db ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 6e7c78bd17 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java beb6902d7d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION Diff: https://reviews.apache.org/r/69367/diff/3/ Changes: https://reviews.apache.org/r/69367/diff/2-3/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-21080) Update Hive to use ORC-1.5.4
Vaibhav Gumashta created HIVE-21080: --- Summary: Update Hive to use ORC-1.5.4 Key: HIVE-21080 URL: https://issues.apache.org/jira/browse/HIVE-21080 Project: Hive Issue Type: Bug Components: ORC Reporter: Vaibhav Gumashta Now that ORC-1.5.4 is released, we should update Hive's version of ORC so that HIVE-20699 can use it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- (Updated Nov. 19, 2018, 11:49 a.m.) Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 pom.xml 26b662e4c3 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 8cabf960db ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 6e7c78bd17 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION Diff: https://reviews.apache.org/r/69367/diff/2/ Changes: https://reviews.apache.org/r/69367/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote: > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > > Lines 2685 (patched) > > <https://reviews.apache.org/r/69367/diff/1/?file=2108425#file2108425line2685> > > > > "And minor compaction will be disabled." - should make sure Initiator > > doesn't start minor and that Alter Table commands requesting Minor are > > no-op or throw so that these don't get into the compactor queue. We should > > also, perhaps think about how Initiator triggers Major compactions - are > > current config params adequate? Should do at least the 2nd part in a > > follow up jira, maybe both. > > Vaibhav Gumashta wrote: > Created https://jira.apache.org/jira/browse/HIVE-20933 NM, addressing this as part of this jira itself as it's a small change and leaving this out might result in garbage data getting accumulated in metastore table - Vaibhav --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/#review210589 ------- On Nov. 16, 2018, 12:59 a.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69367/ > --- > > (Updated Nov. 16, 2018, 12:59 a.m.) > > > Review request for hive and Eugene Koifman. > > > Bugs: HIVE-20699 > https://issues.apache.org/jira/browse/HIVE-20699 > > > Repository: hive-git > > > Description > --- > > https://jira.apache.org/jira/browse/HIVE-20699 > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > 40dd992455 > pom.xml 26b662e4c3 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 92c74e1d06 > > > Diff: https://reviews.apache.org/r/69367/diff/1/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> >

Addressed the comments; will upload changes in the patch.


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 2685 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108425#file2108425line2685>
> >
> > "And minor compaction will be disabled." - should make sure Initiator
> > doesn't start minor and that Alter Table commands requesting Minor are
> > no-op or throw so that these don't get into the compactor queue. We should
> > also, perhaps think about how Initiator triggers Major compactions - are
> > current config params adequate? Should do at least the 2nd part in a
> > follow up jira, maybe both.

Created https://jira.apache.org/jira/browse/HIVE-20933


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 245 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108430#file2108430line252>
> >
> > Ideally this should be prevented before it gets into the
> > compaction_queue. Throwing here will cause failed compactions to accumulate
> > in SHOW COMPACTIONS and prevent auto-scheduling of new ones.

Created https://jira.apache.org/jira/browse/HIVE-20933 for this.


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 399 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108430#file2108430line406>
> >
> > should this be in a finally{}? SessionState is threadLocal so it may
> > get reused... or do we shutdown the session each time?

Made the change. We do remove the threadlocal via SessionState.detachSession() in the finally block in DriverUtils.runOnDriver.


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 513 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108430#file2108430line521>
> >
> > why do you need partition key/values in the query? we are always
> > reading a single partition. This is achieved by getAcidState() which takes
> > partition dir as input (i.e. all the files it returns are within a given
> > partition)

As discussed, I'll throw an exception from SplitGrouper if we somehow end up getting splits across partitions in there (unlikely, since the SQL query we run adds a partition filter).


- Vaibhav

-------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210589
---

On Nov. 16, 2018, 12:59 a.m., Vaibhav Gumashta wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
>
> (Updated Nov. 16, 2018, 12:59 a.m.)
>
>
> Review request for hive and Eugene Koifman.
>
>
> Bugs: HIVE-20699
> https://issues.apache.org/jira/browse/HIVE-20699
>
>
> Repository: hive-git
>
>
> Description
> ---
>
> https://jira.apache.org/jira/browse/HIVE-20699
>
>
> Diffs
> -
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455
> pom.xml 26b662e4c3
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06
>
>
> Diff: https://reviews.apache.org/r/69367/diff/1/
>
>
> Testing
> ---
>
>
> Thanks,
>
> Vaibhav Gumashta
>
>
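The guard discussed in this reply (fail fast if the splits handed to the grouper span more than one partition) could look roughly like the sketch below. This is not the patch's code: it uses java.nio.file.Path instead of Hadoop's Path to stay self-contained, the class and method names are hypothetical, and it assumes every file sits at <partition dir>/<delta or base dir>/<bucket file>, so the partition directory is two levels above the file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class SinglePartitionGuardSketch {

    // Assumed layout: <partition dir>/<delta or base dir>/<bucket file>.
    static Path partitionDirOf(Path bucketFile) {
        return bucketFile.getParent().getParent();
    }

    // Returns the common partition directory, or throws if the files span several.
    // Returns null for an empty list.
    static Path requireSinglePartition(List<Path> bucketFiles) {
        Path expected = null;
        for (Path file : bucketFiles) {
            Path dir = partitionDirOf(file);
            if (expected == null) {
                expected = dir;
            } else if (!expected.equals(dir)) {
                throw new IllegalStateException(
                    "Splits span more than one partition: " + expected + " and " + dir);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // Hypothetical paths, shaped like the warehouse layout in the ORC dumps.
        List<Path> files = Arrays.asList(
            Paths.get("/warehouse/t/ds=today/delta_1/bucket_0"),
            Paths.get("/warehouse/t/ds=today/delta_2/bucket_1"));
        System.out.println(requireSinglePartition(files));
    }
}
```

Because the compaction query already adds a partition filter, a check like this would only act as a safety net.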
[jira] [Created] (HIVE-20934) Query based compactor for full CRUD Acid tables
Vaibhav Gumashta created HIVE-20934: --- Summary: Query based compactor for full CRUD Acid tables Key: HIVE-20934 URL: https://issues.apache.org/jira/browse/HIVE-20934 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta Follow up of HIVE-20699. This is to enable running minor compactions as a HiveQL query -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20933) Disable minor compactions when using query based compactor
Vaibhav Gumashta created HIVE-20933: --- Summary: Disable minor compactions when using query based compactor Key: HIVE-20933 URL: https://issues.apache.org/jira/browse/HIVE-20933 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta HIVE-20699 introduces a new config, which can enable major compaction to be run as a HiveQL query, so that it can take advantage of the underlying execution engine and not run compaction as an MR job. This is particularly useful for cloud deployments. For now, we will disable minor compactions if this config flag is enabled. We will work on enabling minor compactions in a separate jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/ --- Review request for hive and Eugene Koifman. Bugs: HIVE-20699 https://issues.apache.org/jira/browse/HIVE-20699 Repository: hive-git Description --- https://jira.apache.org/jira/browse/HIVE-20699 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 pom.xml 26b662e4c3 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 Diff: https://reviews.apache.org/r/69367/diff/1/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 69143: CachedStore: Add more UT coverage (outside of .q files)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69143/ --- (Updated Nov. 2, 2018, 12:46 a.m.) Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-20613 https://issues.apache.org/jira/browse/HIVE-20613 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-20613 Diffs (updated) - standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java 944c81313a standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 70490f09e7 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java c24e7160ac standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java bb20d9f42a standalone-metastore/metastore-server/src/test/resources/log4j2.properties 365687e1c9 Diff: https://reviews.apache.org/r/69143/diff/2/ Changes: https://reviews.apache.org/r/69143/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Review Request 69143: CachedStore: Add more UT coverage (outside of .q files)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69143/ --- Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-20613 https://issues.apache.org/jira/browse/HIVE-20613 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-20613 Diffs - standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java 944c81313a standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 70490f09e7 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java c24e7160ac standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java bb20d9f42a standalone-metastore/metastore-server/src/test/resources/log4j2.properties 365687e1c9 Diff: https://reviews.apache.org/r/69143/diff/1/ Testing --- Thanks, Vaibhav Gumashta
Re: [ANNOUNCE] New committer: Nishant Bangarwa
Congrats Nishant. On 10/19/18, 11:12 AM, "Prasanth Jayachandran" wrote: Congratulations! > On Oct 19, 2018, at 11:10 AM, Vihang Karajgaonkar wrote: > > Congrats Nishant! > > On Fri, Oct 19, 2018 at 11:02 AM, Deepak Jaiswal > wrote: > >> Congratulations! >> >> On 10/19/18, 10:14 AM, "Vineet Garg" wrote: >> >>Congrats Nishant! >> >>> On Oct 19, 2018, at 8:36 AM, Gunther Hagleitner < >> ghagleit...@hortonworks.com> wrote: >>> >>> Congrats Nishant! >>> >>> Cheers, >>> Gunther. >>> >>> From: Andrew Sherman >>> Sent: Friday, October 19, 2018 8:34 AM >>> To: dev@hive.apache.org >>> Subject: Re: [ANNOUNCE] New committer: Nishant Bangarwa >>> >>> Congratulations Nishant! >>> >>> On Fri, Oct 19, 2018 at 4:29 AM Peter Vary >> >>> wrote: >>> Congratulations Nishant! > On Oct 19, 2018, at 07:42, Sankar Hariappan < >> shariap...@hortonworks.com> wrote: > > Congrats Nishant! > > Best regards > Sankar > > > > > > > > > > On 15/10/18, 12:45 PM, "Ashutosh Chauhan" >> wrote: > >> Apache Hive's Project Management Committee (PMC) has invited >> Nishant >> Bangarwa >> to become a committer, and we are pleased to announce that he has accepted. >> >> Nishant, welcome, thank you for your contributions, and we look >> forward your >> further interactions with the community! >> >> Ashutosh Chauhan (on behalf of the Apache Hive PMC) >>> >>> >> >> >> >>
[jira] [Created] (HIVE-20754) JDBC: Add some missing classes to jdbc standalone jar (follow up to HIVE-19801)
Vaibhav Gumashta created HIVE-20754: --- Summary: JDBC: Add some missing classes to jdbc standalone jar (follow up to HIVE-19801) Key: HIVE-20754 URL: https://issues.apache.org/jira/browse/HIVE-20754 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta Some more classes are needed in a secure cluster -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20734) Beeline: When beeline-site.xml is present and hive CLI redirects to beeline, it should use the system username/dummy password instead of prompting for one
Vaibhav Gumashta created HIVE-20734: --- Summary: Beeline: When beeline-site.xml is present and hive CLI redirects to beeline, it should use the system username/dummy password instead of prompting for one Key: HIVE-20734 URL: https://issues.apache.org/jira/browse/HIVE-20734 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20676) HiveServer2: PrivilegeSynchronizer is not set to daemon status
Vaibhav Gumashta created HIVE-20676: --- Summary: HiveServer2: PrivilegeSynchronizer is not set to daemon status Key: HIVE-20676 URL: https://issues.apache.org/jira/browse/HIVE-20676 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20615) CachedStore: Background refresh thread bug fixes
Vaibhav Gumashta created HIVE-20615: --- Summary: CachedStore: Background refresh thread bug fixes Key: HIVE-20615 URL: https://issues.apache.org/jira/browse/HIVE-20615 Project: Hive Issue Type: Sub-task Components: Metastore Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20614) CachedStore: Run a select q file tests with CachedStore enabled
Vaibhav Gumashta created HIVE-20614: --- Summary: CachedStore: Run a select q file tests with CachedStore enabled Key: HIVE-20614 URL: https://issues.apache.org/jira/browse/HIVE-20614 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20613) CachedStore: Add more UT coverage (outside of .q files)
Vaibhav Gumashta created HIVE-20613: --- Summary: CachedStore: Add more UT coverage (outside of .q files) Key: HIVE-20613 URL: https://issues.apache.org/jira/browse/HIVE-20613 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20578) Enable org.apache.hive.jdbc.miniHS2.TestHs2ConnectionMetricsHttp.testOpenConnectionMetrics
Vaibhav Gumashta created HIVE-20578: --- Summary: Enable org.apache.hive.jdbc.miniHS2.TestHs2ConnectionMetricsHttp.testOpenConnectionMetrics Key: HIVE-20578 URL: https://issues.apache.org/jira/browse/HIVE-20578 Project: Hive Issue Type: Bug Components: Test Affects Versions: 4.0.0 Reporter: Vaibhav Gumashta Disabled in HIVE-20577 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20577) Disable org.apache.hive.jdbc.miniHS2.TestHs2ConnectionMetricsHttp.testOpenConnectionMetrics
Vaibhav Gumashta created HIVE-20577: --- Summary: Disable org.apache.hive.jdbc.miniHS2.TestHs2ConnectionMetricsHttp.testOpenConnectionMetrics Key: HIVE-20577 URL: https://issues.apache.org/jira/browse/HIVE-20577 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 4.0.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20555) HiveServer2: Preauthenticated subject for http transport is not retained for entire duration of http communication in some cases
Vaibhav Gumashta created HIVE-20555: --- Summary: HiveServer2: Preauthenticated subject for http transport is not retained for entire duration of http communication in some cases Key: HIVE-20555 URL: https://issues.apache.org/jira/browse/HIVE-20555 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 3.1.0, 2.3.2 Reporter: Vaibhav Gumashta As implemented in HIVE-8705, for http transport, we add the logged in subject's credentials in the http header via a request interceptor. The request interceptor doesn't seem to be getting used for some http traffic (e.g. knox ssl in the same rpc). It would also be better to cache the logged in subject for the duration of the whole session. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20507) Beeline: Add a utility command to retrieve all uris from beeline-site.xml
Vaibhav Gumashta created HIVE-20507: --- Summary: Beeline: Add a utility command to retrieve all uris from beeline-site.xml Key: HIVE-20507 URL: https://issues.apache.org/jira/browse/HIVE-20507 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta It will be useful for some clients to get the url list when beeline-site is present. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New committer: Andrew Sherman
Congratulations Andrew! On 9/4/18, 10:36 AM, "Sahil Takiar" wrote: Congrats Andrew! On Tue, Sep 4, 2018 at 12:02 AM Antal Sinkovits wrote: > Congratulations Andrew! > > Deepak Jaiswal ezt írta (időpont: 2018. szept. > 4., K 7:34): > > > Congratulation Andrew. > > > > Deepak > > > > On 9/3/18, 10:17 PM, "Zoltan Haindrich" wrote: > > > > Congratulations Andrew! > > > > On 2 September 2018 04:49:00 CEST, Lefty Leverenz < > > leftylever...@gmail.com> wrote: > > >Congratulations Andrew! > > > > > >-- Lefty > > > > > > > > >On Tue, Aug 28, 2018 at 11:36 AM Ashutosh Chauhan > > > > > >wrote: > > > > > >> Apache Hive's Project Management Committee (PMC) has invited > Andrew > > >Sherman > > >> to become a committer, and we are pleased to announce that he has > > >accepted. > > >> > > >> Andrew, welcome, thank you for your contributions, and we look > > >forward to > > >> your > > >> further interactions with the community! > > >> > > >> Ashutosh Chauhan (on behalf of the Apache Hive PMC) > > >> > > > > > > > -- Sahil Takiar Software Engineer takiar.sa...@gmail.com | (510) 673-0309
[jira] [Created] (HIVE-20478) Metastore: Null checks needed in DecimalColumnStatsAggregator
Vaibhav Gumashta created HIVE-20478: --- Summary: Metastore: Null checks needed in DecimalColumnStatsAggregator Key: HIVE-20478 URL: https://issues.apache.org/jira/browse/HIVE-20478 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20446) CachedStore: bug fixes for q file tests: TestMiniLlapCliDriver, TestMiniTezCliDriver, TestMinimrCliDriver when CachedStore is enabled
Vaibhav Gumashta created HIVE-20446: --- Summary: CachedStore: bug fixes for q file tests: TestMiniLlapCliDriver, TestMiniTezCliDriver, TestMinimrCliDriver when CachedStore is enabled Key: HIVE-20446 URL: https://issues.apache.org/jira/browse/HIVE-20446 Project: Hive Issue Type: Sub-task Components: Standalone Metastore Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20430) CachedStore: getTableObjectsByName incorrectly adds a null object to the table list
Vaibhav Gumashta created HIVE-20430: --- Summary: CachedStore: getTableObjectsByName incorrectly adds a null object to the table list Key: HIVE-20430 URL: https://issues.apache.org/jira/browse/HIVE-20430 Project: Hive Issue Type: Sub-task Components: Standalone Metastore Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20337) CachedStore: getPartitionsByExpr is not populating the partition list correctly
Vaibhav Gumashta created HIVE-20337: --- Summary: CachedStore: getPartitionsByExpr is not populating the partition list correctly Key: HIVE-20337 URL: https://issues.apache.org/jira/browse/HIVE-20337 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.1.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar
Congrats Vihang! On 8/1/18, 11:33 AM, "Chaoyu Tang" wrote: Congratulations Vihang. On Tue, Jul 31, 2018 at 7:39 AM, Rajesh Balamohan wrote: > Congratulations Vihang! > > ~Rajesh.B > > > On Tue, Jul 31, 2018 at 3:35 PM Marta Kuczora > > wrote: > > > Congratulations Vihang! > > > > On Mon, Jul 30, 2018 at 9:44 AM Peter Vary > > wrote: > > > > > Congratulations Vihang! > > > > > > > On Jul 29, 2018, at 22:32, Vineet Garg > wrote: > > > > > > > > Congratulations Vihang! > > > > > > > >> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan < > hashut...@apache.org> > > > wrote: > > > >> > > > >> On behalf of the Hive PMC I am delighted to announce Vihang > > > Karajgaonkar > > > >> is joining Hive PMC. > > > >> Thanks Vihang for all your contributions till now. Looking forward > to > > > many > > > >> more. > > > >> > > > >> Welcome, Vihang! > > > >> > > > >> Thanks, > > > >> Ashutosh > > > > > > > > > > >
Re: [ANNOUNCE] New PMC Member : Vineet Garg
Congrats Vineet! On 8/1/18, 11:33 AM, "Chaoyu Tang" wrote: Congratulations Vineet! On Tue, Jul 31, 2018 at 7:39 AM, Rajesh Balamohan < rajesh.balamo...@gmail.com> wrote: > Congratulations Vineet! > > ~Rajesh.B > > > On Tue, Jul 31, 2018 at 3:34 PM Marta Kuczora > > wrote: > > > Congratulations Vineet! > > > > On Mon, Jul 30, 2018 at 9:45 AM Peter Vary > > wrote: > > > > > Congratulations Vineet! > > > > > > > On Jul 30, 2018, at 01:59, Ashutosh Chauhan > > > wrote: > > > > > > > > On behalf of the Hive PMC I am delighted to announce Vineet Garg is > > > joining > > > > Hive PMC. > > > > Thanks Vineet for all your contributions till now. Looking forward to > > > many > > > > more. > > > > > > > > Welcome, Vineet! > > > > > > > > Thanks, > > > > Ashutosh > > > > > > > > > > > -- > ~Rajesh.B >
Re: [ANNOUNCE] New PMC Member : Sahil Takiar
Congrats Sahil! On 8/1/18, 11:32 AM, "Chaoyu Tang" wrote: Congratulations Sahil! On Tue, Jul 31, 2018 at 7:40 AM, Rajesh Balamohan wrote: > Congratulations Sahil! > > ~Rajesh.B > > > On Tue, Jul 31, 2018 at 3:57 PM Marta Kuczora > > wrote: > > > Congratulations Sahil! > > > > On Mon, Jul 30, 2018 at 9:44 AM Peter Vary > > wrote: > > > > > Congratulations Sahil! > > > > > > > On Jul 29, 2018, at 22:32, Vineet Garg > wrote: > > > > > > > > Congratulations Sahil! > > > > > > > >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan < > hashut...@apache.org> > > > wrote: > > > >> > > > >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is > > > >> joining Hive PMC. > > > >> Thanks Sahil for all your contributions till now. Looking forward to > > > many > > > >> more. > > > >> > > > >> Welcome, Sahil! > > > >> > > > >> Thanks, > > > >> Ashutosh > > > > > > > > > > >
Re: [ANNOUNCE] New PMC Member : Peter Vary
Congrats Peter! On 8/1/18, 11:31 AM, "Chaoyu Tang" wrote: Congratulations, Peter. On Wed, Aug 1, 2018 at 2:08 PM, Peter Vary wrote: > Thanks everyone! > > Rajesh Balamohan ezt írta (időpont: 2018. júl. > 31., > Ke 13:41): > > > Congratulations Peter! > > > > ~Rajesh.B > > > > > > On Tue, Jul 31, 2018 at 3:58 PM Marta Kuczora > > > > wrote: > > > > > Congratulations Peter! > > > > > > On Mon, Jul 30, 2018 at 7:53 PM Andrew Sherman > > > wrote: > > > > > > > Congratulations Peter! > > > > > > > > On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg > > > wrote: > > > > > > > > > Congratulations Peter! > > > > > > > > > > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan < > > hashut...@apache.org > > > > > > > > > wrote: > > > > > > > > > > > > On behalf of the Hive PMC I am delighted to announce Peter Vary > is > > > > > joining > > > > > > Hive PMC. > > > > > > Thanks Peter for all your contributions till now. Looking forward > > to > > > > many > > > > > > more. > > > > > > > > > > > > Welcome, Peter! > > > > > > > > > > > > Thanks, > > > > > > Ashutosh > > > > > > > > > > > > > > > > >
[jira] [Created] (HIVE-19801) JDBC: Add some missing classes to jdbc standalone jar and remove hbase classes
Vaibhav Gumashta created HIVE-19801: --- Summary: JDBC: Add some missing classes to jdbc standalone jar and remove hbase classes Key: HIVE-19801 URL: https://issues.apache.org/jira/browse/HIVE-19801 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19748) Add appropriate null checks to DecimalColumnStatsAggregator
Vaibhav Gumashta created HIVE-19748: --- Summary: Add appropriate null checks to DecimalColumnStatsAggregator Key: HIVE-19748 URL: https://issues.apache.org/jira/browse/HIVE-19748 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0 Reporter: Vaibhav Gumashta In some of our internal testing, we noticed that calls to MetaStoreUtils.decimalToDouble(Decimal decimal) from within DecimalColumnStatsAggregator end up passing null Decimal values to the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
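A minimal sketch of the kind of null check this issue asks for, assuming `java.math.BigDecimal` as a stand-in for the metastore's Decimal type. `NullSafeDecimalStats`, `toDoubleOrNull`, and `minIgnoringNulls` are hypothetical names used only for illustration; they are not the aggregator's actual methods.

```java
import java.math.BigDecimal;

public class NullSafeDecimalStats {
    /** Converts a decimal to a double, propagating null instead of throwing. */
    public static Double toDoubleOrNull(BigDecimal d) {
        return d == null ? null : d.doubleValue();
    }

    /** Picks the smaller of two low values, ignoring a side that is missing. */
    public static Double minIgnoringNulls(Double a, Double b) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        return Math.min(a, b);
    }

    public static void main(String[] args) {
        // One partition has a low value, the other has none recorded.
        Double low1 = toDoubleOrNull(new BigDecimal("12.5"));
        Double low2 = toDoubleOrNull(null);
        System.out.println(minIgnoringNulls(low1, low2));
    }
}
```

Treating a missing value as "no information" rather than zero avoids skewing the aggregated statistics for partitions that do have values.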
[jira] [Created] (HIVE-19734) Beeline: When beeline-site.xml is present, beeline does not honor -n (username) and -p (password) arguments
Vaibhav Gumashta created HIVE-19734: --- Summary: Beeline: When beeline-site.xml is present, beeline does not honor -n (username) and -p (password) arguments Key: HIVE-19734 URL: https://issues.apache.org/jira/browse/HIVE-19734 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 3.0.0 Reporter: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19528) Beeline: When beeline-site.xml is present and the default named url is incorrect, throw an exception instead of relying on resolution via hive-site.xml/beeline-hs2-connec
Vaibhav Gumashta created HIVE-19528: --- Summary: Beeline: When beeline-site.xml is present and the default named url is incorrect, throw an exception instead of relying on resolution via hive-site.xml/beeline-hs2-connection.xml Key: HIVE-19528 URL: https://issues.apache.org/jira/browse/HIVE-19528 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19389) Schematool: For Hive's Information Schema, use embedded HS2 as default
Vaibhav Gumashta created HIVE-19389: --- Summary: Schematool: For Hive's Information Schema, use embedded HS2 as default Key: HIVE-19389 URL: https://issues.apache.org/jira/browse/HIVE-19389 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta Currently, for initializing/upgrading Hive's information schema, we require a full jdbc url (for HS2). It will be good to have it connect using embedded HS2 by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19385) Optional hive env variable to redirect bin/hive to use Beeline
Vaibhav Gumashta created HIVE-19385: --- Summary: Optional hive env variable to redirect bin/hive to use Beeline Key: HIVE-19385 URL: https://issues.apache.org/jira/browse/HIVE-19385 Project: Hive Issue Type: Bug Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta With beeline-site and beeline-user-site, the user can easily specify default hs2 urls to connect. We can use an optional env variable, which when set, will enable bin/hive to use beeline. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Review Request 66857: Replication: The file uris being dumped should contain information about the uri of the source cluster's cm root
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66857/ --- Review request for hive and Thejas Nair. Bugs: HIVE-19343 https://issues.apache.org/jira/browse/HIVE-19343 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19343 Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestReplChangeManager.java 6ade76d0c2 ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java de270cfcdb ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/CreateFunctionHandler.java f7c90409b7 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java 7c1d5f5cca Diff: https://reviews.apache.org/r/66857/diff/1/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-19343) Replication: The file uris being dumped should contain information about the uri of the source cluster's cm root
Vaibhav Gumashta created HIVE-19343: --- Summary: Replication: The file uris being dumped should contain information about the uri of the source cluster's cm root Key: HIVE-19343 URL: https://issues.apache.org/jira/browse/HIVE-19343 Project: Hive Issue Type: Bug Components: repl Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta In replication v2, we use change manager (the location is specified by cmroot: {{hive.repl.cmrootdir}}) to archive deleted files from the source cluster so that they can later be copied on the target cluster. When files are read from the cmroot, the target needs to know the appropriate file system. This patch adds the fs information of the cmroot on the source to the filenames that get written in the repldump command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
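To illustrate the idea of carrying the source cmroot location inside each dumped file name, here is a rough sketch. The `#`-separated layout, the class name `CmRootUriCodec`, and both method names are assumptions made for this example; the issue text does not specify the encoding, and this simple scheme assumes none of the three parts contains `#`.

```java
public class CmRootUriCodec {
    private static final String SEPARATOR = "#";

    /** Appends checksum and cmroot URI to the file URI; returns the plain URI if either is missing. */
    public static String encode(String fileUri, String checksum, String cmRootUri) {
        if (checksum == null || cmRootUri == null) {
            return fileUri;
        }
        return fileUri + SEPARATOR + checksum + SEPARATOR + cmRootUri;
    }

    /** Returns {fileUri, checksum, cmRootUri}; the last two are null for an unencoded URI. */
    public static String[] decode(String encoded) {
        String[] parts = encoded.split(SEPARATOR);
        if (parts.length == 3) {
            return parts;
        }
        return new String[] {encoded, null, null};
    }

    public static void main(String[] args) {
        String encoded = encode(
            "hdfs://source:8020/warehouse/t/file1", "abc123", "hdfs://source:8020/cmroot");
        String[] parts = decode(encoded);
        // The target can now resolve the archived copy against the source file system.
        System.out.println(parts[2]);
    }
}
```

Because the cmroot part is a fully qualified URI, the target cluster does not have to guess which file system the archived file lives on.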
[jira] [Created] (HIVE-19310) Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might need to be run only in test env
Vaibhav Gumashta created HIVE-19310: --- Summary: Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might need to be run only in test env Key: HIVE-19310 URL: https://issues.apache.org/jira/browse/HIVE-19310 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta MetaStoreDirectSql.ensureDbInit has the following 2 calls which we have observed taking a long time in our testing: {code} initQueries.add(pm.newQuery(MNotificationLog.class, "dbName == ''")); initQueries.add(pm.newQuery(MNotificationNextId.class, "nextEventId < -1")); {code} In a production environment, these tables should be initialized using schematool, however in a test environment, these calls might be needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19249) Replication: The WITH clause is not passing the configuration to Task correctly in all cases
Vaibhav Gumashta created HIVE-19249: --- Summary: Replication: The WITH clause is not passing the configuration to Task correctly in all cases Key: HIVE-19249 URL: https://issues.apache.org/jira/browse/HIVE-19249 Project: Hive Issue Type: Bug Components: repl Affects Versions: 3.0.0, 3.1.0 Reporter: Vaibhav Gumashta When running REPL LOAD like the following: {code} REPL LOAD `repldb_kms207` FROM 'hdfs://url:8020/apps/hive/repl/f8b057a7-c3f2-43bd-8baa-f7408a9008fc' WITH ('hive.exec.parallel'='true','hive.distcp.privileged.doAs'='beacon','hive.metastore.uris'='thrift://metastore-url:9083','hive.metastore.warehouse.dir'='s3a://s3-warehouse','hive.warehouse.subdir.inherit.perms'='false','hive.repl.replica.functions.root.dir'='s3a://s3-warehouse','fs.s3a.bucket.ss-datasets.endpoint'='s3-bucket-endpoint','fs.s3a.impl.disable.cache'='true','fs.s3a.server-side-encryption-algorithm'='SSE-KMS','fs.s3a.server-side-encryption.key'='encr-key','distcp.options.pp'='','distcp.options.pg'='','distcp.options.pu'=''); {code} the tasks that get created need to use the configs that are passed in the WITH clause. However, in some cases the wrong config object gets used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
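The intended behaviour can be sketched as an overlay: each task gets a copy of the session configuration with the WITH-clause entries applied on top, rather than the untouched session object. The sketch below uses plain maps; `WithClauseConf` and `overlay` are hypothetical names for illustration and do not reflect how Hive's configuration classes are actually wired.

```java
import java.util.HashMap;
import java.util.Map;

public class WithClauseConf {
    /** Returns a new map holding the base settings with the WITH-clause settings applied on top. */
    public static Map<String, String> overlay(Map<String, String> base,
                                              Map<String, String> withClause) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(withClause);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> sessionConf = new HashMap<>();
        sessionConf.put("hive.exec.parallel", "false");
        sessionConf.put("hive.metastore.warehouse.dir", "/apps/hive/warehouse");

        Map<String, String> withClause = new HashMap<>();
        withClause.put("hive.exec.parallel", "true");
        withClause.put("hive.metastore.warehouse.dir", "s3a://s3-warehouse");

        // Every task created for this command should be handed taskConf, not sessionConf.
        Map<String, String> taskConf = overlay(sessionConf, withClause);
        System.out.println(taskConf.get("hive.metastore.warehouse.dir"));
    }
}
```

Copying before overlaying also keeps the per-command overrides from leaking into the session for later statements.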
Re: Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66503/ --- (Updated April 16, 2018, 9:42 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-19126 https://issues.apache.org/jira/browse/HIVE-19126 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19126 Diffs (updated) - llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java 6f4ec6f1ea llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java 2f7fa24558 llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 0bbaf7e459 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 1ce86bbdba standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 89b400697b standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java f007261daf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/SizeValidator.java PRE-CREATION standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java d451f966b0 Diff: https://reviews.apache.org/r/66503/diff/6/ Changes: https://reviews.apache.org/r/66503/diff/5-6/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66503/ --- (Updated April 11, 2018, 10:53 p.m.) Review request for hive and Thejas Nair. Changes --- Changed some logging to trace level Bugs: HIVE-19126 https://issues.apache.org/jira/browse/HIVE-19126 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19126 Diffs (updated) - llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java 6f4ec6f1ea llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java 2f7fa24558 llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 0bbaf7e459 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java c47856de87 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 89b400697b standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java f007261daf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/SizeValidator.java PRE-CREATION Diff: https://reviews.apache.org/r/66503/diff/5/ Changes: https://reviews.apache.org/r/66503/diff/4-5/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66503/ --- (Updated April 11, 2018, 10:38 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-19126 https://issues.apache.org/jira/browse/HIVE-19126 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19126 Diffs (updated) - llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java 6f4ec6f1ea llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java 2f7fa24558 llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 0bbaf7e459 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java c47856de87 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 89b400697b standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java f007261daf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/SizeValidator.java PRE-CREATION Diff: https://reviews.apache.org/r/66503/diff/4/ Changes: https://reviews.apache.org/r/66503/diff/3-4/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66503/ --- (Updated April 10, 2018, 12:25 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-19126 https://issues.apache.org/jira/browse/HIVE-19126 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19126 Diffs (updated) - llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java 6f4ec6f1ea llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java 2f7fa24558 llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 0bbaf7e459 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java c47856de87 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 89b400697b standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 940a1bf276 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/SizeValidator.java PRE-CREATION Diff: https://reviews.apache.org/r/66503/diff/3/ Changes: https://reviews.apache.org/r/66503/diff/2-3/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66503/ --- (Updated April 9, 2018, 10:42 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-19126 https://issues.apache.org/jira/browse/HIVE-19126 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19126 Diffs (updated) - llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java 6f4ec6f1ea llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java 2f7fa24558 llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 0bbaf7e459 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java c47856de87 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 89b400697b standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 940a1bf276 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/SizeValidator.java PRE-CREATION Diff: https://reviews.apache.org/r/66503/diff/2/ Changes: https://reviews.apache.org/r/66503/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66503/ --- Review request for hive and Thejas Nair. Bugs: HIVE-19126 https://issues.apache.org/jira/browse/HIVE-19126 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-19126 Diffs - llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java 6f4ec6f1ea llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java 2f7fa24558 llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 0bbaf7e459 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java c47856de87 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 89b400697b standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 995137f967 Diff: https://reviews.apache.org/r/66503/diff/1/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-19126) CachedStore: Use memory estimation to limit cache size during prewarm
Vaibhav Gumashta created HIVE-19126: --- Summary: CachedStore: Use memory estimation to limit cache size during prewarm Key: HIVE-19126 URL: https://issues.apache.org/jira/browse/HIVE-19126 Project: Hive Issue Type: Sub-task Components: Metastore Affects Versions: 3.0.0 Reporter: Vaibhav Gumashta We can rely on https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java to estimate memory of SharedCache. This jira addresses the size estimation during prewarm, so that we can stop when we hit the memory limit. In a follow-up jira, we will work on estimation/eviction after prewarm is complete, so that we can keep the frequently used tables and their partitions in cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
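The prewarm cutoff described in this jira can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch: `PrewarmSketch`, `prewarm` and `maxCacheSizeBytes` are hypothetical names, and the `estimator` argument merely stands in for IncrementalObjectSizeEstimator, whose actual API is not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch: stop prewarming once the estimated footprint would exceed a limit.
final class PrewarmSketch {
    /**
     * Walks the candidate objects in order and keeps adding them to the cache
     * until adding the next one would push the estimated size past the limit.
     */
    static <T> List<T> prewarm(List<T> candidates, ToLongFunction<T> estimator, long maxCacheSizeBytes) {
        List<T> cached = new ArrayList<>();
        long usedBytes = 0L;
        for (T candidate : candidates) {
            long estimatedBytes = estimator.applyAsLong(candidate);
            if (usedBytes + estimatedBytes > maxCacheSizeBytes) {
                break; // memory limit reached: stop the prewarm here
            }
            cached.add(candidate);
            usedBytes += estimatedBytes;
        }
        return cached;
    }
}
```

The sketch stops at the first object that does not fit; whether the real prewarm should stop or skip and continue is a design choice this message does not settle.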
Re: Review Request 66185: JDBC: Provide an option to simplify beeline usage by supporting default and named URL for beeline
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66185/ --- (Updated April 2, 2018, 5:04 a.m.) Review request for hive, Thejas Nair and Vihang Karajgaonkar. Bugs: HIVE-18963 https://issues.apache.org/jira/browse/HIVE-18963 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18963 Diffs (updated) - beeline/src/java/org/apache/hive/beeline/BeeLine.java 4928761565 beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineConfFileParseException.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineHS2ConnectionFileParseException.java acddf82a67 beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineSiteParseException.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineSiteParser.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileParser.java b769e8581f beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileUtils.java f635b40633 beeline/src/java/org/apache/hive/beeline/hs2connection/UserHS2ConnectionFileParser.java 2801ebee09 beeline/src/main/resources/BeeLine.properties 6fca953836 beeline/src/test/org/apache/hive/beeline/hs2connection/TestUserHS2ConnectionFileParser.java 1d17887417 beeline/src/test/resources/test-hs2-named-connection-config.xml PRE-CREATION itests/hive-unit/src/test/java/org/apache/hive/beeline/hs2connection/BeelineWithHS2ConnectionFileTestBase.java 3da31ad8a9 jdbc/src/java/org/apache/hive/jdbc/Utils.java 6d7787da7d Diff: https://reviews.apache.org/r/66185/diff/3/ Changes: https://reviews.apache.org/r/66185/diff/2-3/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 66185: JDBC: Provide an option to simplify beeline usage by supporting default and named URL for beeline
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66185/ --- (Updated March 28, 2018, 8:27 p.m.) Review request for hive, Thejas Nair and Vihang Karajgaonkar. Bugs: HIVE-18963 https://issues.apache.org/jira/browse/HIVE-18963 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18963 Diffs (updated) - beeline/src/java/org/apache/hive/beeline/BeeLine.java 402fae beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineConfFileParseException.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineHS2ConnectionFileParseException.java acddf82a67 beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineSiteParseException.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineSiteParser.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileParser.java b769e8581f beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileUtils.java f635b40633 beeline/src/java/org/apache/hive/beeline/hs2connection/UserHS2ConnectionFileParser.java 2801ebee09 beeline/src/main/resources/BeeLine.properties 6fca953836 beeline/src/test/org/apache/hive/beeline/hs2connection/TestUserHS2ConnectionFileParser.java 1d17887417 beeline/src/test/resources/test-hs2-named-connection-config.xml PRE-CREATION itests/hive-unit/src/test/java/org/apache/hive/beeline/hs2connection/BeelineWithHS2ConnectionFileTestBase.java 3da31ad8a9 jdbc/src/java/org/apache/hive/jdbc/Utils.java 6d7787da7d Diff: https://reviews.apache.org/r/66185/diff/2/ Changes: https://reviews.apache.org/r/66185/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 66185: JDBC: Provide an option to simplify beeline usage by supporting default and named URL for beeline
> On March 22, 2018, 4:22 p.m., Vihang Karajgaonkar wrote:
> > I am a bit confused here. If the full url can be provided in the config file by the user, how is it better than just creating an environment variable like BEELINE_URL_ and using it instead of adding it in the config file? I think the objective of this config file was to automatically figure out the connection url based on hive-site.xml and the additional beeline-hs2-connection.xml to override/augment the information from hive-site.xml.
> >
> > The current code is structured such that all keys start with beeline.hs2.connection. and components of the url are parsed automatically using the values of those keys. If we want to add full support of named urls which can have completely different url components like session vars etc, what do you think of adding a new prefix key of the form beeline.hs2.connection. and then the existing code will work exactly like it does currently but instead will parse the keys starting with beeline.hs2.connection.. For example, a named url called "blue" will be constructed using all the keys from beeline.hs2.connection.blue. That way we reuse existing logic. The beeline will be invoked like beeline -c blue. Do you see any problems with this approach? This way the user doesn't have to provide all the url components which can be reused from hive-site.xml (like the nasty ssl, kerberos settings).

Thanks for your feedback. Discussed a bit offline with Thejas as well. Let me add more details on the use case:

Suppose you have 2 different sets of HS2 instances running on the cluster; a beeline shell will only be able to parse one hive-site.xml (set 1 for example). To be able to connect to set 2, it would be nice to have an installer (something like Apache Ambari) managed beeline-site.xml, which can publish the named urls (and also regenerate the named urls if the admin makes any change in the cluster manager), which can be used by the beeline shell. Once the base connection url is figured out, beeline-hs2-connection.xml can then be used to overlay user specific driver configs like it is doing right now.

Hope that clarifies the use case. I'll post an updated patch based on your feedback above.

- Vaibhav

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66185/#review199770
---

On March 20, 2018, 10:54 p.m., Vaibhav Gumashta wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66185/
> ---
>
> (Updated March 20, 2018, 10:54 p.m.)
>
> Review request for hive, Thejas Nair and Vihang Karajgaonkar.
>
> Bugs: HIVE-18963
> https://issues.apache.org/jira/browse/HIVE-18963
>
> Repository: hive-git
>
> Description
> ---
> https://issues.apache.org/jira/browse/HIVE-18963
>
> Diffs
> -
> beeline/src/java/org/apache/hive/beeline/BeeLine.java 402fae
> beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileParser.java b769e8581f
> beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileUtils.java f635b40633
> beeline/src/java/org/apache/hive/beeline/hs2connection/UserHS2ConnectionFileParser.java 2801ebee09
> beeline/src/main/resources/BeeLine.properties 6fca953836
> beeline/src/test/org/apache/hive/beeline/hs2connection/TestUserHS2ConnectionFileParser.java 1d17887417
> beeline/src/test/resources/test-hs2-named-connection-config.xml PRE-CREATION
> itests/hive-unit/src/test/java/org/apache/hive/beeline/hs2connection/BeelineWithHS2ConnectionFileTestBase.java 3da31ad8a9
>
> Diff: https://reviews.apache.org/r/66185/diff/1/
>
> Testing
> ---
>
> Thanks,
>
> Vaibhav Gumashta
Review Request 66185: JDBC: Provide an option to simplify beeline usage by supporting default and named URL for beeline
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66185/ --- Review request for hive, Thejas Nair and Vihang Karajgaonkar. Bugs: HIVE-18963 https://issues.apache.org/jira/browse/HIVE-18963 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18963 Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 402fae beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileParser.java b769e8581f beeline/src/java/org/apache/hive/beeline/hs2connection/HS2ConnectionFileUtils.java f635b40633 beeline/src/java/org/apache/hive/beeline/hs2connection/UserHS2ConnectionFileParser.java 2801ebee09 beeline/src/main/resources/BeeLine.properties 6fca953836 beeline/src/test/org/apache/hive/beeline/hs2connection/TestUserHS2ConnectionFileParser.java 1d17887417 beeline/src/test/resources/test-hs2-named-connection-config.xml PRE-CREATION itests/hive-unit/src/test/java/org/apache/hive/beeline/hs2connection/BeelineWithHS2ConnectionFileTestBase.java 3da31ad8a9 Diff: https://reviews.apache.org/r/66185/diff/1/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-18963) JDBC: Provide an option to simplify beeline usage
Vaibhav Gumashta created HIVE-18963: --- Summary: JDBC: Provide an option to simplify beeline usage Key: HIVE-18963 URL: https://issues.apache.org/jira/browse/HIVE-18963 Project: Hive Issue Type: Bug Components: Beeline Reporter: Vaibhav Gumashta Currently, after opening the Beeline CLI, the user needs to supply a connection string to use the HS2 instance and set up the jdbc driver. Since we plan to replace Hive CLI with Beeline in the future (HIVE-10511), it will help usability if the user can simply type {{beeline}} and start the hive session. The jdbc url can be specified in a beeline-site.xml (which can contain other named jdbc urls as well, and they can be accessed by something like: {{beeline -c namedUrl}}). The use of beeline-site.xml can also be potentially expanded later if needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
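The default and named url lookup proposed here could look roughly like the sketch below. The key layout (`beeline.hs2.jdbc.url.<name>` plus a `default` entry that holds the name of the url to use) is an assumption made for illustration and is not stated in this jira; `NamedUrlSketch` and `resolve` are hypothetical names, and the map simply stands in for the parsed contents of beeline-site.xml.

```java
import java.util.Map;

// Hypothetical sketch of resolving "beeline" and "beeline -c namedUrl" to a jdbc url.
final class NamedUrlSketch {
    // Assumed key layout, for illustration only.
    static final String PREFIX = "beeline.hs2.jdbc.url.";
    static final String DEFAULT_KEY = PREFIX + "default";

    /**
     * Returns the jdbc url registered under the given name. A null name
     * (plain "beeline") falls back to the name stored under the default key.
     */
    static String resolve(Map<String, String> site, String name) {
        String effectiveName = (name != null) ? name : site.get(DEFAULT_KEY);
        if (effectiveName == null) {
            throw new IllegalArgumentException("No url name given and no default configured");
        }
        String url = site.get(PREFIX + effectiveName);
        if (url == null) {
            throw new IllegalArgumentException("Unknown named url: " + effectiveName);
        }
        return url;
    }
}
```

Storing a name rather than a full url under the default key keeps each url defined in exactly one place, so a tool that regenerates the file only has to rewrite one entry per HS2 set.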
Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/ --- (Updated March 9, 2018, 9 p.m.) Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-18264 https://issues.apache.org/jira/browse/HIVE-18264 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18264 Diffs (updated) - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java a3725c5395 service/src/java/org/apache/hive/service/server/HiveServer2.java 86c9c2b33c standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ac71d0882f standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7b44df4128 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f500d63725 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 0d132f2074 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 75ea8c4a77 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 207d842f94 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ab6feb6f0b standalone-metastore/src/test/resources/log4j2.properties 365687e1c9 Diff: https://reviews.apache.org/r/65634/diff/5/ Changes: https://reviews.apache.org/r/65634/diff/4-5/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
634/diff/4/?file=1968310#file1968310line292>
> >
> > Please document this method. Among other things - can prewarm() be called multiple times? If not, should it be somehow enforced?

Have enforced that. Currently it was being called just once from the background thread.

> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 271 (original), 207 (patched)
> > <https://reviews.apache.org/r/65634/diff/4/?file=1968310#file1968310line297>
> >
> > Please remove and add explicit size:
> >
> > `List databases = new ArrayList<>(dbNames.size());`

I had kept it for Java 6 compatibility, but it looks like Hive2+ doesn't support it. Removed.

> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 285 (original), 217 (patched)
> > <https://reviews.apache.org/r/65634/diff/4/?file=1968310#file1968310line318>
> >
> > It is quite possible that there is another metastore instance running and someone removes this database, so this call will fail due to missing database. I think this code should continue prewarm for other databases in such cases.

rawStore.getAllTables(dbName) should return an empty list, so processing will continue.

> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 368 (original), 305 (patched)
> > <https://reviews.apache.org/r/65634/diff/4/?file=1968310#file1968310line419>
> >
> > Any reason not to use lambda here?

Not really, just a matter of preference here.

> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 746 (original), 595 (patched)
> > <https://reviews.apache.org/r/65634/diff/4/?file=1968310#file1968310line853>
> >
> > Wouldn't dropDatabase also throw exception if it fails?

Yes, we're throwing it if we get an exception from rawStore.dropDatabase.

> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 758 (original), 599 (patched)
> > <https://reviews.apache.org/r/65634/diff/4/?file=1968310#file1968310line865>
> >
> > This is rather useless since in case of failure you'll throw MetaException anyway.

Here we are working with the RawStore API, which returns a boolean that can potentially be false. In that case we don't want to work on SharedCache. The underlying ObjectStore implementation may change in the future.

> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 902 (original), 701 (patched)
> > <https://reviews.apache.org/r/65634/diff/4/?file=1968310#file1968310line1013>
> >
> > tbl would never be null, there will be an exception if the call above fails.

If the table is not yet in the cache (cache not prewarmed yet), sharedCache.getTableFromCache will return null. In that case we would like to serve the call from the metastore db.

- Vaibhav

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/#review198507
---

On March 1, 2018, 11:09 a.m., Vaibhav Gumashta wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65634/
> ---
>
> (Updated March 1, 2018, 11:09 a.m.)
>
> Review request for hive, Daniel Dai and Thejas Nair.
>
> Bugs: HIVE-18264
> https://issues.apache.org/jira/browse/HIVE-18264
>
> Repository: hive-git
>
> Description
> ---
> https://issues.apache.org/jira/browse/HIVE-18264
>
> Diffs
> -
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java a3725c5395
> service/src/java/org/apache/hive/service/server/HiveServer2.java 86c9c2b33c
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ac71d0882f
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7b44df4128
>
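The last reply in the message above (tbl can be null while the cache is still being prewarmed, in which case the call is served from the metastore db) amounts to a read-through fallback. A minimal sketch, assuming a plain map as the cache and a function as the backing store; `CacheFallbackSketch`, `getTable` and the use of String as the table type are hypothetical and do not mirror the CachedStore or SharedCache signatures:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: serve from the cache when possible, else fall back to the backing store.
final class CacheFallbackSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> rawStore; // stands in for the metastore db lookup

    CacheFallbackSketch(Function<String, String> rawStore) {
        this.rawStore = rawStore;
    }

    /** Simulates the background prewarm thread adding a table to the cache. */
    void putInCache(String tableName, String table) {
        cache.put(tableName, table);
    }

    /** A null cache entry means "not prewarmed yet", not "lookup failed". */
    String getTable(String tableName) {
        String tbl = cache.get(tableName);
        if (tbl == null) {
            return rawStore.apply(tableName); // cache miss: go to the backing store
        }
        return tbl;
    }
}
```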
[jira] [Created] (HIVE-18847) CachedStore: Investigate TestCachedStore#testTableColStatsOps
Vaibhav Gumashta created HIVE-18847: --- Summary: CachedStore: Investigate TestCachedStore#testTableColStatsOps Key: HIVE-18847 URL: https://issues.apache.org/jira/browse/HIVE-18847 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0 Reporter: Vaibhav Gumashta The test is currently commented out because the ObjectStore.updateTableColumnStatistics call is unable to persist stats to Derby. Needs investigation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18840) CachedStore: Prioritize loading of recently accessed tables during prewarm
Vaibhav Gumashta created HIVE-18840: --- Summary: CachedStore: Prioritize loading of recently accessed tables during prewarm Key: HIVE-18840 URL: https://issues.apache.org/jira/browse/HIVE-18840 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0 Reporter: Vaibhav Gumashta On clusters with large metadata, prewarming the cache can take several hours. Now that CachedStore does not block on prewarm anymore (after HIVE-18264), we should prioritize loading of recently accessed tables during prewarm. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/ --- (Updated March 1, 2018, 11:09 a.m.) Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-18264 https://issues.apache.org/jira/browse/HIVE-18264 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18264 Diffs (updated) - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java a3725c5395 service/src/java/org/apache/hive/service/server/HiveServer2.java 86c9c2b33c standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ac71d0882f standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7b44df4128 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f500d63725 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 0d132f2074 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 75ea8c4a77 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 207d842f94 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ab6feb6f0b standalone-metastore/src/test/resources/log4j2.properties 365687e1c9 Diff: https://reviews.apache.org/r/65634/diff/4/ Changes: https://reviews.apache.org/r/65634/diff/3-4/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/ --- (Updated Feb. 26, 2018, 9:47 p.m.) Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-18264 https://issues.apache.org/jira/browse/HIVE-18264 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18264 Diffs (updated) - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java a3725c5395 service/src/java/org/apache/hive/service/server/HiveServer2.java 6c1a0b98cc standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c6e34a8a22 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7b44df4128 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f500d63725 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 0d132f2074 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 75ea8c4a77 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 207d842f94 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ab6feb6f0b standalone-metastore/src/test/resources/log4j2.properties 365687e1c9 Diff: https://reviews.apache.org/r/65634/diff/3/ Changes: https://reviews.apache.org/r/65634/diff/2-3/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/ --- (Updated Feb. 23, 2018, 8:14 p.m.) Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-18264 https://issues.apache.org/jira/browse/HIVE-18264 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18264 Diffs (updated) - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java a3725c5395 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7b44df4128 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f500d63725 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 0d132f2074 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 75ea8c4a77 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 207d842f94 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ab6feb6f0b Diff: https://reviews.apache.org/r/65634/diff/2/ Changes: https://reviews.apache.org/r/65634/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
> On Feb. 21, 2018, 10:07 p.m., Daniel Dai wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 311 (original), 215 (patched)
> > <https://reviews.apache.org/r/65634/diff/1/?file=1958990#file1958990line316>
> >
> > This is not introduced in this patch, but getting columns for the table and applying them to the partition will not work for schema evolution. We shall get columns for every individual partition.

I agree, but I am not sure if the current stats work with schema evolution. Let me take this up in a follow-up jira as this might need a little more thought.

> On Feb. 21, 2018, 10:07 p.m., Daniel Dai wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 800 (original), 632 (patched)
> > <https://reviews.apache.org/r/65634/diff/1/?file=1958990#file1958990line881>
> >
> > I don't remember, but why is this get() and not getUnsafe()? It sounds the same as getAllTables etc. This also applies to getDatabases, alterDatabase, dropDatabase, getDatabase and createDatabase.

We're using get() here so that this call blocks till the database cache is populated. We're letting reads go through the cache while the tables are getting populated, but not for databases. Let me know if you think otherwise.

- Vaibhav

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/#review197794
---

On Feb. 13, 2018, 12:08 p.m., Vaibhav Gumashta wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65634/
> ---
>
> (Updated Feb. 13, 2018, 12:08 p.m.)
>
> Review request for hive, Daniel Dai and Thejas Nair.
>
> Bugs: HIVE-18264
> https://issues.apache.org/jira/browse/HIVE-18264
>
> Repository: hive-git
>
> Description
> ---
> https://issues.apache.org/jira/browse/HIVE-18264
>
> Diffs
> -
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 78b26374f2
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java d58ed677f3
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java e4e7d4239d
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 80aa3bcdb4
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 9100c73beb
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 86e72d8d76
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java bd61df654a
>
> Diff: https://reviews.apache.org/r/65634/diff/1/
>
> Testing
> ---
>
> Thanks,
>
> Vaibhav Gumashta
Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65634/ --- Review request for hive, Daniel Dai and Thejas Nair. Bugs: HIVE-18264 https://issues.apache.org/jira/browse/HIVE-18264 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18264 Diffs - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 78b26374f2 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java d58ed677f3 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java e4e7d4239d standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 80aa3bcdb4 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 9100c73beb standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 86e72d8d76 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java bd61df654a Diff: https://reviews.apache.org/r/65634/diff/1/ Testing --- Thanks, Vaibhav Gumashta
Re: Question on CachedStore cache update
Hi Alan,

To add to Daniel’s response, as part of https://issues.apache.org/jira/browse/HIVE-18264 and https://issues.apache.org/jira/browse/HIVE-18661 (I’m actively working on these), we plan to remove the current mechanism of updating the cache (which is very inefficient anyway) and instead use the NOTIFICATION_LOG table to update the cache incrementally. The code that you pointed to was meant to keep the background update thread from blocking the metastore client calls for a long time, but with the plan to update the cache incrementally we may not need to worry about that, as applying the notifications incrementally will not be a long blocking operation.

Thanks,
--Vaibhav

On 2/8/18, 11:41 AM, "Daniel Dai" wrote:

    Hi, Alan,

    If the database cache is changed locally, we don’t want to bring in the remote copy to overwrite it, as the remote copy doesn’t carry the local changes (ideally, we should also apply local changes to the remote copy we bring in from the db, but we are not there yet). That’s why we skip the update if there are local changes, and wait for the next iteration to sync with remote. isDatabaseCacheDirty is initially set to false unless there’s a local update, and will be reset during cache swap, thus giving the next iteration a chance to update the cache if there are no local changes.

    Thanks,
    Daniel

    On 2/6/18, 11:57 AM, "Alan Gates" wrote:

        I’m confused by the following code in the CachedStore. This is in the CacheUpdateMasterWork thread, in the updateDatabases method (which is called by update()):

            // Skip background updates if we detect change
            if (isDatabaseCacheDirty.compareAndSet(true, false)) {
              LOG.debug("Skipping database cache update; the database list we have is dirty.");
              return;
            }

        Why are we not updating the cache if we’ve dirtied it? Also, AFAICT no one ever sets isDatabaseCacheDirty to false, meaning once one database is created the cache will never be updated. Am I missing something?

        Alan.
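On the second question in the thread above, it may help to recall what java.util.concurrent.atomic.AtomicBoolean.compareAndSet(true, false) does: when the flag is true it atomically sets it to false and returns true, so the check in the quoted snippet also clears the flag. The sketch below demonstrates only that behaviour; `DirtyFlagSketch` and its methods are hypothetical and do not reproduce CachedStore.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a "skip one refresh after a local change" flag.
final class DirtyFlagSketch {
    private final AtomicBoolean dirty = new AtomicBoolean(false);

    /** Called when a local change (for example, a create database) touches the cache. */
    void markLocalChange() {
        dirty.set(true);
    }

    /**
     * One iteration of the background refresh.
     * Returns true if the refresh ran, false if it was skipped.
     */
    boolean backgroundUpdate() {
        // If the flag is true, this sets it back to false and returns true,
        // so only this iteration is skipped.
        if (dirty.compareAndSet(true, false)) {
            return false;
        }
        return true; // the cache would be refreshed from the backing store here
    }
}
```

With this shape, a local change causes one skipped iteration, and the next iteration proceeds unless another local change arrives in between.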
[jira] [Created] (HIVE-18661) CachedStore: Use metastore notification log events to update cache
Vaibhav Gumashta created HIVE-18661: --- Summary: CachedStore: Use metastore notification log events to update cache Key: HIVE-18661 URL: https://issues.apache.org/jira/browse/HIVE-18661 Project: Hive Issue Type: Bug Components: Metastore Reporter: Vaibhav Gumashta Currently, a background thread updates the entire cache which is pretty inefficient. We capture the updates to metadata in NOTIFICATION_LOG table which is getting used in the Replication work. We should have the background thread apply these notifications to incrementally update the cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
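The incremental update proposed here can be pictured as replaying events newer than the last one applied, instead of reloading everything. The sketch below is a simplified model under stated assumptions: `IncrementalCacheSketch`, its `Event` class and the two event types are hypothetical, events are assumed to arrive ordered by id, and nothing here reflects the actual NOTIFICATION_LOG schema or the metastore API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: apply only the events that have not been applied yet.
final class IncrementalCacheSketch {
    enum EventType { CREATE_TABLE, DROP_TABLE }

    static final class Event {
        final long id;
        final EventType type;
        final String tableName;

        Event(long id, EventType type, String tableName) {
            this.id = id;
            this.type = type;
            this.tableName = tableName;
        }
    }

    private final Set<String> cachedTables = new HashSet<>();
    private long lastAppliedEventId = 0L;

    /** Applies events in order, ignoring any that were already applied. */
    void apply(List<Event> events) {
        for (Event event : events) {
            if (event.id <= lastAppliedEventId) {
                continue; // already reflected in the cache
            }
            switch (event.type) {
                case CREATE_TABLE:
                    cachedTables.add(event.tableName);
                    break;
                case DROP_TABLE:
                    cachedTables.remove(event.tableName);
                    break;
            }
            lastAppliedEventId = event.id;
        }
    }

    Set<String> tables() {
        return cachedTables;
    }

    long lastAppliedEventId() {
        return lastAppliedEventId;
    }
}
```

Tracking the last applied id makes each pass proportional to the number of new events rather than to the size of the cache.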
Re: Review Request 65271: JDBC: Provide a way for JDBC users to pass cookie info via connection string
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65271/ --- (Updated Jan. 31, 2018, 10:50 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-18447 https://issues.apache.org/jira/browse/HIVE-18447 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-18447 Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hive/service/cli/thrift/TestThriftHttpCLIServiceFeatures.java 93b10fb4b4 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cb2f09cbf2 jdbc/src/java/org/apache/hive/jdbc/HttpBasicAuthInterceptor.java 5d2ddb5c21 jdbc/src/java/org/apache/hive/jdbc/HttpKerberosRequestInterceptor.java 37862be804 jdbc/src/java/org/apache/hive/jdbc/HttpRequestInterceptorBase.java cf1a11ecb6 jdbc/src/java/org/apache/hive/jdbc/HttpTokenAuthInterceptor.java 59a91dd14c jdbc/src/java/org/apache/hive/jdbc/Utils.java f7f3854b86 Diff: https://reviews.apache.org/r/65271/diff/2/ Changes: https://reviews.apache.org/r/65271/diff/1-2/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Created] (HIVE-18528) Stats: In the bitvector codepath, when extrapolating column stats for String type columns, StringColumnStatsAggregator uses the min value instead of max
Vaibhav Gumashta created HIVE-18528:
---

Summary: Stats: In the bitvector codepath, when extrapolating column stats for String type columns, StringColumnStatsAggregator uses the min value instead of max
Key: HIVE-18528
URL: https://issues.apache.org/jira/browse/HIVE-18528
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 3.0.0
Reporter: Vaibhav Gumashta

This line:
[https://github.com/apache/hive/blob/456a65180dcb84f69f26b4c9b9265165ad16dfe4/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java#L181]

Should be:

    aggregateData.setAvgColLen(Math.max(aggregateData.getAvgColLen(), newData.getAvgColLen()));
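To make the intended behaviour concrete, here is a minimal sketch of folding one partition's string column statistics into a running aggregate, assuming (as the proposed line does) that the aggregate should keep the larger length. The class and its two fields are simplified, hypothetical stand-ins, not Hive's statistics types.

```java
// Hypothetical, simplified stand-in for per-partition string column stats.
class StringStatsSketch {
  long maxColLen;
  double avgColLen;

  StringStatsSketch(long maxColLen, double avgColLen) {
    this.maxColLen = maxColLen;
    this.avgColLen = avgColLen;
  }

  // Folds another partition's stats into this aggregate,
  // keeping the larger value for both length fields.
  void mergeFrom(StringStatsSketch newData) {
    maxColLen = Math.max(maxColLen, newData.maxColLen);
    avgColLen = Math.max(avgColLen, newData.avgColLen);
  }
}
```

With Math.min in place of Math.max on the avgColLen line, merging a partition with a longer average length would leave the aggregate at the smaller value, which is the behaviour the issue title describes.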
Review Request 65271: JDBC: Provide a way for JDBC users to pass cookie info via connection string
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65271/
---

Review request for hive and Thejas Nair.

Bugs: HIVE-18447
    https://issues.apache.org/jira/browse/HIVE-18447

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-18447

Diffs
---

  itests/hive-unit/src/test/java/org/apache/hive/service/cli/thrift/TestThriftHttpCLIServiceFeatures.java 1911d2ce17
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java b5d289e023
  jdbc/src/java/org/apache/hive/jdbc/HttpBasicAuthInterceptor.java 8cb7a69df7
  jdbc/src/java/org/apache/hive/jdbc/HttpKerberosRequestInterceptor.java 3509cab775
  jdbc/src/java/org/apache/hive/jdbc/HttpRequestInterceptorBase.java 68b0ff19b8
  jdbc/src/java/org/apache/hive/jdbc/HttpTokenAuthInterceptor.java 207ed9e663
  jdbc/src/java/org/apache/hive/jdbc/Utils.java 1c1f644c92

Diff: https://reviews.apache.org/r/65271/diff/1/

Testing
---

Thanks,

Vaibhav Gumashta
[jira] [Created] (HIVE-18447) JDBC: Provide a way for JDBC users to pass cookie info via connection string
Vaibhav Gumashta created HIVE-18447:
---

Summary: JDBC: Provide a way for JDBC users to pass cookie info via connection string
Key: HIVE-18447
URL: https://issues.apache.org/jira/browse/HIVE-18447
Project: Hive
Issue Type: Bug
Components: JDBC
Reporter: Vaibhav Gumashta

Some authentication mechanisms, such as Single Sign On, need the ability to pass a cookie through the JDBC driver to an intermediate authentication service such as Knox. We need to add this mechanism to Hive's JDBC driver (when used in HTTP transport mode).

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
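One way such a mechanism could work is sketched below: cookie name/value pairs are read from specially prefixed parameters in the connection string and joined into a Cookie header value for the HTTP transport. The issue text does not specify any of this, so the "http.cookie." prefix, the class and method names, and the example host and cookie names are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of the Hive JDBC driver.
class CookieParamSketch {
  // Assumed parameter prefix, chosen for this sketch only.
  static final String COOKIE_PREFIX = "http.cookie.";

  // Collects cookie pairs from the ';'-separated key=value segments of a URL.
  static Map<String, String> extractCookies(String url) {
    Map<String, String> cookies = new LinkedHashMap<>();
    for (String segment : url.split(";")) {
      int eq = segment.indexOf('=');
      if (eq > 0 && segment.startsWith(COOKIE_PREFIX)) {
        String name = segment.substring(COOKIE_PREFIX.length(), eq);
        if (!name.isEmpty()) {
          cookies.put(name, segment.substring(eq + 1));
        }
      }
    }
    return cookies;
  }

  // Formats the pairs as the value of an HTTP Cookie request header.
  static String toCookieHeader(Map<String, String> cookies) {
    StringJoiner joiner = new StringJoiner("; ");
    cookies.forEach((name, value) -> joiner.add(name + "=" + value));
    return joiner.toString();
  }
}
```

For a made-up URL such as jdbc:hive2://host:10001/default;transportMode=http;httpPath=cliservice;http.cookie.sso=abc, the sketch yields a single cookie named sso, which an HTTP request interceptor could then attach to each request.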
Re: Review Request 62228: HIVE-17495: CachedStore: prewarm improvements, refactoring and caching some aggregate stats
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62228/
---

(Updated Dec. 22, 2017, 11:13 p.m.)

Review request for hive, Ashutosh Chauhan, Daniel Dai, and Thejas Nair.

Bugs: HIVE-17495
    https://issues.apache.org/jira/browse/HIVE-17495

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-17495

Diffs (updated)
---

  itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 6dc052db45
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 0aa1d4e16a
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 14653b4043
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b708fae7ec
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 8af96db0bc
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java ab6b90fb6b
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9856f8a195
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java b606779709
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BinaryColumnStatsAggregator.java 45d5d8c984
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BooleanColumnStatsAggregator.java 8aac0fe33d
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregator.java cd0392d6c0
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregator.java 7f2956152c
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DecimalColumnStatsAggregator.java 05c0280262
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DoubleColumnStatsAggregator.java faf22dcd7c
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java d12cdc08ea
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java 4539e6b026
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 8bc4ce752e
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java e59e3496bf
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 8be099cbcb

Diff: https://reviews.apache.org/r/62228/diff/6/

Changes: https://reviews.apache.org/r/62228/diff/5-6/

Testing
---

Thanks,

Vaibhav Gumashta
[jira] [Created] (HIVE-18264) CachedStore: Store cached partitions within the table cache
Vaibhav Gumashta created HIVE-18264:
---

Summary: CachedStore: Store cached partitions within the table cache
Key: HIVE-18264
URL: https://issues.apache.org/jira/browse/HIVE-18264
Project: Hive
Issue Type: Bug
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta

Currently we have separate caches for partitions and partition column stats, so some calls have to iterate through each of them to retrieve or update entries. We can get better performance by organizing them hierarchically, with partitions stored inside the table they belong to.
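The hierarchical layout proposed here can be sketched as a map of tables in which each table entry owns its partitions. This is only an illustration under simplifying assumptions: the class names are hypothetical, a partition is reduced to a name and a location string, and column stats are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-table entry that owns its partitions.
class TableEntrySketch {
  // partition name -> location (stand-in for a full partition object)
  final Map<String, String> partitions = new ConcurrentHashMap<>();
}

// Hypothetical cache organized as table -> partitions.
class HierarchicalCacheSketch {
  private final Map<String, TableEntrySketch> tables = new ConcurrentHashMap<>();

  void addPartition(String table, String partName, String location) {
    tables.computeIfAbsent(table, t -> new TableEntrySketch())
        .partitions.put(partName, location);
  }

  // Looks only at the one table's entry; other tables are never scanned.
  List<String> listPartitionNames(String table) {
    TableEntrySketch entry = tables.get(table);
    if (entry == null) {
      return new ArrayList<>();
    }
    List<String> names = new ArrayList<>(entry.partitions.keySet());
    Collections.sort(names);
    return names;
  }

  // Dropping a table discards its partitions with it, in a single removal.
  boolean dropTable(String table) {
    return tables.remove(table) != null;
  }
}
```

With one flat partition cache, listing or dropping a table's partitions means filtering every cached partition by table; in the nested layout both operations touch a single table entry.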
Re: Review Request 62228: HIVE-17495: CachedStore: prewarm improvements, refactoring and caching some aggregate stats
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62228/
---

(Updated Dec. 8, 2017, 12:06 a.m.)

Review request for hive, Ashutosh Chauhan, Daniel Dai, and Thejas Nair.

Changes
---

Rebased on master

Bugs: HIVE-17495
    https://issues.apache.org/jira/browse/HIVE-17495

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-17495

Diffs (updated)
---

  itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 62c9172ef5
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java f344c47443
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 14653b4043
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 2e80c9d3b1
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 75fbfa23d2
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java ab6b90fb6b
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java da518ab6e3
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java b606779709
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BinaryColumnStatsAggregator.java 45d5d8c984
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BooleanColumnStatsAggregator.java 8aac0fe33d
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregator.java cd0392d6c0
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregator.java 7f2956152c
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DecimalColumnStatsAggregator.java 05c0280262
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DoubleColumnStatsAggregator.java faf22dcd7c
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java d12cdc08ea
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java 4539e6b026
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java cde34bcf42
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 24c59f2f1b
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 1e4fe5d973

Diff: https://reviews.apache.org/r/62228/diff/5/

Changes: https://reviews.apache.org/r/62228/diff/4-5/

Testing
---

Thanks,

Vaibhav Gumashta