[jira] [Created] (HIVE-24880) Add host and version information to compaction queue
Peter Varga created HIVE-24880: -- Summary: Add host and version information to compaction queue Key: HIVE-24880 URL: https://issues.apache.org/jira/browse/HIVE-24880 Project: Hive Issue Type: Sub-task Reporter: Peter Varga Assignee: Peter Varga

The Initiator host and version should be added to the compaction queue and the completed compaction queue. The Worker version should be added to both queues as well. They should be available in the sys tables and view.

The version should come from the runtime version (not the schema): Initiator.class.getPackage().getImplementationVersion() works on clusters (hive exec has a manifest), but might not work in unit tests.

This would make it possible to create checks on use cases like these:
* multiple hosts are running the Initiator
** in some scenarios with different runtime versions
* the Worker and Initiator runtime versions are not the same

-- This message was sent by Atlassian Jira (v8.3.4#803005)
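A minimal sketch of the runtime version lookup mentioned in HIVE-24880. The helper name `RuntimeVersion.of` and the `"unknown"` fallback are assumptions for illustration and are not part of Hive. `getImplementationVersion()` returns `null` when the class was not loaded from a jar whose manifest carries an `Implementation-Version` entry, which is the unit-test situation the issue warns about.

```java
// Hypothetical helper, not part of Hive: resolves the runtime version of a class
// from its jar manifest and falls back when no manifest entry is available.
final class RuntimeVersion {
    private RuntimeVersion() {}

    static String of(Class<?> clazz, String fallback) {
        Package pkg = clazz.getPackage();
        // The package can be null for some class loaders, and the version is null
        // when the manifest has no Implementation-Version entry (e.g. unit tests).
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? fallback : version;
    }

    public static void main(String[] args) {
        // In Hive the argument would be Initiator.class; this class stands in for it.
        System.out.println(of(RuntimeVersion.class, "unknown"));
    }
}
```

Returning a fallback instead of `null` keeps the queue columns populated in test runs, at the cost of hiding the missing manifest; whether that trade-off is wanted is a design decision for the actual change.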
[jira] [Created] (HIVE-24825) Create AcidMetricsService
Peter Varga created HIVE-24825: -- Summary: Create AcidMetricsService Key: HIVE-24825 URL: https://issues.apache.org/jira/browse/HIVE-24825 Project: Hive Issue Type: Sub-task Reporter: Peter Varga Assignee: Peter Varga Create a new service in HMS, that will collect and publish JMX metrics about ACID related processes and metadata. * There should be a subconfig other than METRICS_ENABLED for acid metrics * The collection frequency should be configurable * The existing oldest initiated compaction and the number of compactions in different statuses metrics collection should be moved here from Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24824) Define metrics for compaction observability
Peter Varga created HIVE-24824: -- Summary: Define metrics for compaction observability Key: HIVE-24824 URL: https://issues.apache.org/jira/browse/HIVE-24824 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

When there are failures in the compaction background processes (Initiator, Worker, Cleaner), it is often hard to notice the problem until it causes serious performance degradation. We should create new JMX metrics that make it easier to monitor compaction health. Examples:
* number of failed / initiated compactions
* number of aborted txns, oldest aborted txn
* tables with disabled compaction and writes
* Initiator and Cleaner cycle runtime
* size of ACID metadata tables that should have a roughly constant number of rows (txn_to_writeId, completed_txns)
[jira] [Created] (HIVE-24785) Fix HIVE_COMPACTOR_COMPACT_MM property
Peter Varga created HIVE-24785: -- Summary: Fix HIVE_COMPACTOR_COMPACT_MM property Key: HIVE-24785 URL: https://issues.apache.org/jira/browse/HIVE-24785 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga Currently it will disable query based compaction for mm tables, but the Worker will fall back to MR based compaction which is not implemented for mm tables. This property should disable compaction in the Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24738) Reuse committed filelist from directInsert manifest during loadPartition
Peter Varga created HIVE-24738: -- Summary: Reuse committed filelist from directInsert manifest during loadPartition Key: HIVE-24738 URL: https://issues.apache.org/jira/browse/HIVE-24738 Project: Hive Issue Type: Sub-task Reporter: Peter Varga Assignee: Peter Varga This way the costly FileSystem listing can be avoided -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24682) Collect dynamic partition info in FileSink for direct insert and reuse it in Movetask
Peter Varga created HIVE-24682: -- Summary: Collect dynamic partition info in FileSink for direct insert and reuse it in Movetask Key: HIVE-24682 URL: https://issues.apache.org/jira/browse/HIVE-24682 Project: Hive Issue Type: Sub-task Reporter: Peter Varga Assignee: Peter Varga The dynamic partition infos can be collected from the manifest files, no need to do a costly file listing later -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24679) Reuse FullDPSpecs in loadDynamicPartitions to avoid double listing
Peter Varga created HIVE-24679: -- Summary: Reuse FullDPSpecs in loadDynamicPartitions to avoid double listing Key: HIVE-24679 URL: https://issues.apache.org/jira/browse/HIVE-24679 Project: Hive Issue Type: Sub-task Reporter: Peter Varga Assignee: Peter Varga -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal
Peter Varga created HIVE-24669: -- Summary: Improve Filesystem usage in Hive::loadPartitionInternal Key: HIVE-24669 URL: https://issues.apache.org/jira/browse/HIVE-24669 Project: Hive Issue Type: Sub-task Reporter: Peter Varga Assignee: Peter Varga

* Use native recursive listing instead of doing it on the Hive side
* Reuse the file list determined for writeNotificationlogs in quickstat generation
[jira] [Created] (HIVE-24668) Improve FileSystem usage in dynamic partition handling
Peter Varga created HIVE-24668: -- Summary: Improve FileSystem usage in dynamic partition handling Key: HIVE-24668 URL: https://issues.apache.org/jira/browse/HIVE-24668 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

Possible improvements:
* In the Movetask process both getFullDPSpecs and later Hive::getValidPartitionsInPath list the dynamic partitions in the table. The result of the first listing can be reused.
* Hive::listFilesCreatedByQuery does the recursive listing on the Hive side. The native recursive listing should be used instead.
* When we add a new partition we populate the quickstats, which does another listing for the new partition. The files are already collected for the writeNotificationlogs, so that list can be reused.
[jira] [Created] (HIVE-24655) Improve FileSystem usage in OrcRawRecordMerger
Peter Varga created HIVE-24655: -- Summary: Improve FileSystem usage in OrcRawRecordMerger Key: HIVE-24655 URL: https://issues.apache.org/jira/browse/HIVE-24655 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga Some minor performance improvements: * Remove exists calls and catch the FileNotFound exception since the exists call just does the same * Remove unnecessary delta parsing and file listing when processing deltas -- This message was sent by Atlassian Jira (v8.3.4#803005)
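The first point of HIVE-24655 (drop the exists call and rely on the "file not found" exception) can be illustrated with a small sketch. It uses `java.nio.file` rather than Hadoop's FileSystem API so that it is self-contained; that substitution and the helper name `readIfPresent` are assumptions for illustration, not Hive code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Illustration only: java.nio.file stands in for Hadoop's FileSystem here.
final class ReadWithoutExists {
    private ReadWithoutExists() {}

    // One file system call instead of an exists() check followed by a read.
    static Optional<byte[]> readIfPresent(Path path) {
        try {
            return Optional.of(Files.readAllBytes(path));
        } catch (NoSuchFileException e) {
            // The read itself reports the missing file, so no prior check is needed.
            return Optional.empty();
        } catch (IOException e) {
            // Any other I/O problem is a real error and is not swallowed.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : "missing.bin");
        System.out.println(readIfPresent(path).isPresent());
    }
}
```

Besides saving a call, this avoids the window between the check and the read in which the file could disappear.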
[jira] [Created] (HIVE-24643) Access Operation state directly where possible
Peter Varga created HIVE-24643: -- Summary: Access Operation state directly where possible Key: HIVE-24643 URL: https://issues.apache.org/jira/browse/HIVE-24643 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

Operation state is accessed during query execution by calling operation.getStatus, which serialises the TaskStatuses into JSON. This can be skipped if only the state is needed.
[jira] [Created] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils
Peter Varga created HIVE-24630: -- Summary: clean up multiple parseDelta implementation in AcidUtils Key: HIVE-24630 URL: https://issues.apache.org/jira/browse/HIVE-24630 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga * Remove code duplication * Use ParsedDeltaLight everywhere where rawformat is not used, because parsing that is cheaper -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator
Peter Varga created HIVE-24615: -- Summary: Remove unnecessary FileSystem listing from Initiator Key: HIVE-24615 URL: https://issues.apache.org/jira/browse/HIVE-24615 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga AcidUtils already returns the file list in base and delta directories if it does recursive listing on S3, listing those directories can be removed from the Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24602) Retry compaction after configured time
Peter Varga created HIVE-24602: -- Summary: Retry compaction after configured time Key: HIVE-24602 URL: https://issues.apache.org/jira/browse/HIVE-24602 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga Currently if compaction fails two consecutive times it will stop compaction forever for the given partition / table unless someone manually intervenes. See COMPACTOR_INITIATOR_FAILED_THRESHOLD. The Initiator should retry again after a configurable amount of time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
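A rough sketch of the retry rule proposed in HIVE-24602. The class `CompactionRetryPolicy`, its parameters, and the idea of tracking the last failure time are hypothetical and only illustrate the decision; they are not the Hive implementation.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; names and parameters are not taken from Hive.
final class CompactionRetryPolicy {
    // Plays the role of COMPACTOR_INITIATOR_FAILED_THRESHOLD.
    private final int failedThreshold;
    // Assumed new setting: how long to wait before retrying after repeated failures.
    private final Duration retryAfter;

    CompactionRetryPolicy(int failedThreshold, Duration retryAfter) {
        this.failedThreshold = failedThreshold;
        this.retryAfter = retryAfter;
    }

    boolean mayInitiate(int consecutiveFailures, Instant lastFailure, Instant now) {
        if (consecutiveFailures < failedThreshold) {
            return true; // below the threshold nothing changes
        }
        // At or above the threshold: retry only once the configured time has elapsed.
        return !now.isBefore(lastFailure.plus(retryAfter));
    }

    public static void main(String[] args) {
        CompactionRetryPolicy policy = new CompactionRetryPolicy(2, Duration.ofHours(1));
        Instant lastFailure = Instant.now().minus(Duration.ofMinutes(5));
        System.out.println(policy.mayInitiate(2, lastFailure, Instant.now()));
    }
}
```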
[jira] [Created] (HIVE-24581) Remove AcidUtils call from OrcInputformat for non transactional tables
Peter Varga created HIVE-24581: -- Summary: Remove AcidUtils call from OrcInputformat for non transactional tables Key: HIVE-24581 URL: https://issues.apache.org/jira/browse/HIVE-24581 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga Currently the split generation in OrcInputformat is tightly coupled with acid and AcidUtils.getAcidState is called even if the table is not transactional. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24548) CompactionHeartbeater leaks metastore connections
Peter Varga created HIVE-24548: -- Summary: CompactionHeartbeater leaks metastore connections Key: HIVE-24548 URL: https://issues.apache.org/jira/browse/HIVE-24548 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga Every Heartbeater thread creates a new metastore client, that is never closed -- This message was sent by Atlassian Jira (v8.3.4#803005)
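One common way to avoid the kind of leak described in HIVE-24548 is to scope each client with try-with-resources. The sketch below uses a stand-in `FakeClient` instead of a real metastore client, so every name in it is an assumption for illustration; the actual fix may reuse or pool clients instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: FakeClient stands in for a metastore client.
final class HeartbeatSketch {
    // Counts clients that were created but not yet closed.
    static final AtomicInteger OPEN_CLIENTS = new AtomicInteger();

    static final class FakeClient implements AutoCloseable {
        FakeClient() {
            OPEN_CLIENTS.incrementAndGet();
        }

        void heartbeat(long txnId) {
            // A real client would send the heartbeat to the metastore here.
        }

        @Override
        public void close() {
            OPEN_CLIENTS.decrementAndGet();
        }
    }

    // The client is closed when the block exits, even if heartbeat() throws.
    static void heartbeatOnce(long txnId) {
        try (FakeClient client = new FakeClient()) {
            client.heartbeat(txnId);
        }
    }

    public static void main(String[] args) {
        heartbeatOnce(1L);
        System.out.println(OPEN_CLIENTS.get());
    }
}
```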
[jira] [Created] (HIVE-24535) Cleanup AcidUtils.Directory and remove unnecessary filesystem listings
Peter Varga created HIVE-24535: -- Summary: Cleanup AcidUtils.Directory and remove unnecessary filesystem listings Key: HIVE-24535 URL: https://issues.apache.org/jira/browse/HIVE-24535 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga * AcidUtils.getAcidState is doing a recursive listing on S3 FileSystem, it already knows the content of each delta and base directory, this could be returned to OrcInputFormat, to avoid listing each delta directory again there. * AcidUtils.getAcidstate submethods are collecting more and more infos about the state of the data directory. This could be done directly to the final Directory object to avoid 10+ parameters in methods. * AcidUtils.Directory, OrcInputFormat.AcidDirInfo and AcidUtils.TxnBase can be merged to one class, to clean up duplications. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24516) Txnhandler onrename might ignore exceptions
Peter Varga created HIVE-24516: -- Summary: Txnhandler onrename might ignore exceptions Key: HIVE-24516 URL: https://issues.apache.org/jira/browse/HIVE-24516 Project: Hive Issue Type: Bug Components: Hive Reporter: Peter Varga Assignee: Peter Varga

This is a follow-up to HIVE-24193. "Table does not exist" errors shouldn't be ignored in the first place.

{code}
} catch (SQLException e) {
  LOG.debug("Going to rollback: " + callSig);
  rollbackDBConn(dbConn);
  checkRetryable(dbConn, e, callSig);
  if (e.getMessage().contains("does not exist")) {
    LOG.warn("Cannot perform " + callSig + " since metastore table does not exist");
  } else {
    throw new MetaException("Unable to " + callSig + ":" + StringUtils.stringifyException(e));
  }
}
{code}

This error handling might have been put there for backward compatibility with missing ACID metadata tables, but it is not needed anymore.
[jira] [Created] (HIVE-24481) Skipped compaction can cause data corruption with streaming
Peter Varga created HIVE-24481: -- Summary: Skipped compaction can cause data corruption with streaming Key: HIVE-24481 URL: https://issues.apache.org/jira/browse/HIVE-24481 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

Timeline:
1. create a partitioned table, add one static partition
2. transaction 1 writes delta_1, and aborts
3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
4. beginTransaction, write, commitTransaction
5. beginTransaction, write, abortTransaction
6. beginTransaction, write, commitTransaction
7. close connection, count of the table is 2
8. run manual minor compaction on the partition. It will skip compaction, because deltacount = 1, but it will clean, because there is aborted txn1
9. cleaner will remove both aborted records from txn_components
10. wait for acidhousekeeper to remove empty aborted txns
11. select * from table returns *3* records, reading the aborted record
[jira] [Created] (HIVE-24477) Separate production and test code in TxnDbUtil
Peter Varga created HIVE-24477: -- Summary: Separate production and test code in TxnDbUtil Key: HIVE-24477 URL: https://issues.apache.org/jira/browse/HIVE-24477 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

This class was created as a test utility, but it lives in a production package, since it is used in multiple projects. It is now a mix of test utility and production utility, which is unfortunate. The production code could be moved to the TxnUtils class.
[jira] [Created] (HIVE-24403) change min_history_level schema change to be compatible with previous version
Peter Varga created HIVE-24403: -- Summary: change min_history_level schema change to be compatible with previous version Key: HIVE-24403 URL: https://issues.apache.org/jira/browse/HIVE-24403 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Peter Varga Assignee: Peter Varga

In some configurations the HMS backend DB is used by HMS services with different versions. HIVE-23107 dropped the min_history_level table from the backend DB, making the new schema version incompatible with the older HMS services. It is possible to modify that change to keep the compatibility:
* Keep the min_history_level table
* Add the new fields for the compaction_queue the same way
* Create a feature flag for min_history_level and if it is on
* Keep the logic inserting to the table during openTxn
* Change the logic in the cleaner, to get the highwatermark the old way
* But still change it to not start the cleaning before that
* Keep the min_history_level delete after cleaner
* Change the logic in AcidHouseKeeper to clean the txn_to_write_id the old way
* This feature flag can be set up automatically based on the existence of the min_history_level table. This way, if the table is dropped, all HMS-s can switch to the new functionality without a restart
[jira] [Created] (HIVE-24401) COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated
Peter Varga created HIVE-24401: -- Summary: COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated Key: HIVE-24401 URL: https://issues.apache.org/jira/browse/HIVE-24401 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga minor query based compaction is implemented -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24329) Add HMS notification for compaction commit
Peter Varga created HIVE-24329: -- Summary: Add HMS notification for compaction commit Key: HIVE-24329 URL: https://issues.apache.org/jira/browse/HIVE-24329 Project: Hive Issue Type: New Feature Reporter: Peter Varga Assignee: Peter Varga This could be used by file metadata caches, to invalidate the cache content -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas
Peter Varga created HIVE-24291: -- Summary: Compaction Cleaner prematurely cleans up deltas Key: HIVE-24291 URL: https://issues.apache.org/jira/browse/HIVE-24291 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

Since HIVE-23107 the Cleaner can clean up deltas that are still used by running queries. Example:
* TxnId 1-5 write to a partition, all commit
* Compactor starts with txnId=6
* Long running query starts with txnId=7, it sees txnId=6 as open in its snapshot
* Compaction commits
* Cleaner runs

Previously the min_history_level table would have prevented the Cleaner from deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the long running query may fail if it tries to access the files.

A solution could be to not run the Cleaner while any txn is still open that was opened before the compaction was committed (CQ_NEXT_TXN_ID).
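The proposed solution in HIVE-24291 boils down to a single comparison, sketched below. The class `CleanerGuard` and its method are hypothetical; the sketch assumes that CQ_NEXT_TXN_ID holds the next transaction id at the moment the compaction committed, so any open transaction with a smaller id was opened before that commit.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch of the proposed check; not Hive code.
final class CleanerGuard {
    private CleanerGuard() {}

    // cqNextTxnId: assumed to be the next txn id when the compaction committed.
    static boolean readyToClean(long cqNextTxnId, Collection<Long> openTxnIds) {
        // An open txn below cqNextTxnId may still read the pre-compaction deltas.
        return openTxnIds.stream().noneMatch(id -> id < cqNextTxnId);
    }

    public static void main(String[] args) {
        // The example from the issue: txnId=7 is still open, compaction ran as txnId=6.
        System.out.println(readyToClean(8L, Arrays.asList(7L)));
    }
}
```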
[jira] [Created] (HIVE-24233) except subquery throws nullpointer with cbo disabled
Peter Varga created HIVE-24233: -- Summary: except subquery throws nullpointer with cbo disabled Key: HIVE-24233 URL: https://issues.apache.org/jira/browse/HIVE-24233 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

Except and intersect were only implemented with Calcite in HIVE-12764. If cbo is disabled, the query just throws a NullPointerException. We should at least throw a SemanticException stating that this is not supported.

Repro:
set hive.cbo.enable=false;
create table test(id int);
insert into table test values(1);
select id from test except select id from test;
[jira] [Created] (HIVE-24219) TestStreaming is flaky
Peter Varga created HIVE-24219: -- Summary: TestStreaming is flaky Key: HIVE-24219 URL: https://issues.apache.org/jira/browse/HIVE-24219 Project: Hive Issue Type: Bug Reporter: Peter Varga Seems like the HeartBeater threads did not get cleaned up and deadlocks itself with the transactional table cleanups http://ci.hive.apache.org/job/hive-flaky-check/119/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24162) Query based compaction loses bloom filter
Peter Varga created HIVE-24162: -- Summary: Query based compaction loses bloom filter Key: HIVE-24162 URL: https://issues.apache.org/jira/browse/HIVE-24162 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

*Steps to reproduce:*
{noformat}
| createtab_stmt |
| CREATE TABLE `bloomTest`( |
|   `msisdn` string, |
|   `imsi` varchar(20), |
|   `imei` bigint, |
|   `cell_id` bigint) |
| ROW FORMAT SERDE |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' |
| STORED AS INPUTFORMAT |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION |
|   's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest' |
| TBLPROPERTIES ( |
|   'bucketing_version'='2', |
|   'orc.bloom.filter.columns'='msisdn,cell_id,imsi', |
|   'orc.bloom.filter.fpp'='0.02', |
|   'transactional'='true', |
|   'transactional_properties'='default', |
|   'transient_lastDdlTime'='1597222946') |

insert into bloomTest values ("a", "b", 10, 20);
insert into bloomTest values ("aa", "bb", 100, 200);
insert into bloomTest values ("aaa", "bbb", 1000, 2000);

select * from bloomTest;
+-------------------+-----------------+-----------------+--------------------+
| bloomtest.msisdn  | bloomtest.imsi  | bloomtest.imei  | bloomtest.cell_id  |
+-------------------+-----------------+-----------------+--------------------+
| a                 | b               | 10              | 20                 |
| aa                | bb              | 100             | 200                |
| aaa               | bbb             | 1000            | 2000               |
+-------------------+-----------------+-----------------+--------------------+
{noformat}
- Compact the table
{code:java}
alter table bloomTest compact 'MAJOR';
{code}
- Wait for the compaction to be over and check for bloom filters in the dataset.
- The delta files have the bloom filter, but the base files do not.
[jira] [Created] (HIVE-23994) TestRetryable is unstable
Peter Varga created HIVE-23994: -- Summary: TestRetryable is unstable Key: HIVE-23994 URL: https://issues.apache.org/jira/browse/HIVE-23994 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Aasha Medhi The flaky test check run: [http://ci.hive.apache.org/job/hive-flaky-check/83/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23956) Delete delta directory file information should be pushed to execution side
Peter Varga created HIVE-23956: -- Summary: Delete delta directory file information should be pushed to execution side Key: HIVE-23956 URL: https://issues.apache.org/jira/browse/HIVE-23956 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

Since HIVE-23840 the LLAP cache is used to retrieve the tail of the ORC bucket files in the delete deltas, but to use the cache the fileId must be determined, so one more FileSystem call is issued for each bucket. This fileId is already available during compilation in the AcidState calculation; we should serialise it to the OrcSplit and remove the unnecessary FS calls.

Furthermore, rather than sending the SyntheticFileId directly, we should pass the attemptId instead of the standard path hash. This way the path and the SyntheticFileId can be calculated, and it will work even if the move-free delete operations are introduced.
[jira] [Created] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork
Peter Varga created HIVE-23837: -- Summary: HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork Key: HIVE-23837 URL: https://issues.apache.org/jira/browse/HIVE-23837 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga If the FileSinkOperator's root operator is a MergeJoinWork the HbaseStorageHandler.configureJobConf will never get called, and the execution will miss the HBASE_AUTH_TOKEN and the hbase jars. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23819) Use ranges in ValidReadTxnList serialization
Peter Varga created HIVE-23819: -- Summary: Use ranges in ValidReadTxnList serialization Key: HIVE-23819 URL: https://issues.apache.org/jira/browse/HIVE-23819 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

From time to time we see cases where the open / aborted transaction count is high, and often the aborted transactions come in contiguous ranges. When the transaction count is high, the serialization / deserialization to the hive.txn.valid.txns conf gets slower and produces a large config value. Using ranges in the string representation can mitigate the issue somewhat.
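The range idea from HIVE-23819 can be sketched as a run-length style encoder over sorted transaction ids. The class `TxnRangeEncoder` and the `first-last` text format are assumptions made for illustration; this is not the actual ValidReadTxnList wire format.

```java
import java.util.StringJoiner;

// Hypothetical sketch; the output format is illustrative, not Hive's.
final class TxnRangeEncoder {
    private TxnRangeEncoder() {}

    // Encodes sorted, distinct ids as comma separated entries and collapses
    // each run of consecutive ids into "first-last".
    static String encode(long[] sortedIds) {
        StringJoiner out = new StringJoiner(",");
        int i = 0;
        while (i < sortedIds.length) {
            int j = i;
            // Extend the run while the next id is exactly one larger.
            while (j + 1 < sortedIds.length && sortedIds[j + 1] == sortedIds[j] + 1) {
                j++;
            }
            out.add(i == j ? Long.toString(sortedIds[i]) : sortedIds[i] + "-" + sortedIds[j]);
            i = j + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(new long[] {1, 2, 3, 7, 9, 10}));
    }
}
```

With long runs of aborted transactions the encoded string grows with the number of runs rather than the number of ids, which is the effect the issue is after.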
[jira] [Created] (HIVE-23762) TestPigHBaseStorageHandler tests are flaky
Peter Varga created HIVE-23762: -- Summary: TestPigHBaseStorageHandler tests are flaky Key: HIVE-23762 URL: https://issues.apache.org/jira/browse/HIVE-23762 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Aasha Medhi Most likely caused by HIVE-23668 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge inserts
Peter Varga created HIVE-23725: -- Summary: ValidTxnManager snapshot outdating causing partial reads in merge inserts Key: HIVE-23725 URL: https://issues.apache.org/jira/browse/HIVE-23725 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

When the ValidTxnManager invalidates the snapshot during a merge insert and starts to read committed transactions that were not committed when the query compilation happened, it can cause partial read problems if the committed transaction created a new partition in the source or target table. The solution should be not only to fix the snapshot, but also to recompile the query and acquire the locks again.
[jira] [Created] (HIVE-23715) Fix zookeeper ssl keystore password handling issues
Peter Varga created HIVE-23715: -- Summary: Fix zookeeper ssl keystore password handling issues Key: HIVE-23715 URL: https://issues.apache.org/jira/browse/HIVE-23715 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

In HIVE-23045 Zookeeper SSL communication support was introduced, but the password config for the keystore and truststore is not handled correctly if they are stored in jceks. Also, the ZooKeeperTokenStore does not handle the fallback to the global zookeeper configurations well.
[jira] [Created] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases
Peter Varga created HIVE-23671: -- Summary: MSCK repair should handle transactional tables in certain usecases Key: HIVE-23671 URL: https://issues.apache.org/jira/browse/HIVE-23671 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Peter Varga Assignee: Peter Varga

The MSCK REPAIR tool does not handle transactional tables well. It can find and add new partitions the same way as for non-transactional tables, but since the writeId differences are not handled, the data cannot be read back from the new partitions. We could handle some use cases where the writeIds in the HMS and in the underlying data do not conflict. If the HMS does not contain allocated writes for the table, we can seed the table with the writeIds read from the directory structure.

Real life use cases could be:
* Copy data files from one cluster to another with a different HMS, create the table and call MSCK REPAIR
* If the HMS db is lost, recreate the table and call MSCK REPAIR
[jira] [Created] (HIVE-23495) AcidUtils.getAcidState cleanup
Peter Varga created HIVE-23495: -- Summary: AcidUtils.getAcidState cleanup Key: HIVE-23495 URL: https://issues.apache.org/jira/browse/HIVE-23495 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

Since HIVE-21225 there are two redundant implementations of AcidUtils.getAcidState. The previous implementation (without the recursive listing) can be removed. The performance can also be improved by removing unnecessary fileStatus calls.
[jira] [Created] (HIVE-23413) Create a new config to skip all locks
Peter Varga created HIVE-23413: -- Summary: Create a new config to skip all locks Key: HIVE-23413 URL: https://issues.apache.org/jira/browse/HIVE-23413 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

From time to time some queries are blocked on locks when they should not be. As a quick workaround we should have a config that the user can set in the session to disable acquiring/checking locks, so we can provide it immediately and then later investigate and fix the root cause.
[jira] [Created] (HIVE-23392) Metastore upgrade script TXN_LOCK_TBL rename inconsistency
Peter Varga created HIVE-23392: -- Summary: Metastore upgrade script TXN_LOCK_TBL rename inconsistency Key: HIVE-23392 URL: https://issues.apache.org/jira/browse/HIVE-23392 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga HIVE-23048 introduced a bug in the metastore upgrade scripts, by not renaming correctly the columns in TXN_LOCK_TBL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23341) Merge HMS ACID related calls to improve performance
Peter Varga created HIVE-23341: -- Summary: Merge HMS ACID related calls to improve performance Key: HIVE-23341 URL: https://issues.apache.org/jira/browse/HIVE-23341 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Peter Varga Assignee: Peter Varga It might be possible to merge multiple HMS calls to save performance. The following candidates are: * openTxns, getOpenTxns, to get the snapshot. * getLocks / getWriteIds? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23340) TxnHandler cleanup
Peter Varga created HIVE-23340: -- Summary: TxnHandler cleanup Key: HIVE-23340 URL: https://issues.apache.org/jira/browse/HIVE-23340 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga * Merge getOpenTxns and getOpenTxnInfo to avoid code duplication * Remove TxnStatus character constants and use the enum values -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23311) Fix ValidTxnManager regression
Peter Varga created HIVE-23311: -- Summary: Fix ValidTxnManager regression Key: HIVE-23311 URL: https://issues.apache.org/jira/browse/HIVE-23311 Project: Hive Issue Type: Bug Components: Locking Reporter: Peter Varga Assignee: Peter Varga During query execution if there are only shared lock tables, the txnList in the driverContext should always be considered valid. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23232) Fix flaky TestJdbcWithServiceDiscovery.testKillQueryWithDifferentServerZKTurnedOff
Peter Varga created HIVE-23232: -- Summary: Fix flaky TestJdbcWithServiceDiscovery.testKillQueryWithDifferentServerZKTurnedOff Key: HIVE-23232 URL: https://issues.apache.org/jira/browse/HIVE-23232 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga

The test sometimes fails with an error in the TEZ environment; most likely the root cause is that two MiniHS2 instances are using the same TEZ in parallel.

Sample Exception tExecute expected null, but was:
[jira] [Created] (HIVE-23207) Create integration tests for TxnManager for different rdbms metastores
Peter Varga created HIVE-23207: -- Summary: Create integration tests for TxnManager for different rdbms metastores Key: HIVE-23207 URL: https://issues.apache.org/jira/browse/HIVE-23207 Project: Hive Issue Type: Improvement Reporter: Peter Varga Assignee: Peter Varga

Create an integration test suite that runs tests for TxnManager with the metastore configured to use different kinds of RDBMS. Use the different DatabaseRule-s defined in the standalone-metastore for docker environments, and use the real init schema for every database type instead of the hardwired TxnDbUtil.prepDb. This test will be useful for easy manual validation of schema changes.
[jira] [Created] (HIVE-23084) Implement kill query in multiple HS2 environment
Peter Varga created HIVE-23084: -- Summary: Implement kill query in multiple HS2 environment Key: HIVE-23084 URL: https://issues.apache.org/jira/browse/HIVE-23084 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Peter Varga Assignee: Peter Varga

The KILL command was implemented in:
* https://issues.apache.org/jira/browse/HIVE-17483
* https://issues.apache.org/jira/browse/HIVE-20549

But it does not work in an environment where service discovery is enabled and more than one HS2 instance is running (except when the kill query is manually sent to all HS2 instances).

Solution:
* If an HS2 instance can't kill a query locally, it should post a kill query request to the Zookeeper
* Every HS2 should watch the Zookeeper for kill query requests, and if the query is running on that instance, kill it
* Authorization of kill query should work the same
[jira] [Created] (HIVE-23045) Zookeeper SSL/TLS support
Peter Varga created HIVE-23045: -- Summary: Zookeeper SSL/TLS support Key: HIVE-23045 URL: https://issues.apache.org/jira/browse/HIVE-23045 Project: Hive Issue Type: Improvement Components: HiveServer2, JDBC, Metastore Reporter: Peter Varga Assignee: Peter Varga

A Zookeeper 3.5.5 server can operate with SSL/TLS secured connections to its clients. [https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]

SSL communication should be possible in the different parts of Hive that communicate with Zookeeper servers. The Zookeeper clients are used in the following places:
* HiveServer2 PrivilegeSynchronizer
* HiveServer2 register/remove server from Zookeeper
* HS2ActivePassiveHARegistryClient
* ZooKeeperHiveLockManager
* LLapZookeeperRegistryImpl
* TezAmRegistryImpl
* WebHCat ZooKeeperStorage
* JDBC Driver server lookup
* Metastore - ZookeeperTokenStore
* Metastore register/remove server from Zookeeper

The flag to enable SSL communication and the required parameters should be provided by different configuration parameters, corresponding to the different use cases.
[jira] [Created] (HIVE-23019) Fix TestTxnCommandsForMmTable test case
Peter Varga created HIVE-23019: -- Summary: Fix TestTxnCommandsForMmTable test case Key: HIVE-23019 URL: https://issues.apache.org/jira/browse/HIVE-23019 Project: Hive Issue Type: Bug Components: Test Reporter: Peter Varga Assignee: Peter Varga TestTxnCommandsForMmTable.testInsertOverwriteForPartitionedMmTable was fixed in HIVE-19084 to avoid being dependent on the order of the element returned by FileSystem.listStatus. However the fix introduced a new bug, as now the assertion for the base directory name doesn't run for the second partition, instead it runs twice for the first one. -- This message was sent by Atlassian Jira (v8.3.4#803005)