[jira] [Created] (HIVE-22528) Bloom Filter not showing up in Explain plan
Subhajit Sinha created HIVE-22528: - Summary: Bloom Filter not showing up in Explain plan Key: HIVE-22528 URL: https://issues.apache.org/jira/browse/HIVE-22528 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.1.0 Environment: Test Environment. Reporter: Subhajit Sinha Hi Team, We are using Apache Hive version 3.1.0.3.1.0.0-78 and trying to implement a Bloom filter in it. I created a managed table, stored as an ORC file, with these table properties: 'orc.bloom.filter.columns'='***', 'orc.bloom.filter.fpp'='0.05', 'orc.stripe.size'='268435456'. While checking the explain plan (running: explain select count(1) from the_table where ) in the current Hive version, I could not see anything labeled "Bloom_Filter" in the plan produced by the CBO. The table I am querying has records. I have a few questions: # Is Hive 3.1 not using the Bloom filter? I ran the same query and condition against a normal table and saw that it takes more time than against a table with a Bloom filter defined on the column used in the condition. # Is there any parameter that needs to be set to get the Bloom filter used for the table? # I have come across three parameters; please let me know what they signify: hive.tez.max.bloom.filter.entries, hive.tez.min.bloom.filter.entries, hive.tez.bloom.filter.factor Please let me know if anyone has used a Bloom filter, and share the process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
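For context, here is a minimal sketch of a table declared with the properties mentioned above. The table and column names are hypothetical; only the property names and values come from the report. Note that ORC bloom filters are stored in the file and used for row-group elimination at read time, which is separate from the hive.tez.* runtime bloom filter (semijoin reduction) settings asked about:

```sql
-- Hypothetical table; the FPP and stripe size mirror the properties above.
CREATE TABLE events (
  event_id BIGINT,
  user_id  STRING,
  ts       TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'user_id',
  'orc.bloom.filter.fpp'     = '0.05',
  'orc.stripe.size'          = '268435456'
);

-- ORC predicate pushdown must be enabled for the file-level filters to matter:
SET hive.optimize.index.filter=true;
EXPLAIN SELECT count(1) FROM events WHERE user_id = 'abc123';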
[jira] [Created] (HIVE-22527) Hive on Tez : Job of merging small files will be submitted into another queue (default queue)
zhangbutao created HIVE-22527: - Summary: Hive on Tez : Job of merging small files will be submitted into another queue (default queue) Key: HIVE-22527 URL: https://issues.apache.org/jira/browse/HIVE-22527 Project: Hive Issue Type: Bug Affects Versions: 3.1.1, 3.1.0 Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22526) Extract Compiler from Driver
Miklos Gergely created HIVE-22526: - Summary: Extract Compiler from Driver Key: HIVE-22526 URL: https://issues.apache.org/jira/browse/HIVE-22526 Project: Hive Issue Type: Sub-task Components: Hive Reporter: Miklos Gergely Assignee: Miklos Gergely Fix For: 4.0.0 The Driver class contains ~600 lines of code responsible for compiling the command: from the command String a Plan must be created, and in most cases a transaction must be started as well. This is done by the compile function, which is itself very large and has many helper sub-functions. All this code should be moved into a separate class, where it can do its job without being mixed with the other code in the Driver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
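The proposed split could look roughly like this; a minimal sketch only, with hypothetical class and method shapes (the real code returns a QueryPlan and opens a metastore transaction):

```java
// Hypothetical shape of the extracted class: the ~600 lines of compilation
// logic would live here, and Driver would keep only a thin delegation call.
class Compiler {
  private final String command;

  Compiler(String command) {
    this.command = command;
  }

  // Turns the command string into a plan, starting a transaction when needed.
  // The String return value stands in for the real QueryPlan object.
  String compile(boolean startTransaction) {
    if (startTransaction) {
      // in the real code, a transaction would be opened here
    }
    return "PLAN(" + command + ")";
  }
}
```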
[jira] [Created] (HIVE-22525) Refactor HiveOpConverter
Miklos Gergely created HIVE-22525: - Summary: Refactor HiveOpConverter Key: HIVE-22525 URL: https://issues.apache.org/jira/browse/HIVE-22525 Project: Hive Issue Type: Improvement Components: Hive Reporter: Miklos Gergely Assignee: Miklos Gergely Fix For: 4.0.0 HiveOpConverter is on its way to becoming a monster class. It is already ~1300 lines long and expected to grow. It should be refactored and cut into multiple classes in a reasonable way. A natural way to do this is to create separate visitor classes for the different RelNodes, which are already handled in different functions within HiveOpConverter. That way HiveOpConverter can be the dispatcher among those visitor classes, while each of them handles some specific work, potentially requesting sub-nodes to be dispatched by HiveOpConverter. The functions used by multiple visitors should be put into a utility class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
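The dispatcher-plus-visitors idea could be sketched as below. All names are hypothetical stand-ins (the real RelNode types come from Calcite, and each visitor would be its own class rather than a lambda):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the converter acts only as a dispatcher, routing each
// node kind to a dedicated visitor instead of one huge method per type.
class OpConverterSketch {
  interface Visitor { String visit(String node); }

  private final Map<String, Visitor> visitors = new HashMap<>();

  OpConverterSketch() {
    // One small visitor per RelNode kind; lambdas stand in for visitor classes.
    visitors.put("Project", n -> "SelectOperator(" + n + ")");
    visitors.put("Filter",  n -> "FilterOperator(" + n + ")");
  }

  // Visitors can call back into dispatch() when they need a child converted.
  String dispatch(String kind, String node) {
    Visitor v = visitors.get(kind);
    if (v == null) {
      throw new IllegalArgumentException("no visitor for " + kind);
    }
    return v.visit(node);
  }
}
```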
Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71792/ --- (Updated Nov. 21, 2019, 5:35 p.m.) Review request for hive, Laszlo Pinter and Peter Vary. Bugs: HIVE-21917 https://issues.apache.org/jira/browse/HIVE-21917 Repository: hive-git Description --- The Initiator thread in the metastore repeatedly loops over entries in the COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might need to be compacted. However, entries are never removed from this table except by a completed Compactor run. In a cluster where most tables / partitions are write-once read-many, this results in stale entries in this table never being cleaned up. In a small test cluster, we have observed approximately 45k entries in this table (virtually equal to the number of partitions in the cluster) while < 100 of these tables have delta files at all. Since most of the tables will never get enough writes to trigger a compaction (and in fact have only ever been written to once), the initiator thread keeps trying to evaluate them on every loop. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204 ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java b28b57779b standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 8253ccb9c9 standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 268038795b standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e840758c9d Diff: https://reviews.apache.org/r/71792/diff/2/ Changes: https://reviews.apache.org/r/71792/diff/1-2/ Testing --- Unit tests Thanks, Denys Kuzmenko
Review Request 71801: The error handler in LlapRecordReader might block if its queue is full
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71801/ --- Review request for hive, Laszlo Bodor, Panos Garefalakis, and Slim Bouguerra. Bugs: HIVE-22523 https://issues.apache.org/jira/browse/HIVE-22523 Repository: hive-git Description --- In setError() we set the value of an atomic reference (pendingError) and we also put the error in a queue. The latter seems not just unnecessary but might also block the caller of the handler if the queue is full. Also, closing of the reader might not be properly handled, as some of the flags are not volatile. Diffs - llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java 77966aa9650 Diff: https://reviews.apache.org/r/71801/diff/1/ Testing --- q tests Thanks, Attila Magyar
Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote: > > Not ready. Need to handle aborted and currently active compactions. > > Denys Kuzmenko wrote: > Handling the above cases would complicate the Initiator logic and make > the preliminary check longer. Not sure how critical it is that in case of > an unsuccessful compaction attempt, on the next run we won't retry unless there is > some change to the selected table/partition. Any thoughts on this? Changed the findPotentialCompactions query to: select distinct ctc_database, ctc_table, ctc_partition from COMPLETED_TXN_COMPONENTS where (select CC_STATE from COMPLETED_COMPACTIONS where ctc_database = CC_DATABASE and ctc_table = CC_TABLE and (ctc_partition is null or ctc_partition = cc_partition) order by cc_id desc limit 1) IN ('a', 'f') || ctc_timestamp < current_timestamp however this still won't cover compactions skipped because one is already running - Denys --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71792/#review218723 --- On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71792/ > --- > > (Updated Nov. 20, 2019, 12:20 p.m.) > > > Review request for hive, Laszlo Pinter and Peter Vary. > > > Bugs: HIVE-21917 > https://issues.apache.org/jira/browse/HIVE-21917 > > > Repository: hive-git > > > Description > --- > > The Initiator thread in the metastore repeatedly loops over entries in the > COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might > need to be compacted. However, entries are never removed from this table > except by a completed Compactor run. > > In a cluster where most tables / partitions are write-once read-many, this > results in stale entries in this table never being cleaned up. 
In a small > test cluster, we have observed approximately 45k entries in this table > (virtually equal to the number of partitions in the cluster) while < 100 of > these tables have delta files at all. Since most of the tables will never get > enough writes to trigger a compaction (and in fact have only ever been > written to once), the initiator thread keeps trying to evaluate them on every > loop. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > 610cf05204 > > ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java > b28b57779b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java > 8253ccb9c9 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 6281208247 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java > e840758c9d > > > Diff: https://reviews.apache.org/r/71792/diff/1/ > > > Testing > --- > > Unit tests > > > Thanks, > > Denys Kuzmenko > >
Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote: > > Not ready. Need to handle aborted and currently active compactions. Handling the above cases would complicate the Initiator logic and make the preliminary check longer. Not sure how critical it is that, in case of an unsuccessful compaction attempt, on the next run we won't retry unless there is some change to the selected table/partition. Any thoughts on this? - Denys --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71792/#review218723 --- On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71792/ > --- > > (Updated Nov. 20, 2019, 12:20 p.m.) > > > Review request for hive, Laszlo Pinter and Peter Vary. > > > Bugs: HIVE-21917 > https://issues.apache.org/jira/browse/HIVE-21917 > > > Repository: hive-git > > > Description > --- > > The Initiator thread in the metastore repeatedly loops over entries in the > COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might > need to be compacted. However, entries are never removed from this table > except by a completed Compactor run. > > In a cluster where most tables / partitions are write-once read-many, this > results in stale entries in this table never being cleaned up. In a small > test cluster, we have observed approximately 45k entries in this table > (virtually equal to the number of partitions in the cluster) while < 100 of > these tables have delta files at all. Since most of the tables will never get > enough writes to trigger a compaction (and in fact have only ever been > written to once), the initiator thread keeps trying to evaluate them on every > loop. 
> > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > 610cf05204 > > ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java > b28b57779b > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java > 8253ccb9c9 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 6281208247 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java > e840758c9d > > > Diff: https://reviews.apache.org/r/71792/diff/1/ > > > Testing > --- > > Unit tests > > > Thanks, > > Denys Kuzmenko > >
[jira] [Created] (HIVE-22524) CommandProcessorException should utilize standard Exception fields
Zoltan Haindrich created HIVE-22524: --- Summary: CommandProcessorException should utilize standard Exception fields Key: HIVE-22524 URL: https://issues.apache.org/jira/browse/HIVE-22524 Project: Hive Issue Type: Improvement Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich CommandProcessorException right now has: * getCause() inherited from Exception * getException() local implementation * getMessage() inherited from Exception * getErrorMessage() local implementation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full
Attila Magyar created HIVE-22523: Summary: The error handler in LlapRecordReader might block if its queue is full Key: HIVE-22523 URL: https://issues.apache.org/jira/browse/HIVE-22523 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 In setError() we set the value of an atomic reference (pendingError) and we also put the error in a queue. The latter seems not just unnecessary but might also block the caller of the handler if the queue is full. Also, closing of the reader might not be properly handled, as some of the flags are not volatile. -- This message was sent by Atlassian Jira (v8.3.4#803005)
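A hedged sketch of the problem and the direction described above: record the error only in an AtomicReference instead of also doing a blocking put() on a bounded queue. Names are simplified stand-ins, not the actual LlapRecordReader code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of the bug: if setError() also called queue.put(t) on a
// full bounded queue, the error handler's caller would block. Keeping only
// the atomic reference makes error reporting non-blocking.
class ErrorHandlerSketch {
  private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
  private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1); // data queue, possibly full

  void setError(Throwable t) {
    // compareAndSet never blocks and preserves the first error seen.
    pendingError.compareAndSet(null, t);
    // Deliberately no queue.put(t) here: that is the call that could block.
  }

  Throwable pendingError() {
    return pendingError.get();
  }
}
```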
Re: Review Request 71784: HiveProtoLoggingHook might consume lots of memory
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/ --- (Updated Nov. 21, 2019, 9:40 a.m.) Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and Panos Garefalakis. Changes --- test might be flaky, ignore it until we find a better solution Bugs: HIVE-22514 https://issues.apache.org/jira/browse/HIVE-22514 Repository: hive-git Description --- HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor uses an unbounded queue which cannot be replaced from the outside. If log events are generated at a very fast rate, this queue can grow large. Since ScheduledThreadPoolExecutor does not support changing the default unbounded queue to a bounded one, the queue capacity is checked manually by the patch. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 8eab54859bf ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 450a0b544d6 Diff: https://reviews.apache.org/r/71784/diff/2/ Changes: https://reviews.apache.org/r/71784/diff/1-2/ Testing --- unittest Thanks, Attila Magyar
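The manual capacity check could look roughly like this; a sketch under assumed names, not the actual patch. ScheduledThreadPoolExecutor's internal DelayedWorkQueue is indeed unbounded and cannot be replaced via a constructor argument, so the size is checked before submitting:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Sketch: guard an unbounded ScheduledThreadPoolExecutor queue with a manual
// size check, dropping events once a configured cap is reached rather than
// letting the queue grow without limit.
class BoundedSubmitSketch {
  private final ScheduledThreadPoolExecutor executor;
  private final int maxQueuedEvents;

  BoundedSubmitSketch(ScheduledThreadPoolExecutor executor, int maxQueuedEvents) {
    this.executor = executor;
    this.maxQueuedEvents = maxQueuedEvents;
  }

  // Returns false (dropping the task) instead of queueing past the cap.
  boolean trySubmit(Runnable task) {
    if (executor.getQueue().size() >= maxQueuedEvents) {
      return false;
    }
    executor.submit(task);
    return true;
  }
}
```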