[jira] [Created] (HIVE-23743) hive-druid-handler shaded jar doesn't include maven-artifact classes
Hankó Gergely created HIVE-23743:
-------------------------------------

Summary: hive-druid-handler shaded jar doesn't include maven-artifact classes
Key: HIVE-23743
URL: https://issues.apache.org/jira/browse/HIVE-23743
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Hankó Gergely
Assignee: Nishant Bangarwa

hive-druid-handler depends on the druid-processing jar, which in turn depends on classes from the maven-artifact jar. These classes are not included in the shaded jar, so the following exception may occur:

{code:java}
java.lang.ClassNotFoundException: org.apache.maven.artifact.versioning.ArtifactVersion
	at ...
	at org.apache.hive.druid.org.apache.druid.query.ordering.StringComparators.<clinit>(StringComparators.java:44)
	at org.apache.hive.druid.org.apache.druid.query.ordering.StringComparator.fromString(StringComparator.java:35)
{code}
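A possible fix, shown here as a minimal sketch only (it assumes the druid-handler module builds its shaded jar with the maven-shade-plugin using an artifactSet include list; the real pom may be organized differently), is to add maven-artifact to the shaded artifacts:

{code:xml}
<!-- Hypothetical sketch for druid-handler/pom.xml: include maven-artifact
     in the shade plugin's artifactSet so the relocated druid-processing
     classes can resolve ArtifactVersion at runtime. -->
<artifactSet>
  <includes>
    <include>org.apache.druid:*</include>
    <!-- assumed missing today, causing the ClassNotFoundException above -->
    <include>org.apache.maven:maven-artifact</include>
  </includes>
</artifactSet>
{code}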
[jira] [Created] (HIVE-23744) Reduce query startup latency
Mustafa Iman created HIVE-23744:
--------------------------------

Summary: Reduce query startup latency
Key: HIVE-23744
URL: https://issues.apache.org/jira/browse/HIVE-23744
Project: Hive
Issue Type: Task
Components: llap
Affects Versions: 4.0.0
Reporter: Mustafa Iman
Assignee: Mustafa Iman
Attachments: am_schedule_and_transmit.png, task_start.png

When I run queries with a large number of tasks for a single vertex, I see a significant delay before all tasks start executing in the LLAP daemons. Although the daemons have free capacity to run the tasks, it takes significant time to schedule all the tasks in the AM and actually transmit them to the executors.

"am_schedule_and_transmit" shows the scheduling of the tasks of TPC-DS query 55, restricted to the tasks scheduled for one of 10 LLAP daemons. The scheduler works in a single thread, scheduling tasks one by one, so a delay in scheduling one task delays all the following tasks.

!am_schedule_and_transmit.png|width=831,height=573!

Another issue is that it takes a long time to fill all the execution slots in the LLAP daemons even though they are all empty initially. This is caused by LlapTaskCommunicator using a fixed number of threads (10 by default) to send the tasks to the daemons. Moreover, this communication is synchronous, so the threads stay blocked and idle for the duration of each round trip. "task_start.png" shows the running tasks on an LLAP daemon that has 12 execution slots: by the time the 12th task starts running, more than 100 ms have already passed, and that slot stays idle all this time.

!task_start.png|width=1166,height=635!
[jira] [Created] (HIVE-23747) Increase the number of parallel tasks sent to daemons from the AM
Mustafa Iman created HIVE-23747:
--------------------------------

Summary: Increase the number of parallel tasks sent to daemons from the AM
Key: HIVE-23747
URL: https://issues.apache.org/jira/browse/HIVE-23747
Project: Hive
Issue Type: Sub-task
Reporter: Mustafa Iman
Assignee: Mustafa Iman

The number of in-flight tasks from the AM to a single executor is currently hardcoded to 1 ([https://github.com/apache/hive/blob/master/llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java#L57]). It does not make sense to increase this right now, because communication between the AM and the daemons happens synchronously anyway. After resolving https://issues.apache.org/jira/browse/HIVE-23746, this must be increased to at least the number of execution slots per daemon.
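For illustration only, the intended change could look roughly like this (a sketch; the variable names are assumptions and not the actual code around LlapProtocolClientProxy.java#L57, though hive.llap.daemon.num.executors is the existing knob for execution slots per daemon):

{code:java}
// Hypothetical sketch: derive the per-node inflight request limit from the
// daemon's executor count instead of hardcoding it to 1.
int slotsPerDaemon = HiveConf.getIntVar(
    conf, HiveConf.ConfVars.LLAP_DAEMON_NUM_EXECUTORS);

// Before: int maxPerNodeInflightRequests = 1;
int maxPerNodeInflightRequests = Math.max(1, slotsPerDaemon);
{code}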
[jira] [Created] (HIVE-23746) Send task attempts async from AM to daemons
Mustafa Iman created HIVE-23746:
--------------------------------

Summary: Send task attempts async from AM to daemons
Key: HIVE-23746
URL: https://issues.apache.org/jira/browse/HIVE-23746
Project: Hive
Issue Type: Sub-task
Components: llap
Reporter: Mustafa Iman
Assignee: Mustafa Iman

LlapTaskCommunicator uses a synchronous client to send task attempts, with a fixed number of communication threads (10 by default). This causes unnecessary delays: the daemons may have plenty of free execution slots yet not receive all the tasks because of this bottleneck. LlapTaskCommunicator can use an async client to pass these tasks to the daemons instead.
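As an illustration only (every name below except java.util.concurrent.CompletableFuture is hypothetical, not the existing LLAP client API), the async version could look roughly like this:

{code:java}
// Hypothetical sketch: submit without blocking and handle the response in
// a callback, so a small thread pool no longer caps submission throughput.
CompletableFuture<SubmitWorkResponseProto> pending =
    asyncClient.submitWork(daemonNodeId, submitWorkRequest); // returns immediately

pending.whenComplete((response, error) -> {
  if (error != null) {
    communicator.onSubmissionFailed(taskAttemptId, error);
  } else {
    communicator.onSubmissionSucceeded(taskAttemptId, response);
  }
});
// The sending thread returns right away instead of staying blocked for the
// round trip, so free execution slots fill up much faster.
{code}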
[jira] [Created] (HIVE-23745) Avoid copying userpayload in task communicator
Mustafa Iman created HIVE-23745:
--------------------------------

Summary: Avoid copying userpayload in task communicator
Key: HIVE-23745
URL: https://issues.apache.org/jira/browse/HIVE-23745
Project: Hive
Issue Type: Sub-task
Reporter: Mustafa Iman
Assignee: Mustafa Iman

[https://github.com/apache/hive/blob/master/llap-common/src/java/org/apache/hadoop/hive/llap/tez/Converters.java#L182]

This copy sometimes takes a few milliseconds. The delay adds up across all tasks of a single vertex, since LlapTaskCommunicator processes tasks one by one. The user payload never changes in this codepath; the copy is made only because of limitations of the Protobuf library. Protobuf 3.1 added an UnsafeByteOperations class that avoids copying ByteBuffers, so this can be resolved when Protobuf is upgraded.
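For reference, the difference looks like this (a sketch assuming Protobuf >= 3.1 on the classpath; the helper and its caller are made up, only the two ByteString calls are real API):

{code:java}
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;
import java.nio.ByteBuffer;

class PayloadConversionSketch {
  // Hypothetical helper mirroring what Converters does with the payload.
  static ByteString toByteString(ByteBuffer payload) {
    // Today: copies the whole buffer, costing a few ms for large payloads.
    // return ByteString.copyFrom(payload);

    // With Protobuf >= 3.1: zero-copy wrap. Safe only because the user
    // payload never changes in this codepath.
    return UnsafeByteOperations.unsafeWrap(payload);
  }
}
{code}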
[jira] [Created] (HIVE-23748) Tez task with File Merge operator generates tmp file with wrong suffix
wanguangping created HIVE-23748:
--------------------------------

Summary: Tez task with File Merge operator generates tmp file with wrong suffix
Key: HIVE-23748
URL: https://issues.apache.org/jira/browse/HIVE-23748
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: 3.1.0
Reporter: wanguangping

h1. Background
* SQL on Tez
* It is an occasional (non-deterministic) problem

h1. HiveServer2 log
[^hiveserver2 log.txt]
[jira] [Created] (HIVE-23749) Improve Hive Hook Documentation
liuyan created HIVE-23749:
--------------------------

Summary: Improve Hive Hook Documentation
Key: HIVE-23749
URL: https://issues.apache.org/jira/browse/HIVE-23749
Project: Hive
Issue Type: Improvement
Components: Documentation
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: liuyan

It seems we have little documentation or examples around _"org.apache.hadoop.hive.ql.hooks"_ on how to develop a hook and use it with the following properties (a minimal example is sketched below):
* hive.exec.pre.hooks
* hive.exec.post.hooks
* hive.exec.failure.hooks
* hive.query.lifetime.hooks
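For example, the documentation could include a minimal hook along these lines (the package and class name here are made up; ExecuteWithHookContext and HookContext are the real interfaces in org.apache.hadoop.hive.ql.hooks):

{code:java}
package com.example.hooks; // hypothetical package

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

/** A minimal post-execution hook that logs the completed query string. */
public class QueryLoggingHook implements ExecuteWithHookContext {
  @Override
  public void run(HookContext hookContext) throws Exception {
    System.out.println("Query completed: "
        + hookContext.getQueryPlan().getQueryString());
  }
}
{code}

It would then be registered in the session with, e.g., set hive.exec.post.hooks=com.example.hooks.QueryLoggingHook; once its jar is on HiveServer2's classpath.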
[jira] [Created] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level
Rajesh Balamohan created HIVE-23738:
------------------------------------

Summary: DBLockManager::lock() : Move lock request to debug level
Key: HIVE-23738
URL: https://issues.apache.org/jira/browse/HIVE-23738
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]

For Q78 at 30 TB scale, this ends up dumping a couple of MBs of log at INFO level just to print the lock request. If possible, this should be moved to DEBUG level.
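A sketch of the proposed change (the variable names are illustrative, assumed from the logging statement at the linked line):

{code:java}
// Before: renders the entire lock request at INFO, which can be MBs of text.
// LOG.info("Requesting: queryId=" + queryId + " " + lockRequest);

// After: only render the full request when DEBUG is enabled.
if (LOG.isDebugEnabled()) {
  LOG.debug("Requesting: queryId=" + queryId + " " + lockRequest);
}
{code}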
[jira] [Created] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced
Krisztian Kasa created HIVE-23736:
----------------------------------

Summary: Disable topn in ReduceSinkOp if a TNK is introduced
Key: HIVE-23736
URL: https://issues.apache.org/jira/browse/HIVE-23736
Project: Hive
Issue Type: Improvement
Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Both the ReduceSink and the TopNKey operators have top-n key filtering functionality. If a TNK operator is introduced, this filtering is done twice, so the top-n filtering in the ReduceSink operator should be disabled in that case (see the sketch below).
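A rough sketch of the idea (ReduceSinkDesc's setTopN accessor and the meaning of -1 as "disabled" are assumptions here, not confirmed by this issue):

{code:java}
// Hypothetical sketch: when a TopNKey operator has been introduced above a
// ReduceSink, switch off the ReduceSink's own top-n filtering so the same
// rows are not filtered twice.
ReduceSinkDesc rsDesc = reduceSinkOp.getConf();
if (topNKeyIntroduced) {
  rsDesc.setTopN(-1); // assumed convention: a negative value disables top-n
}
{code}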
[jira] [Created] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
Syed Shameerur Rahman created HIVE-23737:
-----------------------------------------

Summary: LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
Key: HIVE-23737
URL: https://issues.apache.org/jira/browse/HIVE-23737
Project: Hive
Issue Type: Improvement
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman

LLAP has a dagDelete feature, added as part of HIVE-9911. Now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362), we could reuse that feature in LLAP. There are some added advantages of using Tez's dagDelete feature over LLAP's current one:
1) It can easily be extended to accommodate upcoming features such as vertex-level and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129.
2) The feature will be easier to maintain, since it is separated out from Hive's code path.
[jira] [Created] (HIVE-23741) Store CacheTags in the file cache level
Antal Sinkovits created HIVE-23741:
-----------------------------------

Summary: Store CacheTags in the file cache level
Key: HIVE-23741
URL: https://issues.apache.org/jira/browse/HIVE-23741
Project: Hive
Issue Type: Improvement
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits

CacheTags are currently stored for every data buffer. The strings are interned, but the number of cache tag objects can be reduced further by moving them to the file cache level and back-referencing them from the buffers.
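Illustratively (all names below are assumptions, not the actual LLAP cache classes), the ownership could move like this:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the file-level cache entry owns the single CacheTag,
// and each buffer reaches it through its file entry instead of storing one.
class FileCacheEntry {
  final CacheTag tag;                       // one tag per file
  final List<LlapDataBuffer> buffers = new ArrayList<>();

  FileCacheEntry(CacheTag tag) {
    this.tag = tag;
  }

  CacheTag tagForBuffers() {
    return tag;                             // back-reference, no per-buffer copy
  }
}
{code}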
[jira] [Created] (HIVE-23739) ShuffleHandler: Unordered partitioned data could be evicted immediately after transfer
Rajesh Balamohan created HIVE-23739:
------------------------------------

Summary: ShuffleHandler: Unordered partitioned data could be evicted immediately after transfer
Key: HIVE-23739
URL: https://issues.apache.org/jira/browse/HIVE-23739
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan

[https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L1019]

This is not optimal for unordered partitioned data, as it ends up evicting the data immediately after transfer (e.g. Q78).
[jira] [Created] (HIVE-23740) [Hive] delete from <table>; without where clause not giving correct error msg
ABHISHEK KUMAR GUPTA created HIVE-23740:
----------------------------------------

Summary: [Hive] delete from <table>; without where clause not giving correct error msg
Key: HIVE-23740
URL: https://issues.apache.org/jira/browse/HIVE-23740
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 3.1.0
Reporter: ABHISHEK KUMAR GUPTA

Created a Hive table, inserted data, and fired a delete without a where clause:

CREATE TABLE insert_only (key int, value string) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only");
INSERT INTO insert_only VALUES (13,'BAD'), (14,'SUCCESS');
delete from insert_only;

Error thrown:
Error: Error while compiling statement: FAILED: SemanticException [Error 10414]: Attempt to do update or delete on table hive.insert_only that is insert-only transactional (state=42000,code=10414)

Expectation: the error should say that the where clause is missing, because to delete all the content of a table Hive provides truncate table <table>;.
Hive TPC-DS metastore dumps in Postgres
Hey guys,

I put up a small project on GitHub [1] with Hive metastore dumps from tpcds10tb/tpcds30tb (+partitioning) and some scripts to quickly spin up a dockerized Postgres with those loaded. Personally, I find it useful for checking the plans of TPC-DS queries using the usual qtest mechanism (without external tools or tapping into a real cluster), while having beefy stats + partitioning info at hand. The driver and other changes needed to run these tests are located in [2]. I am sharing it here in case it might be of use to somebody else.

The two main commands that you will need if you wanna try this out:

docker build --tag postgres-tpcds-metastore:1.0 .
mvn test -Dtest=TestTezPerfDBCliDriver -Dtest.output.overwrite=true -Dtest.metastore.db=postgres.tpcds

Small caveat: currently in [2] the dockerized Postgres is restarted for every query, which makes things slow. This will be fixed later on.

Best,
Stamatis

[1] https://github.com/zabetak/hive-postgres-metastore
[2] https://github.com/zabetak/hive/tree/qtest_postgres_driver
[jira] [Created] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests
Stamatis Zampetakis created HIVE-23742:
---------------------------------------

Summary: Remove unintentional execution of TPC-DS query39 in qtests
Key: HIVE-23742
URL: https://issues.apache.org/jira/browse/HIVE-23742
Project: Hive
Issue Type: Task
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

The TPC-DS queries under clientpositive/perf are meant only to check for plan regressions, so they should never actually be executed. The execution part should therefore be removed from query39.q and cbo_query39.q.