[jira] [Created] (HIVE-22617) Re-Enable PreCommit test org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
Oliver Draese created HIVE-22617:
------------------------------------

             Summary: Re-Enable PreCommit test org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
                 Key: HIVE-22617
                 URL: https://issues.apache.org/jira/browse/HIVE-22617
             Project: Hive
          Issue Type: Test
            Reporter: Oliver Draese

The test was disabled via HIVE-22616 because it was flaky. If the test is considered valid, it needs to be fixed and re-enabled.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HIVE-22616) Disable PreCommit test org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
Oliver Draese created HIVE-22616:
------------------------------------

             Summary: Disable PreCommit test org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
                 Key: HIVE-22616
                 URL: https://issues.apache.org/jira/browse/HIVE-22616
             Project: Hive
          Issue Type: Test
          Components: Hive
            Reporter: Oliver Draese
            Assignee: Oliver Draese

The test is flaky and produces the following error:

java.io.FileNotFoundException: Source '/home/hiveptest/34.69.225.53-hiveptest-2/apache-github-source-source/itests/hive-unit/target/junit-qfile-results/clientpositive/join2.q.out' does not exist
    at org.apache.commons.io.FileUtils.checkFileRequirements(FileUtils.java:1383)
    at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:1060)
    at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:1028)
    at org.apache.hadoop.hive.ql.QOutProcessor.maskPatterns(QOutProcessor.java:162)
    at org.apache.hadoop.hive.ql.QTestUtil.checkCliDriverResults(QTestUtil.java:932)
    at org.apache.hadoop.hive.ql.QTestRunnerUtils.queryListRunnerMultiThreaded(QTestRunnerUtils.java:152)
    at org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1(TestMTQueries.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
[jira] [Created] (HIVE-22581) Add tests for dynamic semijoins reduction
Oliver Draese created HIVE-22581:
------------------------------------

             Summary: Add tests for dynamic semijoins reduction
                 Key: HIVE-22581
                 URL: https://issues.apache.org/jira/browse/HIVE-22581
             Project: Hive
          Issue Type: Test
          Components: Query Planning
            Reporter: Oliver Draese
            Assignee: Jesus Camacho Rodriguez

There don't seem to be any tests for the TezCompiler. A test suite should be added, and a test for the error scenario of HIVE-22572 should be implemented there.

This is a follow-up action for: https://issues.apache.org/jira/browse/HIVE-22572
[jira] [Created] (HIVE-22572) NullPointerException when using dynamic semijoin reduction
Oliver Draese created HIVE-22572:
------------------------------------

             Summary: NullPointerException when using dynamic semijoin reduction
                 Key: HIVE-22572
                 URL: https://issues.apache.org/jira/browse/HIVE-22572
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
            Reporter: Oliver Draese
            Assignee: Oliver Draese
             Fix For: 4.0.0

From HS2 logs:

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1541) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:471) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:182) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:148) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12487) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:360) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
[jira] [Created] (HIVE-22117) Clean up RuntimeException code in AMReporter
Oliver Draese created HIVE-22117:
------------------------------------

             Summary: Clean up RuntimeException code in AMReporter
                 Key: HIVE-22117
                 URL: https://issues.apache.org/jira/browse/HIVE-22117
             Project: Hive
          Issue Type: Bug
          Components: llap
    Affects Versions: 3.1.1
            Reporter: Oliver Draese
            Assignee: Oliver Draese

The AMReporter of LLAP throws RuntimeExceptions from within addTaskAttempt and removeTaskAttempt. These can cause LLAP to come down. As an interim fix (see HIVE-22113), the RuntimeException of removeTaskAttempt is caught from within TaskRunnerCallable, preventing LLAP termination if a killed task is not found in AMReporter.

Ideally, we would just log this on removeTask (a gone task is a gone task) and have a checked exception in addTaskAttempt. If the checked exception is caught, we should fail the task attempt (as there is already an attempt with this ID running).

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
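The proposed cleanup can be sketched as follows. This is an illustrative stand-in, not Hive's actual AMReporter code: the exception class and the method bodies are hypothetical, only the method names come from the ticket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical checked exception, as the ticket proposes for addTaskAttempt.
class TaskAttemptExistsException extends Exception {
    TaskAttemptExistsException(String id) {
        super("Task attempt already registered: " + id);
    }
}

class AMReporterSketch {
    private final Map<String, Object> attempts = new ConcurrentHashMap<>();

    // Proposed: a checked exception instead of a RuntimeException, so the
    // caller can fail just this task attempt rather than the whole daemon.
    void addTaskAttempt(String attemptId, Object task) throws TaskAttemptExistsException {
        if (attempts.putIfAbsent(attemptId, task) != null) {
            throw new TaskAttemptExistsException(attemptId);
        }
    }

    // Proposed: "a gone task is a gone task" - just log and return,
    // never throw when the attempt is unknown.
    boolean removeTaskAttempt(String attemptId) {
        Object removed = attempts.remove(attemptId);
        if (removed == null) {
            System.out.println("removeTaskAttempt: " + attemptId + " not found, ignoring");
        }
        return removed != null;
    }
}
```

With this shape, the caller handles the duplicate-registration case explicitly and an unknown attempt on removal is merely logged.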
[jira] [Created] (HIVE-22113) Prevent LLAP shutdown on AMReporter related RuntimeException
Oliver Draese created HIVE-22113:
------------------------------------

             Summary: Prevent LLAP shutdown on AMReporter related RuntimeException
                 Key: HIVE-22113
                 URL: https://issues.apache.org/jira/browse/HIVE-22113
             Project: Hive
          Issue Type: Bug
          Components: llap
    Affects Versions: 3.1.1
            Reporter: Oliver Draese
            Assignee: Oliver Draese

If a task attempt cannot be removed from AMReporter (i.e. the task attempt was not found), the AMReporter throws a RuntimeException. This exception is not caught and trickles up, causing an LLAP shutdown:

2019-08-08T23:34:39,748 [Wait-Queue-Scheduler-0 ()] : [Wait-Queue-Scheduler-0,5,main]
java.lang.RuntimeException: _1563528877295_18872_3728_01_03_0't
    at $AMNodeInfo.removeTaskAttempt(AMReporter.java:524) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at (AMReporter.java:243) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at (TaskRunnerCallable.java:384) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at (TaskExecutorService.java:739) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at $1100(TaskExecutorService.java:91) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at $WaitQueueWorker.run(TaskExecutorService.java:396) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at $RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
    at $TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
    at (InterruptibleTask.java:41) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
    at (TrustedListenableFutureTask.java:77) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
    at (ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at $Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
    at (Thread.java:748) [?:1.8.0_161]
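The fix described in this ticket amounts to catching the RuntimeException at the call site so it can no longer trickle up the executor thread. A minimal illustrative sketch (simplified stand-in classes, not Hive's actual TaskRunnerCallable/AMReporter signatures):

```java
// Stand-in for the AMReporter behavior described in the ticket:
// removing an unknown task attempt currently throws a RuntimeException.
class AmReporterStub {
    void removeTaskAttempt(String attemptId) {
        throw new RuntimeException("Attempt " + attemptId + " not known");
    }
}

class KillTaskSketch {
    // The interim fix: swallow and log the RuntimeException so a
    // task that is already gone does not shut down the LLAP daemon.
    static boolean safeRemove(AmReporterStub reporter, String attemptId) {
        try {
            reporter.removeTaskAttempt(attemptId);
            return true;
        } catch (RuntimeException e) {
            System.out.println("Ignoring missing task attempt: " + e.getMessage());
            return false;
        }
    }
}
```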
[jira] [Created] (HIVE-21785) Add task queue/runtime stats per LLAP daemon to output
Oliver Draese created HIVE-21785:
------------------------------------

             Summary: Add task queue/runtime stats per LLAP daemon to output
                 Key: HIVE-21785
                 URL: https://issues.apache.org/jira/browse/HIVE-21785
             Project: Hive
          Issue Type: Improvement
          Components: llap
    Affects Versions: 3.1.1
            Reporter: Oliver Draese
            Assignee: Oliver Draese
             Fix For: 3.1.1

There are several scenarios where we want to investigate if a particular LLAP daemon is performing faster or slower than the others in the cluster. In these scenarios, it is specifically important to figure out whether tasks spent significant time waiting for an available executor (queued) vs. on the execution itself. A skew in task-to-daemon assignment is also interesting. This patch adds these statistics to the TezCounters and therefore to the job output, on a per-LLAP-daemon basis. Here is an example:

INFO : LlapTaskRuntimeAgg by daemon:
INFO :    Count-host-1.example.com: 41
INFO :    Count-host-2.example.com: 39
INFO :    Count-host-3.example.com: 45
INFO :    QueueTime-host-1.example.com: 51437776
INFO :    QueueTime-host-2.example.com: 35758306
INFO :    QueueTime-host-3.example.com: 47168327
INFO :    RunTime-host-1.example.com: 165151539295
INFO :    RunTime-host-2.example.com: 141729193528
INFO :    RunTime-host-3.example.com: 166876988771

The "Count-" values are simple task counts for the appended host name (LLAP daemon). The "QueueTime-" values tell how long tasks waited in the TaskExecutorService's queue before actually getting executed. The "RunTime-" values cover the time from execution start to finish (where finish can be either a successful or a killed/failed execution).

For the new counters to appear in the output, both the preexisting hive.tez.exec.print.summary and the new hive.llap.task.time.print.summary have to be set to true:

<property>
  <name>hive.tez.exec.print.summary</name>
  <value>true</value>
</property>
<property>
  <name>hive.llap.task.time.print.summary</name>
  <value>true</value>
</property>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
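The per-daemon aggregation behind these counters can be sketched in a few lines. This is a minimal illustrative model (plain maps, not Hive's TezCounters code; class and method names are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

// Aggregates task count, queue time, and run time per LLAP daemon host,
// mirroring the Count-/QueueTime-/RunTime- counters shown above.
class LlapTaskRuntimeAgg {
    static final class HostStats {
        long count;
        long queueTimeNanos;
        long runTimeNanos;
    }

    private final Map<String, HostStats> byHost = new TreeMap<>();

    // Called once per finished task attempt with its measured times.
    void record(String host, long queueTimeNanos, long runTimeNanos) {
        HostStats s = byHost.computeIfAbsent(host, h -> new HostStats());
        s.count++;
        s.queueTimeNanos += queueTimeNanos;
        s.runTimeNanos += runTimeNanos;
    }

    long count(String host)     { return byHost.get(host).count; }
    long queueTime(String host) { return byHost.get(host).queueTimeNanos; }
    long runTime(String host)   { return byHost.get(host).runTimeNanos; }
}
```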
[jira] [Created] (HIVE-21493) BuddyAllocator - Metrics count for allocated arenas wrong if preallocation is done
Oliver Draese created HIVE-21493:
------------------------------------

             Summary: BuddyAllocator - Metrics count for allocated arenas wrong if preallocation is done
                 Key: HIVE-21493
                 URL: https://issues.apache.org/jira/browse/HIVE-21493
             Project: Hive
          Issue Type: Bug
          Components: llap
    Affects Versions: 3.1.1
            Reporter: Oliver Draese
            Assignee: Oliver Draese
             Fix For: 4.0.0

The (Hadoop/JMX) metrics are not correctly initialized if arena preallocation is done and the arena count is greater than 1.
[jira] [Created] (HIVE-21422) Add metrics to LRFU cache policy
Oliver Draese created HIVE-21422:
------------------------------------

             Summary: Add metrics to LRFU cache policy
                 Key: HIVE-21422
                 URL: https://issues.apache.org/jira/browse/HIVE-21422
             Project: Hive
          Issue Type: Improvement
          Components: llap
    Affects Versions: 4.0.0
            Reporter: Oliver Draese
            Assignee: Oliver Draese
             Fix For: 4.0.0

The LRFU cache policy for the LLAP data cache doesn't provide enough insight to figure out what is cached and why something might get evicted. This ticket is used to add Hadoop Metrics 2 information (accessible via JMX) to the LRFU policy, providing the following information:

* How much memory is cached for data buffers
* How much memory is cached for metadata buffers
* How large the min-heap of the cache policy is
* How long the eviction short list (linked list) is
* How much memory is currently "locked" (buffers with positive reference count) and therefore in use by a query

These new counters are found in the MX bean, following this path: Hadoop/LlapDaemon/LowLevelLrfuCachePolicy-
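Since the new counters are exposed via JMX, they can be read programmatically with the standard platform MBean server API. A generic sketch (the ObjectName and attribute for the LRFU bean are placeholders; the exact name depends on how the Hadoop metrics system registers it):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Reads a single attribute from an MBean in the local JVM, e.g. a
// hypothetical "Hadoop:service=LlapDaemon,name=LowLevelLrfuCachePolicy" bean.
class JmxPeek {
    static Object read(String objectName, String attribute) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.getAttribute(new ObjectName(objectName), attribute);
    }
}
```

The same pattern works for any of the Hadoop/LlapDaemon beans mentioned in these tickets once their exact registered names are known.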
[jira] [Created] (HIVE-21221) Make HS2 and LLAP consistent - Bring up LLAP WebUI in test mode if WebUI port is configured
Oliver Draese created HIVE-21221:
------------------------------------

             Summary: Make HS2 and LLAP consistent - Bring up LLAP WebUI in test mode if WebUI port is configured
                 Key: HIVE-21221
                 URL: https://issues.apache.org/jira/browse/HIVE-21221
             Project: Hive
          Issue Type: Improvement
          Components: llap
            Reporter: Oliver Draese
            Assignee: Oliver Draese

When HiveServer2 comes up, it skips the start of the WebUI if
1) hive.in.test is set to true, AND
2) the WebUI port (hive.server2.webui.port) is not specified or is the default.

Right now, on LLAP daemon start, only condition 1) above is checked (whether Hive is in test mode). The LLAP daemon start-up code (which skips WebUI creation) should be consistent with HS2: if a port other than the default is specified, the WebUI should also be started in test mode.
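The consistent rule requested here can be expressed as one predicate. A sketch under stated assumptions (the method and class names are hypothetical, and the default-port handling is illustrative rather than Hive's actual start-up code):

```java
// Decides whether the WebUI should start, applying the HS2 rule the
// ticket asks LLAP to adopt: in test mode, start only if an explicit,
// non-default port was configured.
class WebUiPolicy {
    static boolean shouldStartWebUi(boolean inTest, int configuredPort, int defaultPort) {
        if (!inTest) {
            return true; // normal operation: WebUI always comes up
        }
        // test mode: only an explicitly configured, non-default port
        // brings the WebUI up
        return configuredPort > 0 && configuredPort != defaultPort;
    }
}
```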
[jira] [Created] (HIVE-21204) Instrumentation for read/write locks in LLAP
Oliver Draese created HIVE-21204:
------------------------------------

             Summary: Instrumentation for read/write locks in LLAP
                 Key: HIVE-21204
                 URL: https://issues.apache.org/jira/browse/HIVE-21204
             Project: Hive
          Issue Type: Improvement
          Components: llap
            Reporter: Oliver Draese
            Assignee: Oliver Draese

LLAP has several R/W locks for serialization of updates to the query tracker, file data, etc. Instrumentation is added to monitor the

* total amount of R/W locks within a particular category
* average + max wait/suspension time to get the R/W lock

A category includes all lock instances for a particular area (i.e. the category FileData covers all R/W locks that are used in FileData instances, accounted within that one category). The monitoring/accounting is done via Hadoop Metrics 2, making the values accessible via JMX. In addition, a new "locking" GET endpoint is added to the LLAP daemon's REST interface. It produces output like the following example:

{
  "statsCollection": "enabled",
  "lockStats": [
    {
      "type": "R/W Lock Stats",
      "label": "FileData",
      "totalLockWaitTimeMillis": 0,
      "readLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      },
      "writeLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      }
    },
    {
      "type": "R/W Lock Stats",
      "label": "QueryTracker",
      "totalLockWaitTimeMillis": 0,
      "readLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      },
      "writeLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      }
    }
  ]
}

To avoid the overhead of lock instrumentation, lock metrics collection is disabled by default and can be enabled via the following configuration parameter:

hive.llap.lockmetrics.collect = true
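The wait-time measurement behind these counters can be sketched as a thin lock wrapper. This is a minimal illustration of the technique, not Hive's Metrics 2 implementation (class name and fields are hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Wraps a Lock and records how long each caller waited to acquire it,
// yielding count / total / max wait values like those in the JSON above.
class InstrumentedLock {
    private final Lock delegate;
    final AtomicLong count = new AtomicLong();
    final AtomicLong totalWaitNanos = new AtomicLong();
    final AtomicLong maxWaitNanos = new AtomicLong();

    InstrumentedLock(Lock delegate) {
        this.delegate = delegate;
    }

    void lock() {
        long start = System.nanoTime();
        delegate.lock(); // blocks until acquired; the blocked time is the wait
        long waited = System.nanoTime() - start;
        count.incrementAndGet();
        totalWaitNanos.addAndGet(waited);
        maxWaitNanos.accumulateAndGet(waited, Math::max);
    }

    void unlock() {
        delegate.unlock();
    }
}
```

Wrapping both halves of a ReentrantReadWriteLock this way gives the separate readLock/writeLock statistics per category.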
[jira] [Created] (HIVE-21183) Interrupt wait time for FileCacheCleanupThread
Oliver Draese created HIVE-21183:
------------------------------------

             Summary: Interrupt wait time for FileCacheCleanupThread
                 Key: HIVE-21183
                 URL: https://issues.apache.org/jira/browse/HIVE-21183
             Project: Hive
          Issue Type: Improvement
          Components: llap
            Reporter: Oliver Draese
            Assignee: Oliver Draese

The FileCacheCleanupThread is waiting unnecessarily long for eviction counts to increment.
[jira] [Created] (HIVE-20773) Query result cache might contain stale MV data
Oliver Draese created HIVE-20773:
------------------------------------

             Summary: Query result cache might contain stale MV data
                 Key: HIVE-20773
                 URL: https://issues.apache.org/jira/browse/HIVE-20773
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 4.0.0
            Reporter: Oliver Draese
            Assignee: Jesus Camacho Rodriguez