[jira] [Created] (HIVE-22617) Re-Enable PreCommit test org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1

2019-12-10 Thread Oliver Draese (Jira)
Oliver Draese created HIVE-22617:


 Summary: Re-Enable PreCommit test 
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
 Key: HIVE-22617
 URL: https://issues.apache.org/jira/browse/HIVE-22617
 Project: Hive
  Issue Type: Test
Reporter: Oliver Draese


The test was disabled via HIVE-22616 because it was flaky. If the test is 
considered valid, it needs to be fixed and re-enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22616) Disable PreCommit test org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1

2019-12-10 Thread Oliver Draese (Jira)
Oliver Draese created HIVE-22616:


 Summary: Disable PreCommit test 
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
 Key: HIVE-22616
 URL: https://issues.apache.org/jira/browse/HIVE-22616
 Project: Hive
  Issue Type: Test
  Components: Hive
Reporter: Oliver Draese
Assignee: Oliver Draese


The test is flaky and produces the following error:

java.io.FileNotFoundException: Source '/home/hiveptest/34.69.225.53-hiveptest-2/apache-github-source-source/itests/hive-unit/target/junit-qfile-results/clientpositive/join2.q.out' does not exist
    at org.apache.commons.io.FileUtils.checkFileRequirements(FileUtils.java:1383)
    at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:1060)
    at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:1028)
    at org.apache.hadoop.hive.ql.QOutProcessor.maskPatterns(QOutProcessor.java:162)
    at org.apache.hadoop.hive.ql.QTestUtil.checkCliDriverResults(QTestUtil.java:932)
    at org.apache.hadoop.hive.ql.QTestRunnerUtils.queryListRunnerMultiThreaded(QTestRunnerUtils.java:152)
    at org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1(TestMTQueries.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
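
For reference, disabling a flaky JUnit 4 test (the trace above shows the JUnit 4 runner) usually amounts to an @Ignore annotation. A minimal sketch, not necessarily the exact HIVE-22616 patch:

{code:java}
import org.junit.Ignore;
import org.junit.Test;

public class TestMTQueries {

  // Disabled until the flakiness is understood; re-enable via HIVE-22617.
  @Ignore("HIVE-22616: flaky - FileNotFoundException on junit-qfile-results (see trace)")
  @Test
  public void testMTQueries1() throws Exception {
    // unchanged multi-threaded qfile-driver body
  }
}
{code}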



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22581) Add tests for dynamic semijoins reduction

2019-12-04 Thread Oliver Draese (Jira)
Oliver Draese created HIVE-22581:


 Summary: Add tests for dynamic semijoins reduction
 Key: HIVE-22581
 URL: https://issues.apache.org/jira/browse/HIVE-22581
 Project: Hive
  Issue Type: Test
  Components: Query Planning
Reporter: Oliver Draese
Assignee: Jesus Camacho Rodriguez


There don't seem to be any tests for the TezCompiler. A test suite should be added, and a test for the error scenario of HIVE-22572 should be implemented there; a skeleton is sketched below.

This is a follow-up action for:

https://issues.apache.org/jira/browse/HIVE-22572
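
A hedged skeleton of such a suite, assuming a JUnit 4 harness; the class name and test body are illustrative, not the committed test:

{code:java}
import org.junit.Test;

public class TestTezCompiler {

  @Test
  public void testSemijoinReductionWithMissingStats() throws Exception {
    // Intended scenario (see HIVE-22572): compile a query that triggers
    // dynamic semijoin reduction against a table lacking column statistics,
    // and assert that compilation succeeds instead of throwing an NPE.
    // The Driver/QTest wiring is omitted in this sketch.
  }
}
{code}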



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22572) NullPointerException when using dynamic semijoin reduction

2019-12-03 Thread Oliver Draese (Jira)
Oliver Draese created HIVE-22572:


 Summary: NullPointerException when using dynamic semijoin reduction
 Key: HIVE-22572
 URL: https://issues.apache.org/jira/browse/HIVE-22572
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Oliver Draese
Assignee: Oliver Draese
 Fix For: 4.0.0


From the HS2 logs:

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1541) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:471) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:182) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:148) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12487) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:360) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869) ~[hive-exec-3.1.0.3.1.0.142-1.jar:3.1.0.3.1.0.142-1]
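
The top frame suggests a missing null check on operator/table statistics during the cost/benefit computation. A minimal sketch of the kind of guard this implies, with illustrative names (not the actual patch):

{code:java}
import org.apache.hadoop.hive.ql.plan.Statistics;

final class SemijoinBenefitGuard {

  // Returns a usable row count even when statistics are absent; callers can
  // then treat "no stats" as "no measurable benefit" instead of crashing.
  static long rowCountOrZero(Statistics stats) {
    return stats == null ? 0L : stats.getNumRows();
  }

  private SemijoinBenefitGuard() {}
}
{code}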



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22117) Clean up RuntimeException code in AMReporter

2019-08-15 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-22117:


 Summary: Clean up RuntimeException code in AMReporter
 Key: HIVE-22117
 URL: https://issues.apache.org/jira/browse/HIVE-22117
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 3.1.1
Reporter: Oliver Draese
Assignee: Oliver Draese


The AMReporter of LLAP throws RuntimeExceptions from within addTaskAttempt and 
removeTaskAttempt. These can bring LLAP down.

As an interim fix (see HIVE-22113), the RuntimeException of removeTaskAttempt 
is caught from within TaskRunnerCallable, preventing LLAP termination if a 
killed task is not found in AMReporter.

Ideally, we would just log this on removeTaskAttempt (a gone task is a gone task) and 
throw a checked exception from addTaskAttempt. If the checked exception is caught, 
we should fail the task attempt (as there is already an attempt with this ID 
running); see the sketch below.
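
A hedged sketch of that target design; exception, field, and method names are illustrative, not the actual HIVE-22117 change:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class AMReporterSketch {

  // A checked exception forces callers of addTaskAttempt to handle duplicates
  // by failing the new attempt rather than tearing the daemon down.
  static class TaskAttemptAlreadyRegisteredException extends Exception {
    TaskAttemptAlreadyRegisteredException(String attemptId) {
      super("Task attempt already registered: " + attemptId);
    }
  }

  private final Set<String> attempts = ConcurrentHashMap.newKeySet();

  void addTaskAttempt(String attemptId) throws TaskAttemptAlreadyRegisteredException {
    if (!attempts.add(attemptId)) {
      throw new TaskAttemptAlreadyRegisteredException(attemptId);
    }
  }

  void removeTaskAttempt(String attemptId) {
    if (!attempts.remove(attemptId)) {
      // A gone task is a gone task: log it and move on instead of throwing.
      System.err.println("removeTaskAttempt: unknown attempt " + attemptId + ", ignoring");
    }
  }
}
{code}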



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22113) Prevent LLAP shutdown on AMReporter related RuntimeException

2019-08-14 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-22113:


 Summary: Prevent LLAP shutdown on AMReporter related 
RuntimeException
 Key: HIVE-22113
 URL: https://issues.apache.org/jira/browse/HIVE-22113
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 3.1.1
Reporter: Oliver Draese
Assignee: Oliver Draese


If a task attempt cannot be removed from AMReporter (i.e., the task attempt was not 
found), the AMReporter throws a RuntimeException. This exception is not caught 
and propagates up, causing an LLAP shutdown:
2019-08-08T23:34:39,748 [Wait-Queue-Scheduler-0 ()] ... Thread[Wait-Queue-Scheduler-0,5,main]
java.lang.RuntimeException: ..._1563528877295_18872_3728_01_03_0...
    at ...AMReporter$AMNodeInfo.removeTaskAttempt(AMReporter.java:524) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at ...(AMReporter.java:243) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at ...(TaskRunnerCallable.java:384) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at ...(TaskExecutorService.java:739) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at ...TaskExecutorService.access$1100(TaskExecutorService.java:91) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at ...TaskExecutorService$WaitQueueWorker.run(TaskExecutorService.java:396) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
    at ...Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
    at ...TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
    at ...(InterruptibleTask.java:41) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
    at ...(TrustedListenableFutureTask.java:77) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
    at ...(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at ...ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
    at ...(Thread.java:748) [?:1.8.0_161]
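
The fix described above boils down to catching the RuntimeException at the call site. A minimal sketch, with illustrative names rather than the exact TaskRunnerCallable diff:

{code:java}
final class KillTaskSketch {

  interface AmReporterView {
    void removeTaskAttempt(String attemptId); // may throw RuntimeException if unknown
  }

  static void removeQuietly(AmReporterView amReporter, String attemptId) {
    try {
      amReporter.removeTaskAttempt(attemptId);
    } catch (RuntimeException e) {
      // Attempt already gone (e.g. killed twice); log and keep LLAP alive.
      System.err.println("Ignoring removeTaskAttempt failure for " + attemptId + ": " + e);
    }
  }
}
{code}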



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-21785) Add task queue/runtime stats per LLAP daemon to output

2019-05-23 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21785:


 Summary: Add task queue/runtime stats per LLAP daemon to output
 Key: HIVE-21785
 URL: https://issues.apache.org/jira/browse/HIVE-21785
 Project: Hive
  Issue Type: Improvement
  Components: llap
Affects Versions: 3.1.1
Reporter: Oliver Draese
Assignee: Oliver Draese
 Fix For: 3.1.1


There are several scenarios where we want to investigate whether a particular LLAP 
daemon is performing faster or slower than the others in the cluster. In these 
scenarios, it is especially important to figure out whether tasks spent 
significant time waiting for an available executor (queued) versus on the 
execution itself. A skew in task-to-daemon assignment is also interesting.

This patch adds these statistics to the TezCounters, and therefore to the job 
output, on a per-LLAP-daemon basis. Here is an example:

INFO : LlapTaskRuntimeAgg by daemon:
INFO :    Count-host-1.example.com: 41
INFO :    Count-host-2.example.com: 39
INFO :    Count-host-3.example.com: 45
INFO :    QueueTime-host-1.example.com: 51437776
INFO :    QueueTime-host-2.example.com: 35758306
INFO :    QueueTime-host-3.example.com: 47168327
INFO :    RunTime-host-1.example.com: 165151539295
INFO :    RunTime-host-2.example.com: 141729193528
INFO :    RunTime-host-3.example.com: 166876988771

The "Count-" are simple task counts for the appended host name (LLAP daemon)

The "QueueTime-" values tell, how long tasks waited in the 
TaskExecutorService's queue before getting actually executed.

The "RunTime-" values cover the time from execution start to finish (where 
finish can either be successful execution or a killed/failed execution).
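
A hedged sketch of the aggregation scheme these names imply: one counter per (metric, daemon host) pair inside a single TezCounters group. The group and counter naming here are inferred from the output above, not copied from the committed patch:

{code:java}
import org.apache.tez.common.counters.TezCounters;

final class LlapTaskRuntimeAggSketch {

  private static final String GROUP = "LlapTaskRuntimeAgg";

  // Called once per finished task attempt with the daemon host that ran it.
  static void record(TezCounters counters, String daemonHost,
      long queueTimeNanos, long runTimeNanos) {
    counters.findCounter(GROUP, "Count-" + daemonHost).increment(1L);
    counters.findCounter(GROUP, "QueueTime-" + daemonHost).increment(queueTimeNanos);
    counters.findCounter(GROUP, "RunTime-" + daemonHost).increment(runTimeNanos);
  }
}
{code}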

For the new counts to appear in the output, both the preexisting 
hive.tez.exec.print.summary and the new hive.llap.task.time.print.summary have 
to be set to true:

 
<property>
  <name>hive.tez.exec.print.summary</name>
  <value>true</value>
</property>
<property>
  <name>hive.llap.task.time.print.summary</name>
  <value>true</value>
</property>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21493) BuddyAllocator - Metrics count for allocated arenas wrong if preallocation is done

2019-03-22 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21493:


 Summary: BuddyAllocator - Metrics count for allocated arenas wrong 
if preallocation is done
 Key: HIVE-21493
 URL: https://issues.apache.org/jira/browse/HIVE-21493
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 3.1.1
Reporter: Oliver Draese
Assignee: Oliver Draese
 Fix For: 4.0.0


The (Hadoop/JMX) metrics are not correctly initialized if arena preallocation 
is done and the arena count is greater than 1.
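
A hedged sketch of the bug pattern this describes: when arenas are preallocated, the "allocated arenas" metric has to reflect every preallocated arena, not a single initial one. Names are illustrative, not the actual BuddyAllocator fields:

{code:java}
final class ArenaMetricsInitSketch {

  // Before the fix, initialization along these lines effectively reported 1
  // even when several arenas had already been created up front.
  static int initialAllocatedArenas(boolean preallocate, int arenaCount) {
    return preallocate ? arenaCount : 1;
  }

  private ArenaMetricsInitSketch() {}
}
{code}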



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21422) Add metrics to LRFU cache policy

2019-03-11 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21422:


 Summary: Add metrics to LRFU cache policy
 Key: HIVE-21422
 URL: https://issues.apache.org/jira/browse/HIVE-21422
 Project: Hive
  Issue Type: Improvement
  Components: llap
Affects Versions: 4.0.0
Reporter: Oliver Draese
Assignee: Oliver Draese
 Fix For: 4.0.0


The LRFU cache policy for the LLAP data cache doesn't provide enough insight 
to figure out what is cached and why something might get evicted. This ticket 
is used to add Hadoop Metrics 2 information (accessible via JMX) to the LRFU 
policy, providing the following information:
 * How much memory is cached for data buffers
 * How much memory is cached for metadata buffers
 * How large the min-heap of the cache policy is
 * How long the eviction short list (a linked list) is
 * How much memory is currently "locked" (buffers with a positive reference 
count) and therefore in use by a query

These new counters are found in the MX bean under the following path; a sketch 
for reading them over JMX follows:

Hadoop/LlapDaemon/LowLevelLrfuCachePolicy-
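
A hedged sketch of dumping these beans in-process over JMX; the ObjectName pattern is an assumption derived from the path above (the suffix after the dash is deployment-specific, hence the wildcard):

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class LrfuMetricsDump {
  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // Assumed pattern, based on Hadoop/LlapDaemon/LowLevelLrfuCachePolicy- above.
    ObjectName pattern =
        new ObjectName("Hadoop:service=LlapDaemon,name=LowLevelLrfuCachePolicy-*");
    for (ObjectName bean : server.queryNames(pattern, null)) {
      for (MBeanAttributeInfo attr : server.getMBeanInfo(bean).getAttributes()) {
        System.out.println(bean + " " + attr.getName() + " = "
            + server.getAttribute(bean, attr.getName()));
      }
    }
  }
}
{code}

For a remote daemon, the same loop works over a JMXConnector connection instead of the platform MBeanServer.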

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21221) Make HS2 and LLAP consistent - Bring up LLAP WebUI in test mode if WebUI port is configured

2019-02-05 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21221:


 Summary: Make HS2 and LLAP consistent - Bring up LLAP WebUI in 
test mode if WebUI port is configured
 Key: HIVE-21221
 URL: https://issues.apache.org/jira/browse/HIVE-21221
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Oliver Draese
Assignee: Oliver Draese


When HiveServer2 comes up, it skips the start of the WebUI if
1) hive.in.test is set to true
AND
2) the WebUI port (hive.server2.webui.port) is not specified or left at its default.
 
Right now, on LLAP daemon start, only condition 1) above is checked (whether Hive 
is in test mode).
 
The LLAP daemon startup code (which decides whether to skip WebUI creation) should 
be consistent with HS2: if a port other than the default is specified, the WebUI 
should also be started in test mode. See the sketch below.
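
The consistent rule reduces to a small predicate; a sketch with illustrative names, not the actual patch:

{code:java}
final class WebUiStartupRule {

  // Mirrors the HS2 behaviour described above: in test mode the WebUI is
  // skipped only when no explicit (non-default) port was configured.
  static boolean shouldStartWebUi(boolean hiveInTest, int configuredPort, int defaultPort) {
    boolean explicitPort = configuredPort > 0 && configuredPort != defaultPort;
    return !hiveInTest || explicitPort;
  }

  private WebUiStartupRule() {}
}
{code}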



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21204) Instrumentation for read/write locks in LLAP

2019-02-01 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21204:


 Summary: Instrumentation for read/write locks in LLAP
 Key: HIVE-21204
 URL: https://issues.apache.org/jira/browse/HIVE-21204
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Oliver Draese
Assignee: Oliver Draese


LLAP has several R/W locks for serializing updates to the query tracker, file 
data, and similar structures.

Instrumentation is added to monitor the
 * total number of R/W locks within a particular category
 * average and maximum wait/suspension time to get the R/W lock

A category includes all lock instances for a particular area (i.e., the category 
FileData covers all R/W locks used in FileData instances, accounted within that 
one category).

The monitoring/accounting is done via Hadoop Metrics 2, making the values accessible 
via JMX. In addition, a new "locking" GET endpoint is added to the LLAP 
daemon's REST interface. It produces output like the following example:

{
  "statsCollection": "enabled",
  "lockStats": [
    {
      "type": "R/W Lock Stats",
      "label": "FileData",
      "totalLockWaitTimeMillis": 0,
      "readLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      },
      "writeLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      }
    },
    {
      "type": "R/W Lock Stats",
      "label": "QueryTracker",
      "totalLockWaitTimeMillis": 0,
      "readLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      },
      "writeLock": {
        "count": 0,
        "avgWaitTimeNanos": 0,
        "maxWaitTimeNanos": 0
      }
    }
  ]
}

To avoid the overhead of lock instrumentation, lock metrics collection is 
disabled by default. It can be enabled via the following configuration 
parameter:

hive.llap.lockmetrics.collect = true
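
A minimal sketch of what the wait-time accounting around a ReentrantReadWriteLock can look like; the real implementation publishes these values through Hadoop Metrics 2 per category rather than keeping them in plain fields:

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class InstrumentedRwLock {

  private final ReentrantReadWriteLock delegate = new ReentrantReadWriteLock();
  private final AtomicLong readWaitNanos = new AtomicLong();
  private final AtomicLong writeWaitNanos = new AtomicLong();

  void lockRead() {
    long start = System.nanoTime();
    delegate.readLock().lock();          // the suspension time is what we measure
    readWaitNanos.addAndGet(System.nanoTime() - start);
  }

  void unlockRead() { delegate.readLock().unlock(); }

  void lockWrite() {
    long start = System.nanoTime();
    delegate.writeLock().lock();
    writeWaitNanos.addAndGet(System.nanoTime() - start);
  }

  void unlockWrite() { delegate.writeLock().unlock(); }

  long totalReadWaitNanos() { return readWaitNanos.get(); }
  long totalWriteWaitNanos() { return writeWaitNanos.get(); }
}
{code}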


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21183) Interrupt wait time for FileCacheCleanupThread

2019-01-29 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21183:


 Summary: Interrupt wait time for FileCacheCleanupThread
 Key: HIVE-21183
 URL: https://issues.apache.org/jira/browse/HIVE-21183
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Oliver Draese
Assignee: Oliver Draese


The FileCacheCleanupThread waits unnecessarily long for eviction counts to 
increment.
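
A hedged sketch of the idea: let eviction notify the cleanup thread so it can cut its wait short instead of sleeping the full interval. Names are illustrative, not the actual FileCache code:

{code:java}
final class EvictionSignal {

  private long evictionCount;

  synchronized void onEviction() {
    evictionCount++;
    notifyAll(); // wake the cleanup thread early instead of letting it sleep on
  }

  /** Returns once the count has moved past lastSeen, or after maxWaitMillis. */
  synchronized long awaitNewEvictions(long lastSeen, long maxWaitMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + maxWaitMillis;
    while (evictionCount == lastSeen) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        break;
      }
      wait(remaining);
    }
    return evictionCount;
  }
}
{code}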



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20773) Query result cache might contain stale MV data

2018-10-18 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-20773:


 Summary: Query result cache might contain stale MV data
 Key: HIVE-20773
 URL: https://issues.apache.org/jira/browse/HIVE-20773
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Oliver Draese
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)