[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes

2018-03-23 Thread PRAFUL DASH (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411847#comment-16411847
 ] 

PRAFUL DASH commented on HIVE-17573:


It's very important, guys. Please get this done; as you know, JDK 10 will be 
available soon, so we need to fix this / make it compatible ASAP.

 

Thanks,

PRAFUL

> LLAP: JDK9 support fixes
> 
>
> Key: HIVE-17573
> URL: https://issues.apache.org/jira/browse/HIVE-17573
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>
> The perf diff between JDK8 -> JDK9 seems to be significant.  
> TPC-H Q6 on JDK8 takes 32s on a single node + 1 Tb scale warehouse. 
> TPC-H Q6 on JDK9 takes 19s on the same host + same data.
> The performance difference seems to come from better JIT and better NUMA 
> handling.





[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes

2018-01-04 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312596#comment-16312596
 ] 

liyunzhang commented on HIVE-17573:
---

[~gopalv]: thanks for your reply and the tool.
bq. JDK9 seems to wake up the producer-consumer pair on the same NUMA zone (the IO elevator allocates, passes the array to the executor thread and executor passes it back instead of throwing it to GC deref).
If I don't add {{-XX:+UseNUMA}}, I guess the NUMA-handling optimization will not benefit the query, is that right? UseNUMA is disabled by default.
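(For reference, the effective default can be confirmed from HotSpot's flag dump; this is generic JVM tooling, not anything LLAP-specific:)

{code}
# Shows the final value of UseNUMA for a given set of JVM flags
java -XX:+PrintFlagsFinal -version | grep -i UseNUMA
{code}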
bq. the IO elevator allocates, passes the array to the executor thread and executor passes it back instead of throwing it to GC deref
I guess this will reduce GC.

From my test results, GC is lower on JDK9 than on JDK8 for Hive on Spark in long queries ([link|https://docs.google.com/presentation/d/1cK9ZfUliAggH3NJzSvexTPwkXpbsM7Dm0o0kdmuFQUU/edit#slide=id.p]). Maybe this is because G1GC is the default garbage collector in JDK9, and the goal of [G1GC|https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm#JSGCT-GUID-0394E76A-1A8F-425E-A0D0-B48A3DC82B42] is to keep GC time low.
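As a side note, for anyone reproducing the GC comparison: the standard GC-logging flags differ between the two JDKs, since JDK9 moved to unified logging. These are the generic options, not necessarily what was used for the linked slides:

{code}
# JDK8: classic GC logging
java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc-jdk8.log ...

# JDK9: unified logging replaces the flags above
java -Xlog:gc*:file=gc-jdk9.log ...
{code}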



[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes

2018-01-03 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310895#comment-16310895
 ] 

Gopal V commented on HIVE-17573:


[~kellyzly]: I'm using LLAP with ORC, loaded using the bin_flat tpc-h script in 
hive-testbench.

https://github.com/hortonworks/hive-testbench/tree/hdp26/ddl-tpch/bin_flat

The hardware is {{Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz}}, with 256Gb RAM 
and the NUMA organization shown below.

The memory is split as 128Gb Xmx + 32Gb cache for 24 executors, with a 180Gb 
container, which pretty much can fit the entire Q6 data in cache at the 1Tb 
scale.
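
For anyone trying to reproduce that sizing, the daemon would be brought up through the LLAP service driver with something along these lines; the values below simply restate the numbers above, and the exact flags may differ by Hive version, so treat this as a sketch rather than the actual command used:

{code}
# 180Gb container, 128Gb Xmx, 32Gb cache, 24 executors per daemon
hive --service llap --instances 1 --size 180g --xmx 128g --cache 32g --executors 24
{code}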

If you have the text-cache enabled (this takes multiple flags), you might be 
able to get similar performance from the text data as well, but the significant 
ORC speedup comes from loading data into lineitem in a natural order (the 
production-like ingest results in one file per day).

{code}
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 131037 MB
node 0 free: 127359 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 131072 MB
node 1 free: 127987 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 
{code}

The setup uses a giant TLAB maximum so that the in-thread allocations go to the 
same NUMA zone.

{code}
-XX:TLABSize=128m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m
{code}
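
To confirm the TLAB sizing actually takes effect, TLAB statistics can be logged; the syntax differs between the two JDKs (these are generic HotSpot logging options, not part of the setup above):

{code}
# JDK8: print TLAB statistics at GC time
java -XX:+PrintTLAB ...
# JDK9: unified logging equivalent
java -Xlog:gc+tlab=debug ...
{code}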

JDK9 seems to wake up the producer-consumer pair on the same NUMA zone (the IO 
elevator allocates, passes the array to the executor thread, and the executor 
passes it back instead of throwing it to GC deref).

I'm not sure there's any actual movement on JEP-157, which would probably help 
this thread-to-thread object passing much more.

bq. Which tool did you use to reach the above conclusion?

https://github.com/t3rmin4t0r/perf-map-agent/blob/jitdump/jit-objdump.sh

That's the script which I use to attach GDB to a running JIT process and 
extract a JIT sample, with the additional CPU perf events.

Here's an example of the final report I gather from the JIT (this was sent to 
the Intel JDK team as a perf report, to see if they could fix {{public String(byte 
ascii[], int hibyte, int offset, int count)}} to be faster for very small 
strings).

http://people.apache.org/~gopalv/perf-29529.tbz2

This is a perf event capture for Q6 on text data (instead of ORC), recorded with

{code}
perf record -ag -e \
  cycles,instructions,branch-misses,LLC-prefetch-misses,cache-misses,LLC-store-misses,LLC-load-misses
{code}

along with the JIT generated assembly.

If you're on an x86_64 machine, then I guess run-report.sh should work.
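
For anyone who just wants to poke at a capture like that with the stock tooling rather than run-report.sh, the usual perf commands apply (assuming the archive holds a standard {{perf.data}}; JIT frames only resolve if the matching jitdump/map files sit alongside it):

{code}
tar xjf perf-29529.tbz2
perf report -i perf.data     # per-symbol breakdown across the recorded events
perf annotate -i perf.data   # per-instruction view against the captured assembly
{code}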




[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes

2018-01-03 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310797#comment-16310797
 ] 

liyunzhang commented on HIVE-17573:
---

[~gopalv]:
It seems that on tpch_text_10 (10g), there is no big difference between JDK9 and 
JDK8 for [TPC-H Q6|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpch/tpch_query6.sql] 
on my machine (1 node with CPU E5-2690 v2, 40 cores, 62g):

JDK8: 54.189s

JDK9: 67.86s

{quote}
The performance difference seems to come from better JIT and better NUMA 
handling.
{quote}

Which tool did you use to reach the above conclusion? Have you tested this on a 
machine with a NUMA architecture?
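
(For what it's worth, whether a box actually has multiple NUMA nodes is easy to check before re-running the comparison; both commands below are standard Linux tooling:)

{code}
numactl --hardware     # lists NUMA nodes with their CPUs, memory sizes and distances
lscpu | grep -i numa   # quick one-line summary of the NUMA node count
{code}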
