[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes
[ https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411847#comment-16411847 ] PRAFUL DASH commented on HIVE-17573:

It's very important, guys, please get this done: as you already know, JDK 10 will be available soon, so we need to fix this / make it compatible ASAP.

Thanks,
PRAFUL

> LLAP: JDK9 support fixes
>
> Key: HIVE-17573
> URL: https://issues.apache.org/jira/browse/HIVE-17573
> Project: Hive
> Issue Type: Bug
> Components: llap
> Affects Versions: 3.0.0
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
>
> The perf diff between JDK8 -> JDK9 seems to be significant.
> TPC-H Q6 on JDK8 takes 32s on a single node + 1 TB scale warehouse.
> TPC-H Q6 on JDK9 takes 19s on the same host + same data.
> The performance difference seems to come from better JIT and better NUMA handling.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes
[ https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312596#comment-16312596 ] liyunzhang commented on HIVE-17573:

[~gopalv]: thanks for your reply and the tool.

bq. JDK9 seems to wake up the producer-consumer pair on the same NUMA zone (the IO elevator allocates, passes the array to the executor thread and executor passes it back instead of throwing it to GC deref).

If I don't add {{-XX:+UseNUMA}}, I guess the NUMA-handling optimization will not benefit the query, is that right? UseNUMA is disabled by default.

bq. the IO elevator allocates, passes the array to the executor thread and executor passes it back instead of throwing it to GC deref

I guess this will reduce GC. From my test results, GC time is lower on JDK9 than on JDK8 for Hive on Spark in long queries ([link|https://docs.google.com/presentation/d/1cK9ZfUliAggH3NJzSvexTPwkXpbsM7Dm0o0kdmuFQUU/edit#slide=id.p]). Maybe this is because G1GC is the default garbage collector in JDK9, and the goal of [G1GC|https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm#JSGCT-GUID-0394E76A-1A8F-425E-A0D0-B48A3DC82B42] is shorter GC pause times.
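One way to back a GC-time comparison like this with data (a sketch, not something from this thread; the log file names are hypothetical) is to capture GC logs on both runtimes. Note that the flags changed between the two JDKs: JDK 9 replaced the old PrintGC options with unified logging (JEP 158) and made G1 the default collector (JEP 248).

```shell
# Hypothetical GC-logging flags for a JDK 8 vs JDK 9 comparison run.
# JDK 8: legacy PrintGC* flags write to a file named by -Xloggc.
JDK8_GC_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc-jdk8.log"
# JDK 9+: unified logging (JEP 158); gc* captures all GC-tagged messages.
JDK9_GC_OPTS="-Xlog:gc*:file=gc-jdk9.log"
echo "$JDK8_GC_OPTS"
echo "$JDK9_GC_OPTS"
```

The resulting logs can then be compared for total pause time; the formats differ, so the same parser will not read both.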
[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes
[ https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310895#comment-16310895 ] Gopal V commented on HIVE-17573:

[~kellyzly]: I'm using LLAP with ORC, loaded using the bin_flat TPC-H script in hive-testbench.

https://github.com/hortonworks/hive-testbench/tree/hdp26/ddl-tpch/bin_flat

The hardware is {{Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz}}, with 256 GB RAM and the NUMA organization shown below. The memory is split as 128 GB Xmx + 32 GB cache for 24 executors, with a 180 GB container, which can pretty much fit the entire Q6 data in cache at the 1 TB scale. If you have the text cache enabled (this takes multiple flags), you might be able to get similar performance from the text data as well, but the significant ORC speedup comes from loading data into lineitem in a natural order (the production-like ingest results in one file per day).

{code}
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 131037 MB
node 0 free: 127359 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 131072 MB
node 1 free: 127987 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
{code}

The setup uses a giant TLAB maximum so that the in-thread allocations go to the same NUMA zone:

{{-XX:TLABSize=128m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m}}

JDK9 seems to wake up the producer-consumer pair on the same NUMA zone (the IO elevator allocates, passes the array to the executor thread, and the executor passes it back instead of throwing it to the GC to deref). I'm not sure there's any actual movement on JEP-157, which would probably help this thread-to-thread object passing much more.

bq. From which tool, you can get above conclusion?

https://github.com/t3rmin4t0r/perf-map-agent/blob/jitdump/jit-objdump.sh

That's the script I use to attach GDB to a running JIT process and extract a JIT sample, with the additional CPU perf events.
Here's an example of the final report I gather from the JIT (this was sent to the Intel JDK team as a perf report, to see if they could make {{public String(byte ascii[], int hibyte, int offset, int count)}} faster for very small strings).

http://people.apache.org/~gopalv/perf-29529.tbz2

This is a perf event capture for Q6 on text data (instead of ORC):

{code}
perf record -ag -e cycles,instructions,branch-misses,LLC-prefetch-misses,cache-misses,LLC-store-misses,LLC-load-misses
{code}

along with the JIT-generated assembly. If you're on an x86_64 machine, then I guess run-report.sh should work.
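For reference, the JVM options quoted above could be assembled into a single opts string roughly like this (a sketch: the {{LLAP_DAEMON_OPTS}} variable name and the 128g heap are illustrative, taken from the sizes described in the comment, not from LLAP's actual configuration surface):

```shell
# Hypothetical assembly of the daemon JVM options described above.
# Note: -XX:+AggressiveOpts was deprecated in JDK 11 and later removed,
# so this exact set only applies to JDK 8/9-era JVMs.
LLAP_DAEMON_OPTS="-Xmx128g -XX:TLABSize=128m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m"
echo "$LLAP_DAEMON_OPTS"
```

The key pairing here is {{-XX:+UseNUMA}} with a large, resizable TLAB: the NUMA-aware allocator places each thread's TLAB on the local node, so keeping allocations in-TLAB keeps them node-local.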
[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes
[ https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310797#comment-16310797 ] liyunzhang commented on HIVE-17573:

[~gopalv]: It seems that on tpch_text_10 (10 GB), there is no big difference between JDK9 and JDK8 for [TPC-H Q6|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpch/tpch_query6.sql] on my machine (1 node with CPU E5-2690 v2, 40 cores, 62 GB):

jdk8: 54.189s
jdk9: 67.86s

{quote}
The performance difference seems to come from better JIT and better NUMA handling.
{quote}

With which tool did you reach the above conclusion? Have you tested it on a machine with a NUMA architecture?
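The NUMA question can be answered quickly even without numactl installed (a sketch; Linux-only, reads sysfs):

```shell
# Count NUMA nodes via sysfs (Linux). A result of 0 or 1 means the machine
# presents no NUMA topology, so -XX:+UseNUMA would have nothing to exploit.
nodes=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l | tr -d ' ')
echo "NUMA nodes: ${nodes}"
```

On a single-socket box like the 40-core E5-2690 v2 machine above (or a dual-socket one booted with node interleaving), the NUMA-related part of the JDK9 difference would not be expected to show up.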