Yingqi Lu created YARN-4282:
-------------------------------
Summary: JVM reuse in Yarn
Key: YARN-4282
URL: https://issues.apache.org/jira/browse/YARN-4282
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Yingqi Lu
Dear All,
Recently, we identified an issue in Yarn when running MapReduce: a significant
amount of time is spent in libjvm.so, most of it in JIT compilation. Attached is
a flame graph (visual call graph) of a query that runs for about 8 minutes. Most
of the yellow bars represent ‘libjvm.so’ functions, while Java functions are
colored red. The data show that more than 40% of overall execution time is spent
in compilation itself, yet inspecting the JVMs directly shows that a lot of code
still runs in interpreted mode. Ideally, we want everything to run as compiled
code over and over again. In reality, however, mappers and reducers have long
since exited before the benefits of compilation kick in. In other words, we take
the performance hit from both compilation and interpretation. The JVM reuse
feature in MapReduce 1.0 addressed this issue, but it was removed in Yarn. We
are currently tuning a set of JVM parameters to minimize the performance impact,
but we still think it would be good to open a discussion here and seek a more
permanent solution, since the problem is tied to the nature of how Yarn works.
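To make the warm-up effect concrete, here is a minimal sketch (not the code or
measurement behind the flame graph) that runs the same task repeatedly inside
one JVM and times each run. The task body, the class name WarmupDemo, and the
record counts are hypothetical stand-ins for per-record map work. Early runs
typically execute partly in the interpreter and pay for JIT compilation; later
runs reuse the compiled code. Under a one-task-per-container model, each JVM
only ever sees the first, slowest run. Absolute timings will vary with the JVM
version, flags, and hardware.

```java
/**
 * Minimal sketch of JIT warm-up inside a single JVM (hypothetical workload).
 * The same "task" runs repeatedly in this JVM; early runs usually include
 * interpreted execution plus compilation, later runs use compiled code.
 */
public class WarmupDemo {

    // Hypothetical stand-in for per-record map work: a cheap rolling hash.
    static long task(int records) {
        long acc = 17;
        for (int i = 0; i < records; i++) {
            acc = acc * 31 + (i ^ (acc >>> 7));
        }
        return acc;
    }

    // Runs `batches` consecutive tasks in this JVM and returns each task's
    // wall-clock time in nanoseconds.
    static long[] timeBatches(int batches, int recordsPerTask) {
        long[] nanos = new long[batches];
        long sink = 0; // accumulate results so the work is not dead code
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            sink += task(recordsPerTask);
            nanos[b] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println("sink=" + sink); // consume sink
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 2_000_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.printf("task %d: %.2f ms%n", b + 1, nanos[b] / 1e6);
        }
    }
}
```

On HotSpot, running the same class with `java -Xint WarmupDemo` forces
interpreter-only execution, which is one way to see how much of the per-task
cost a short-lived JVM pays before compiled code takes over.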
We would like to know whether any of you have seen this issue before, or whether
there is already an ongoing project to address it.
Data for this graph were collected across the entire system with multiple JVMs
running. The workload we used is the BigBench workload
(https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).
Thanks,
Yingqi Lu
1. Software and workloads used in performance tests may have been optimized for
performance only on Intel microprocessors. Performance tests, such as SYSmark
and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance
tests to assist you in fully evaluating your contemplated purchases, including
the performance of that product when combined with other products.