Yingqi Lu created YARN-4282:
-------------------------------

             Summary: JVM reuse in Yarn
                 Key: YARN-4282
                 URL: https://issues.apache.org/jira/browse/YARN-4282
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Yingqi Lu


Dear All,

Recently, we identified a performance issue in YARN when running MapReduce. A 
significant amount of time is spent in libjvm.so, most of it in compilation. 

Attached is a flame graph (visual call graph) of a query that runs for about 8 
minutes. Most of the yellow bars represent ‘libjvm.so’ functions, while the Java 
functions are colored red. The data show that more than 40% of overall execution 
time is spent in compilation itself, yet, looking inside the JVMs themselves, a 
lot of code still runs in interpreted mode. Ideally, we want everything to run 
as compiled code over and over again. In reality, however, mappers and reducers 
die long before the compilation benefits kick in. In other words, we take the 
performance hit from both compilation and interpretation. The JVM reuse feature 
in MapReduce 1.0 addressed this issue, but it was removed in YARN. We are 
currently working on a set of JVM parameters to minimize the performance 
impact, but we still think it would be good to open a discussion here to look 
for a more permanent solution, since the issue is tied to the nature of how 
YARN works. 
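
To make the warm-up effect concrete, here is a minimal sketch. It is not taken 
from the measured workload: the class JitWarmupSketch and its methods are 
hypothetical names, and the loop is only a stand-in for per-record task work. 
It times two identical batches of calls inside one JVM; a short-lived task JVM 
would only ever see something like the first batch.

```java
public class JitWarmupSketch {
    // Hypothetical CPU-bound kernel standing in for per-record map/reduce work.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Runs work() `rounds` times and returns the elapsed wall time in nanoseconds.
    static long timeRounds(int rounds, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += work(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop cannot be discarded as dead code.
        if (sink == 42) {
            System.out.println("unlikely sentinel");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // First batch: starts in the interpreter and pays for JIT compilation.
        long first = timeRounds(200, 10_000);
        // Second batch: same work, now likely running already-compiled code.
        long second = timeRounds(200, 10_000);
        System.out.printf("first batch: %d us, second batch: %d us%n",
                first / 1000, second / 1000);
    }
}
```

On a typical HotSpot JVM the second batch tends to finish faster than the 
first; running with -Xint (interpreter only) should largely remove the gap. 
Exact numbers depend on the machine and JVM version.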

Has anyone seen this issue before, or is there any ongoing work to address 
it? 

The data for this graph were collected across the entire system with multiple 
JVMs running. The workload is BigBench 
(https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).

Thanks,
Yingqi Lu

1. Software and workloads used in performance tests may have been optimized for 
performance only on Intel microprocessors. Performance tests, such as SYSmark 
and MobileMark, are measured using specific computer systems, components, 
software, operations and functions. Any change to any of those factors may 
cause the results to vary. You should consult other information and performance 
tests to assist you in fully evaluating your contemplated purchases, including 
the performance of that product when combined with other products.



