[ 
https://issues.apache.org/jira/browse/YARN-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingqi Lu updated YARN-4282:
----------------------------
    Attachment: flamegraph.png

> JVM reuse in Yarn
> -----------------
>
>                 Key: YARN-4282
>                 URL: https://issues.apache.org/jira/browse/YARN-4282
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Yingqi Lu
>              Labels: performance
>         Attachments: flamegraph.png
>
>
> Dear All,
> Recently, we identified an issue inside YARN with MapReduce: a significant 
> amount of time is spent in libjvm.so, and most of that time is compilation.
> Attached is a flame graph (visual call graph) of a query running for about 8 
> minutes. Most of the yellow bars represent ‘libjvm.so’ functions, while the 
> Java functions are colored red. The data show that more than 40% of overall 
> execution time is spent in compilation itself, yet a look inside the JVMs 
> shows that a lot of code still runs in interpreter mode. Ideally, we want 
> everything to run as compiled code over and over again. In reality, however, 
> mappers and reducers have died long before the compilation benefits kick in. 
> In other words, we take the performance hit from both compilation and 
> interpretation. The JVM reuse feature in MapReduce 1.0 addressed this issue, 
> but it was removed in YARN. We are currently tuning a set of JVM parameters 
> to minimize the performance impact, but we still think it would be good to 
> open a discussion here to seek a more permanent solution, since the issue is 
> tied to the nature of how YARN works.
> We are wondering whether any of you have seen this issue before, or whether 
> there is any ongoing project to address it.
> The data for this graph were collected across the entire system with multiple 
> JVMs running. The workload we use is the BigBench workload 
> (https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).
> Thanks,
> Yingqi Lu
> 1. Software and workloads used in performance tests may have been optimized 
> for performance only on Intel microprocessors. Performance tests, such as 
> SYSmark and MobileMark, are measured using specific computer systems, 
> components, software, operations and functions. Any change to any of those 
> factors may cause the results to vary. You should consult other information 
> and performance tests to assist you in fully evaluating your contemplated 
> purchases, including the performance of that product when combined with other 
> products.
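
For readers who want to check the compilation overhead described above from inside a single task JVM, the following is a minimal sketch using the standard java.lang.management API. It is not the method used to produce the attached flame graph (which was collected system-wide); the class name JitShare and its helper methods are hypothetical names chosen for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates how much of a JVM's lifetime went to JIT
// compilation, using only standard platform MXBeans.
final class JitShare {

    // Pure arithmetic: compile time divided by uptime (0.0 if uptime is not positive).
    static double share(long compileMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) compileMillis / uptimeMillis;
    }

    // Returns the JIT share for the current JVM, or -1.0 when the JVM has no
    // compiler or does not support compilation time monitoring.
    static double jitShare() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1.0;
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        // Total compilation time is an approximate, accumulated figure; with
        // several compiler threads it may exceed wall-clock uptime.
        return share(jit.getTotalCompilationTime(), uptime);
    }

    public static void main(String[] args) {
        System.out.printf("JIT compilation share of uptime: %.3f%n", jitShare());
    }
}
```

Calling JitShare.jitShare() near the end of a short-lived mapper or reducer would give a rough per-JVM figure to compare against the system-wide 40% reported in the description.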


