[ https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310159#comment-16310159 ]
Xuan Gong commented on YARN-7697:
---------------------------------
Uploaded a fix for this issue. It has two parts:
1) Do not use the truncate API. Instead, read the endIndex directly from the
checksum file.
2) Check the UUID before allocating the byte array. The UUID is fixed (both
length and value) and is always appended at the end, so if the loaded UUID is
the same as the original UUID, we can guarantee that everything before the
UUID is correct.
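
To illustrate part 2, here is a minimal sketch of the idea, not the actual
patch. It assumes a simplified layout of [meta][4-byte meta length][marker],
and the class name TrailerCheck, the MARKER value, and both method names are
hypothetical. The point is only the ordering: the fixed trailing marker is
verified before the length field is trusted and before any array is sized
from it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TrailerCheck {
  // Hypothetical fixed marker standing in for the UUID; fixed length and value.
  static final byte[] MARKER =
      "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

  /** Builds the assumed layout: [meta][4-byte meta length][marker]. */
  static byte[] write(byte[] meta) {
    ByteBuffer buf =
        ByteBuffer.allocate(meta.length + Integer.BYTES + MARKER.length);
    buf.put(meta).putInt(meta.length).put(MARKER);
    return buf.array();
  }

  /** Reads the meta block, validating the trailing marker first. */
  static byte[] readMeta(byte[] file) {
    int trailerLen = Integer.BYTES + MARKER.length;
    if (file.length < trailerLen) {
      throw new IllegalStateException("file shorter than trailer");
    }
    // Step 1: compare the last MARKER.length bytes with the expected marker.
    byte[] loaded =
        Arrays.copyOfRange(file, file.length - MARKER.length, file.length);
    if (!Arrays.equals(loaded, MARKER)) {
      // A partial or corrupted write: the length field cannot be trusted.
      throw new IllegalStateException("trailing marker mismatch");
    }
    // Step 2: only now read the length that sits just before the marker.
    int metaLen =
        ByteBuffer.wrap(file, file.length - trailerLen, Integer.BYTES).getInt();
    if (metaLen < 0 || metaLen > file.length - trailerLen) {
      throw new IllegalStateException("meta length out of range");
    }
    // Step 3: allocate and copy the meta block.
    byte[] meta = new byte[metaLen];
    System.arraycopy(file, file.length - trailerLen - metaLen, meta, 0, metaLen);
    return meta;
  }
}
```

Without step 1, a file whose tail was only partially written could yield an
arbitrary 4-byte value as the length, and sizing an array from it is one way
to end up with the OutOfMemoryError reported below.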
> NM goes down with OOM due to leak in log-aggregation
> ----------------------------------------------------
>
> Key: YARN-7697
> URL: https://issues.apache.org/jira/browse/YARN-7697
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Santhosh B Gowda
> Assignee: Xuan Gong
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[LogAggregationService #0,5,main] threw an Error. Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
> at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
> at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
> at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
> at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application ap