[ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310159#comment-16310159
 ] 

Xuan Gong commented on YARN-7697:
---------------------------------

Upload a fix for this issue which includes two parts:
1) not use truncate API. Instead, we directly read the endIndex from checksum 
file
2) Check UUID before we allocate byte array. Because UUID is fixed (both length 
and value) and always appended at the end. If the loaded UUID is the same as 
the original UUID, we can guarantee everything before the UUID is correct.

> NM goes down with OOM due to leak in log-aggregation
> ----------------------------------------------------
>
>                 Key: YARN-7697
>                 URL: https://issues.apache.org/jira/browse/YARN-7697
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Santhosh B Gowda
>            Assignee: Xuan Gong
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
> Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
>         at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
>         at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
>         at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
>         at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
>         at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application ap



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to