[jira] [Comment Edited] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

zhengchenyu (JIRA) Fri, 23 Jun 2017 00:02:06 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059056#comment-16059056
 ]


zhengchenyu edited comment on YARN-6728 at 6/23/17 7:00 AM:
------------------------------------------------------------

Our partner [~maobaolong] advised moving verifyAndCreateRemoteLogDir and the 
mkdir call to a daemon thread. This way, the container is not blocked on 
defaultFs before it runs. In our tests, containers started running 
significantly faster.
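
A minimal sketch of the idea in plain Java, not the actual NodeManager patch. 
The class AsyncLogDirInit, the method names, and the thread name are 
hypothetical, and Thread.sleep stands in for a slow defaultFs; no Hadoop API 
is called. The caller (playing the role of the event handler) only submits 
the work and returns immediately:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncLogDirInit {
    // One daemon thread dedicated to remote log directory setup (hypothetical name).
    private static final ExecutorService LOG_DIR_EXECUTOR =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "remote-log-dir-init");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });

    // Stand-in for verifyAndCreateRemoteLogDir plus mkdir; the sleep simulates a slow defaultFs.
    static void slowRemoteLogDirSetup(long fsDelayMillis) {
        try {
            Thread.sleep(fsDelayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called from the event handler: hands the slow work to the daemon thread and returns at once.
    public static CompletableFuture<Void> initAppAsync(long fsDelayMillis) {
        return CompletableFuture.runAsync(
            () -> slowRemoteLogDirSetup(fsDelayMillis), LOG_DIR_EXECUTOR);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        CompletableFuture<Void> setup = initAppAsync(500);
        long blockedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("event handler blocked for about " + blockedMillis + " ms");
        setup.join(); // only so the demo waits for the background work to finish
        System.out.println("remote log dir setup done: " + setup.isDone());
    }
}
```

A real change would also have to decide what happens if the directory setup 
fails or has not finished by the time logs are uploaded; the sketch ignores 
that.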


was (Author: zhengchenyu):
Our partner [~maobaolong] advised moving verifyAndCreateRemoteLogDir and the 
mkdir call to a daemon thread. This way, the container is not blocked on defaultFs.

> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6728
>                 URL: https://issues.apache.org/jira/browse/YARN-6728
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 2.7.1
>         Environment: CentOS 7.1 hadoop-2.7.1
>            Reporter: zhengchenyu
>             Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found that many map containers stay in the "NEW" state for 
> several minutes. Here is the container log: 
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_000011 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_000011 transitioned from NEW to LOCALIZING
> {code}
> I then searched the log between 18:21:23.068 and 18:23:08.715 and found that 
> some events dispatched by the AsyncDispatcher were handled slowly because 
> they access the defaultFs. Our cluster has grown to 4k nodes, which has 
> increased the load on the defaultFs. (Note: log aggregation is enabled.)
> When a container starts in the NodeManager, initApp() is invoked, which in 
> turn calls verifyAndCreateRemoteLogDir and creates the remote log directory 
> (mkdir). Both operations access the defaultFs, so the container is stuck 
> there and the application runs slowly.
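
The blocking described above can be reproduced with a small sketch in plain 
Java. It assumes that events are handled one at a time on a single thread, as 
the "[AsyncDispatcher event handler]" thread name in the log suggests. The 
class DispatcherBlockingDemo and its methods are hypothetical, a 
single-threaded executor stands in for the dispatcher, and Thread.sleep 
stands in for the slow defaultFs call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatcherBlockingDemo {
    // Simulates an event handler that blocks on a slow file system call.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues a slow event and then a fast event on one dispatcher thread.
    // Returns how long the fast event waited before it started, in milliseconds.
    public static long queueDelayMillis(long slowEventMillis) {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            long submitted = System.nanoTime();
            // Event 1: application init that touches the (slow) defaultFs.
            dispatcher.execute(() -> sleepQuietly(slowEventMillis));
            // Event 2: a container transition that needs no file system access at all.
            CompletableFuture<Long> startedAt =
                CompletableFuture.supplyAsync(System::nanoTime, dispatcher);
            return TimeUnit.NANOSECONDS.toMillis(startedAt.join() - submitted);
        } finally {
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast event waited about " + queueDelayMillis(500) + " ms");
    }
}
```

The second event does no slow work itself, yet it cannot start until the 
first one returns, so its delay tracks the file system latency.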



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
