[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876750#comment-16876750
 ] 

liyakun commented on YARN-9480:
-------------------------------

Thanks [~tangzhankun] and [~Weiwei Yang]. [~Yunyao Zhang] Please help to solve 
this issue ASAP.

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9480
>                 URL: https://issues.apache.org/jira/browse/YARN-9480
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: liyakun
>            Assignee: Yunyao Zhang
>            Priority: Major
>
> At present, when startContainers(), if NM does not contain the application, 
> it will enter the step of INIT_APPLICATION. In the application init step, 
> createAppDir() will be executed, and it is a blocking operation.
> createAppDir() is an operation that needs to interact with an external file 
> system. This operation is affected by the SLA of the external file system. 
> Once the external file system has a high latency, the NM dispatcher thread of 
> ContainerManagerImpl will be stuck. (In fact, I have seen a scene that NM 
> stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual time 
> of uploading log (in other threads). And according to the logRetentionPolicy, 
> many of the containers may not get to this step, which will save a lot of 
> interactions with external file system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to