[
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yunyao Zhang updated YARN-9480:
-------------------------------
Attachment: (was: YARN-9480.001.patch)
> createAppDir() in LogAggregationService shouldn't block dispatcher thread of
> ContainerManagerImpl
> -------------------------------------------------------------------------------------------------
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: liyakun
> Assignee: Yunyao Zhang
> Priority: Major
>
> At present, when startContainers() is called and the NM does not yet know the
> application, it enters the INIT_APPLICATION step. During application init,
> createAppDir() is executed, and it is a blocking operation.
> createAppDir() has to interact with an external file system, so its duration
> depends on the SLA of that file system.
> If the external file system has high latency, the NM dispatcher thread of
> ContainerManagerImpl gets stuck. (In fact, I have seen a case where the NM was
> stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the point where
> logs are actually uploaded (in other threads). In addition, depending on the
> logRetentionPolicy, many containers may never reach this step, which would
> save a lot of interactions with the external file system.
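
The proposal above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code or the attached patch: the class DeferredAppDirSketch and its methods initApplication(), uploadLogs() and demo() are hypothetical names, and the slow external file system is simulated with a latch. The only point it tries to show is that the dispatcher-side init does no remote I/O, while the directory is created lazily, at most once, on the upload thread and only if the retention policy decides that logs are kept.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-application log aggregator (not a Hadoop class). */
final class DeferredAppDirSketch {
    private final Runnable slowCreateAppDir; // simulated remote file-system call
    private boolean appDirCreated = false;
    private int uploads = 0;

    DeferredAppDirSketch(Runnable slowCreateAppDir) {
        this.slowCreateAppDir = slowCreateAppDir;
    }

    /** Runs on the dispatcher thread: intentionally performs no file-system call. */
    void initApplication() {
    }

    /** Runs on an upload thread: creates the app dir lazily, at most once. */
    synchronized void uploadLogs(boolean retentionPolicyKeepsLogs) {
        if (!retentionPolicyKeepsLogs) {
            return; // nothing to upload, so the remote directory is never needed
        }
        if (!appDirCreated) {
            slowCreateAppDir.run(); // may block, but only this upload thread
            appDirCreated = true;
        }
        uploads++; // the real upload would happen here
    }

    synchronized int uploadCount() {
        return uploads;
    }

    /** Returns {number of createAppDir calls, number of uploads}. */
    static int[] demo() {
        CountDownLatch fsResponds = new CountDownLatch(1);
        AtomicInteger createCalls = new AtomicInteger();
        DeferredAppDirSketch aggregator = new DeferredAppDirSketch(() -> {
            createCalls.incrementAndGet();
            try {
                fsResponds.await(); // simulate a high-latency external file system
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ExecutorService uploadThread = Executors.newSingleThreadExecutor();

        aggregator.initApplication(); // "dispatcher" returns immediately
        uploadThread.submit(() -> aggregator.uploadLogs(true));
        uploadThread.submit(() -> aggregator.uploadLogs(true));
        uploadThread.submit(() -> aggregator.uploadLogs(false));

        // The "dispatcher" reaches this line even though the file system has
        // not answered yet; only the upload thread is waiting on it.
        fsResponds.countDown();
        uploadThread.shutdown();
        try {
            uploadThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {createCalls.get(), aggregator.uploadCount()};
    }
}
```

In this sketch, calling DeferredAppDirSketch.demo() performs the simulated directory creation once for the two uploads that the policy allows, and an application whose logs are never uploaded would not touch the external file system at all.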