liyakun created YARN-9480:
-----------------------------

             Summary: createAppDir() in LogAggregationService shouldn't block 
dispatcher thread of ContainerManagerImpl
                 Key: YARN-9480
                 URL: https://issues.apache.org/jira/browse/YARN-9480
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
            Reporter: liyakun
            Assignee: liyakun


At present, when startContainers() is called and the NM does not yet contain 
the application, it enters the INIT_APPLICATION step. During application 
initialization, createAppDir() is executed, and it is a blocking operation.

createAppDir() has to interact with an external file system, so its latency 
depends on the SLA of that file system. When the external file system has high 
latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In fact, 
I have seen a case where the NM was stuck here for more than an hour.)

I think it would be more reasonable to move createAppDir() to the point where 
logs are actually uploaded (in other threads). In addition, depending on the 
logRetentionPolicy, many containers may never reach this step, which would save 
a lot of interactions with the external file system.
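
The proposal can be illustrated with a minimal sketch. This is not the actual 
LogAggregationService code: the class LazyAppDirSketch, the RemoteFs interface, 
and the methods initApp() and finishApp() are hypothetical names used only for 
illustration, and the shouldUpload flag stands in for whatever the 
logRetentionPolicy decides. The idea is that the dispatcher-side init does no 
remote I/O, and the directory is created on the uploader thread only when an 
upload actually happens.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; names do not match the real NodeManager classes. */
class LazyAppDirSketch {

  /** Stand-in for the remote file system call made by createAppDir(). */
  interface RemoteFs {
    void createAppDir(String appId) throws Exception;
  }

  private final RemoteFs fs;
  // Separate thread that performs all remote file system work.
  private final ExecutorService uploader = Executors.newSingleThreadExecutor();
  // Counts successful directory creations (for illustration only).
  final AtomicInteger dirsCreated = new AtomicInteger();

  LazyAppDirSketch(RemoteFs fs) {
    this.fs = fs;
  }

  /** Runs on the dispatcher thread: local bookkeeping only, no remote I/O. */
  void initApp(String appId) {
    // Intentionally does not call fs.createAppDir(appId).
  }

  /**
   * Runs when the application finishes. shouldUpload models the outcome of
   * the retention policy (an assumption of this sketch).
   */
  void finishApp(String appId, boolean shouldUpload) {
    if (!shouldUpload) {
      return; // No upload, so the remote directory is never created.
    }
    uploader.execute(() -> {
      try {
        fs.createAppDir(appId); // May be slow; only this thread waits.
        dirsCreated.incrementAndGet();
        // The log upload itself would follow here.
      } catch (Exception e) {
        // A real implementation would log the failure and skip the upload.
      }
    });
  }

  /** Waits for pending uploads to finish. */
  void shutdown() {
    uploader.shutdown();
    try {
      uploader.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With this shape, a slow external file system delays only the uploader thread, 
and applications whose logs are not uploaded never touch the external file 
system at all.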



