[ 
https://issues.apache.org/jira/browse/YARN-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351459#comment-17351459
 ] 

Xiping Zhang commented on YARN-10781:
-------------------------------------

[~zhuqi]  

Yes, we have enabled rolling log aggregation, but that doesn't seem to be the 
problem. Because of the dynamic resource mechanism, such a long-running job 
may occupy one aggregation thread on every node of the cluster. If there are 
100 such long-running jobs, all of the NM log aggregation threads (100 by 
default) on the cluster will be occupied. :(
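
To illustrate the exhaustion described above, here is a toy model in Python. 
It is not NodeManager code: the pool size of 4 stands in for the default of 
100, and the names simulate, app_finished and short_started are hypothetical. 
It assumes each long-running application holds one pool thread until the 
application ends, which is the behaviour reported in this issue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # stand-in for the NM default of 100 aggregation threads


def simulate(pool_size=POOL_SIZE, long_running_apps=POOL_SIZE):
    """Return (started_while_blocked, started_after_release) for a short app."""
    app_finished = threading.Event()   # set when the long-running apps end
    short_started = threading.Event()  # set once the short app gets a thread

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Each long-running app parks one worker until app_finished is set.
        for _ in range(long_running_apps):
            pool.submit(app_finished.wait)

        # A short (offline) app now asks for a thread to aggregate its logs.
        pool.submit(short_started.set)

        # While the long-running apps hold every worker, this times out.
        started_while_blocked = short_started.wait(timeout=0.5)

        # Killing the long-running apps frees the workers again.
        app_finished.set()
        started_after_release = short_started.wait(timeout=5)

    return started_while_blocked, started_after_release


if __name__ == "__main__":
    print(simulate(POOL_SIZE, POOL_SIZE))      # pool fully occupied
    print(simulate(POOL_SIZE, POOL_SIZE - 1))  # one worker still free
```

In this model the short application only gets a thread once the long-running 
ones finish. The real pool size is, as far as I know, set by 
yarn.nodemanager.logaggregation.threadpool-size-max; raising it would only 
postpone the exhaustion, not remove it.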

> The Thread of the NM aggregate log is exhausted and no other Application can 
> aggregate the log
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-10781
>                 URL: https://issues.apache.org/jira/browse/YARN-10781
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.9.2, 3.3.0
>            Reporter: Xiping Zhang
>            Priority: Major
>         Attachments: applications.png, containers.png, containers.png
>
>
> We observed more than 100 applications running on one NM. Most of these 
> applications are SparkStreaming applications, but they do not have running 
> containers. When an offline application running on that NM finishes, its 
> logs cannot be uploaded to HDFS. When we killed a large number of 
> SparkStreaming applications, we found that a large number of log files were 
> being created on the NN side, causing read and write performance on the NN 
> side to degrade significantly and business applications to time out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

