Zizon created YARN-9934:
---------------------------

             Summary: LogAggregationService should not submit aggregator when 
app dir creation fail
                 Key: YARN-9934
                 URL: https://issues.apache.org/jira/browse/YARN-9934
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation
            Reporter: Zizon
         Attachments: YARN-8246.patch

Before submitting a log aggregation runnable, LogAggregationService tries to 
create the aggregated log dir.

In some cases, this may fail (e.g. the number of dirs exceeds the max limit).

 

When the creation has failed and the runnable is still submitted to 
LogAggregationService, the runnable may run forever if some app state 
transition misbehaves (e.g. the application complete event is not handled 
correctly, thus keeping appFinishing of AppLogAggregatorImpl always true).

 

In our production environment (version 2.7.3), this causes a huge number of 
dangling aggregators (400+ LogAggregationService threads alive on some nodes, 
where the nodemanager is configured with only 50+ vCPUs).

 

The patch tries to throw the creation exception early, avoiding starting 
unnecessary log polling. 
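
The idea can be illustrated with a minimal, standalone sketch. This is not the 
attached patch and does not use the real Hadoop classes: the class 
FailFastAggregationSketch, the DirCreator interface and the initApp method are 
hypothetical names chosen only for illustration. It assumes the dir creation 
step reports failure with an IOException, and shows the aggregator being 
submitted to the thread pool only after that step succeeds.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the real LogAggregationService.
public class FailFastAggregationSketch {

    /** Hypothetical stand-in for the aggregated log dir creation step. */
    public interface DirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final ExecutorService threadPool = Executors.newCachedThreadPool();
    private final AtomicInteger submitted = new AtomicInteger();
    private final DirCreator dirCreator;

    public FailFastAggregationSketch(DirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /**
     * Creates the app dir first and submits the aggregator only on success.
     * Returns true if the aggregator was submitted, false otherwise.
     */
    public boolean initApp(String appId, Runnable aggregator) {
        try {
            dirCreator.createAppDir(appId);
        } catch (IOException e) {
            // Fail fast: no dir means nothing to aggregate into,
            // so no long-lived polling thread is started.
            System.err.println("Skipping aggregator for " + appId + ": " + e.getMessage());
            return false;
        }
        submitted.incrementAndGet();
        threadPool.execute(aggregator);
        return true;
    }

    public int submittedCount() {
        return submitted.get();
    }

    public void shutdown() {
        threadPool.shutdown();
    }

    public static void main(String[] args) {
        // A creator that always fails, e.g. because a dir count limit was hit.
        FailFastAggregationSketch service = new FailFastAggregationSketch(appId -> {
            throw new IOException("dir limit exceeded");
        });
        boolean started = service.initApp("app_1", () -> { });
        System.out.println("aggregator started: " + started);
        System.out.println("submitted count: " + service.submittedCount());
        service.shutdown();
    }
}
```

With this ordering, a failed dir creation leaves no thread behind, so a later 
misbehaving app state transition cannot keep an aggregator polling forever.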



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
