Jun Gong created YARN-11714:
-------------------------------

             Summary: Add cache for createAndGetApplicationReport to improve performance
                 Key: YARN-11714
                 URL: https://issues.apache.org/jira/browse/YARN-11714
             Project: Hadoop YARN
          Issue Type: Improvement
    Affects Versions: 3.3.6
            Reporter: Jun Gong


In our cluster, which consists of 2,000+ nodes, 2,000-8,000 running applications, 
and 10,000 completed applications, it takes approximately 1 to 10 seconds to 
obtain the application list using YarnClient.getApplications(). Additionally, 
the number of pending events in the ResourceManager (RM) event queue often 
exceeds 100,000.

Upon further investigation, I discovered that the createAndGetApplicationReport 
function consumes a significant amount of time, as it requires obtaining 
several critical locks, such as the RMApp lock, RMAppAttempt lock, and 
scheduler lock. This consequently reduces scheduler performance and slows down 
event handling.

To enhance performance, I propose implementing a cache for storing the app 
reports of applications with a finished status (SUCCEEDED/FAILED/KILLED). Since 
the status of these applications will not change, caching their reports should 
be a viable solution.
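
To illustrate the proposal, here is a minimal sketch of such a cache. It is 
not taken from the YARN code base: the class name FinishedReportCache, the 
getOrBuild and invalidate methods, the generic key and report types, and the 
isTerminal predicate are all hypothetical names chosen for this example. The 
expensive, lock-taking report construction is passed in as a function, and 
its result is stored only when the report describes a terminal state.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Hypothetical bounded cache for reports of applications that have reached
 * a terminal state. K stands in for an application id, R for a report.
 */
final class FinishedReportCache<K, R> {
  private final Map<K, R> cache;
  private final Predicate<R> isTerminal;

  FinishedReportCache(int maxEntries, Predicate<R> isTerminal) {
    this.isTerminal = isTerminal;
    // Access-ordered LinkedHashMap: the least recently used entry is evicted
    // once the map grows beyond maxEntries. Wrapped for thread safety.
    this.cache = Collections.synchronizedMap(
        new LinkedHashMap<K, R>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, R> eldest) {
            return size() > maxEntries;
          }
        });
  }

  /**
   * Returns the cached report if present. Otherwise calls slowBuilder (the
   * path that would take the RMApp, RMAppAttempt and scheduler locks) and
   * caches the result only if it describes a terminal state.
   */
  R getOrBuild(K appId, Function<K, R> slowBuilder) {
    R cached = cache.get(appId);
    if (cached != null) {
      return cached;
    }
    R report = slowBuilder.apply(appId);
    if (report != null && isTerminal.test(report)) {
      cache.put(appId, report);
    }
    return report;
  }

  /** Drops an entry, e.g. when the RM forgets a completed application. */
  void invalidate(K appId) {
    cache.remove(appId);
  }

  int size() {
    return cache.size();
  }
}
```

Two concurrent callers may both miss and build the same report once; that is 
harmless here because a terminal report does not change. In a real 
implementation the cache key would probably also need to capture anything 
else the report depends on (for example, whether the caller is allowed to 
see the full report), and the size bound would likely follow the RM's limit 
on retained completed applications. Both points are assumptions to verify 
against the actual code.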


