Jun Gong created YARN-11714:
-------------------------------

             Summary: Add cache for createAndGetApplicationReport to improve performance
                 Key: YARN-11714
                 URL: https://issues.apache.org/jira/browse/YARN-11714
             Project: Hadoop YARN
          Issue Type: Improvement
    Affects Versions: 3.3.6
            Reporter: Jun Gong

Our cluster has 2000+ nodes, 2000-8000 running applications, and 10,000 completed applications. Fetching the application list with YarnClient.getApplications() takes approximately 1 to 10 seconds, and the ResourceManager (RM) event queue size often exceeds 100,000.

On further investigation, I found that createAndGetApplicationReport consumes a significant amount of time because it must acquire several critical locks: the RMApp lock, the RMAppAttempt lock, and the scheduler lock. This contention reduces scheduler performance and slows down event handling.

To improve performance, I propose caching the application reports of applications in a finished state (SUCCEEDED/FAILED/KILLED). Because the state of these applications no longer changes, caching their reports should be a viable solution.
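
The caching idea in this proposal could look roughly like the sketch below. It is not the actual patch and does not use YARN classes: SketchReport, FinishedReportCache, and the slowBuilder function are hypothetical stand-ins, with slowBuilder playing the role of the lock-taking createAndGetApplicationReport call. The sketch assumes a report is fully determined by the application ID; if report content also depends on the caller's access rights, the cache key would need to include that.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for an application report (not the YARN ApplicationReport class).
final class SketchReport {
  final String appId;
  final String state;

  SketchReport(String appId, String state) {
    this.appId = appId;
    this.state = state;
  }
}

// Hypothetical cache that keeps reports only for applications in a terminal state.
final class FinishedReportCache {
  // Terminal states named in the proposal.
  private static final Set<String> TERMINAL = Set.of("SUCCEEDED", "FAILED", "KILLED");

  private final ConcurrentMap<String, SketchReport> cache = new ConcurrentHashMap<>();
  // Assumed to be the expensive path that acquires the app, attempt, and scheduler locks.
  private final Function<String, SketchReport> slowBuilder;

  FinishedReportCache(Function<String, SketchReport> slowBuilder) {
    this.slowBuilder = slowBuilder;
  }

  SketchReport get(String appId) {
    SketchReport cached = cache.get(appId);
    if (cached != null) {
      return cached; // Cache hit: the slow builder is not called.
    }
    SketchReport fresh = slowBuilder.apply(appId);
    // Only terminal reports are stored, since running applications keep changing.
    if (fresh != null && TERMINAL.contains(fresh.state)) {
      cache.putIfAbsent(appId, fresh);
    }
    return fresh;
  }

  // Should be called when a completed application is dropped from memory,
  // so the cache does not outgrow the set of retained applications.
  void invalidate(String appId) {
    cache.remove(appId);
  }

  int size() {
    return cache.size();
  }
}

final class ReportCacheDemo {
  public static void main(String[] args) {
    AtomicInteger builds = new AtomicInteger();
    FinishedReportCache cache = new FinishedReportCache(id -> {
      builds.incrementAndGet();
      return new SketchReport(id, id.startsWith("done") ? "SUCCEEDED" : "RUNNING");
    });
    cache.get("done_1");
    cache.get("done_1"); // served from the cache
    cache.get("live_1");
    cache.get("live_1"); // rebuilt, because RUNNING reports are never cached
    System.out.println("slow builds: " + builds.get() + ", cached: " + cache.size());
  }
}
```

In this sketch a repeated lookup for a finished application never reaches the slow builder, while running applications are rebuilt on every call. A real implementation would also need to bound or invalidate the cache as the RM evicts completed applications.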