[ 
https://issues.apache.org/jira/browse/YARN-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenruotao updated YARN-11848:
------------------------------
    Target Version/s: 3.4.1, 3.1.4  (was: 3.1.4, 3.4.1)
         Description: 
When a NodeManager restarts, the ResourceManager prints too many log lines like

"2025-03-23 01:53:05,883 WARN [RM Event dispatcher] rmnode.RMNodeImpl 
(RMNodeImpl.java:handleRunningAppOnNode(791)) - Cannot get RMApp by 
appId=application_1737511638592_8413719, just added it to finishedApplications 
list for cleanup"

When too many NodeManagers are restarted at one time, the ResourceManager will 
be blocked.

I think the NodeManager should be able to determine locally whether an app has 
finished, and so reduce the number of apps it reports to the ResourceManager.

https://github.com/apache/hadoop/pull/7861

  was:
When a NodeManager restarts, the ResourceManager prints too many log lines like

"2025-03-23 01:53:05,883 WARN [RM Event dispatcher] rmnode.RMNodeImpl 
(RMNodeImpl.java:handleRunningAppOnNode(791)) - Cannot get RMApp by 
appId=application_1737511638592_8413719, just added it to finishedApplications 
list for cleanup"

When too many NodeManagers are restarted at one time, the ResourceManager will 
be blocked.

I think the NodeManager should be able to determine locally whether an app has 
finished, and so reduce the number of apps it reports to the ResourceManager.

 


> Reduce the number of apps reported when nodemanager restart
> -----------------------------------------------------------
>
>                 Key: YARN-11848
>                 URL: https://issues.apache.org/jira/browse/YARN-11848
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.1.4, 3.4.1
>            Reporter: chenruotao
>            Priority: Major
>              Labels: pull-request-available
>
> When a NodeManager restarts, the ResourceManager prints too many log lines 
> like
> "2025-03-23 01:53:05,883 WARN [RM Event dispatcher] rmnode.RMNodeImpl 
> (RMNodeImpl.java:handleRunningAppOnNode(791)) - Cannot get RMApp by 
> appId=application_1737511638592_8413719, just added it to 
> finishedApplications list for cleanup"
> When too many NodeManagers are restarted at one time, the ResourceManager 
> will be blocked.
> I think the NodeManager should be able to determine locally whether an app 
> has finished, and so reduce the number of apps it reports to the 
> ResourceManager.
> https://github.com/apache/hadoop/pull/7861
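
The idea in the description (filter out apps the NodeManager already knows 
are finished before reporting them) could be sketched roughly as below. This 
is only an illustration and is not taken from the linked pull request: the 
class RunningAppFilter, the enum LocalAppState, the method appsToReport and 
the placeholder app ids are all hypothetical names, and plain strings stand in 
for YARN's real application id type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: decide which recovered apps a
// restarted node should report as running.
public class RunningAppFilter {

    // Assumed, simplified view of what the node knows locally about an app.
    enum LocalAppState { RUNNING, FINISHED }

    // Return only the apps that are not already finished locally, so the
    // receiver does not have to look up (and warn about) apps that are gone.
    static List<String> appsToReport(Map<String, LocalAppState> recoveredApps) {
        List<String> toReport = new ArrayList<>();
        for (Map.Entry<String, LocalAppState> entry : recoveredApps.entrySet()) {
            if (entry.getValue() != LocalAppState.FINISHED) {
                toReport.add(entry.getKey());
            }
        }
        return toReport;
    }

    public static void main(String[] args) {
        // Placeholder ids for illustration only.
        Map<String, LocalAppState> recovered = new LinkedHashMap<>();
        recovered.put("app_A", LocalAppState.RUNNING);
        recovered.put("app_B", LocalAppState.FINISHED);
        recovered.put("app_C", LocalAppState.FINISHED);

        // Only app_A is still running locally, so only it is reported.
        System.out.println(appsToReport(recovered));
    }
}
```

With a filter like this on the reporting side, the number of "Cannot get 
RMApp" warnings would scale with the apps still running on the node rather 
than with every app recovered from local state.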


