[jira] [Commented] (YARN-11402) Meaningless logs are frequently printed during ResourceManager startup and recover container.

ASF GitHub Bot (Jira) Tue, 20 Dec 2022 18:26:04 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650056#comment-17650056
 ]


ASF GitHub Bot commented on YARN-11402:
---------------------------------------

Daniel-009497 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053913286


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   > Will this change result in more log output? Does this mean that every app 
that reaches this check will be logged?
   
   @slfan1989  Not really. Only orphan containers whose state is not COMPLETE 
will be killed and logged here. What really matters is the minority of 
containers that are killed, not the majority that are skipped.
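
To make the behavior described above concrete, here is a minimal, self-contained sketch. It is not the actual AbstractYarnScheduler code: the class `OrphanContainerLogSketch`, its `Report` and `State` types, and the `recover` method are hypothetical stand-ins, and warnings are collected in a list instead of going through `LOG.warn`. It only illustrates that a warning is produced for an unfinished container of an unknown application, while finished orphans are skipped without any log line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration only; these are not the real YARN classes.
final class OrphanContainerLogSketch {

  enum State { RUNNING, COMPLETE }

  // Simplified version of what a NodeManager reports for one container.
  static final class Report {
    final String containerId;
    final String appId;
    final State state;

    Report(String containerId, String appId, State state) {
      this.containerId = containerId;
      this.appId = appId;
      this.state = state;
    }
  }

  // Returns the warning lines that recovery would emit for the given reports.
  static List<String> recover(List<Report> reports, Set<String> knownApps) {
    List<String> warnings = new ArrayList<>();
    for (Report r : reports) {
      if (knownApps.contains(r.appId)) {
        continue; // normal recovery path, not modeled in this sketch
      }
      // Orphan container: its application is unknown to the ResourceManager.
      if (r.state != State.COMPLETE) {
        // Only unfinished orphans are killed, so only they are logged.
        warnings.add("Killing container " + r.containerId
            + " for unknown application");
      }
      // Finished orphans are skipped silently.
    }
    return warnings;
  }

  public static void main(String[] args) {
    List<Report> reports = Arrays.asList(
        new Report("container_1", "app_known", State.RUNNING),
        new Report("container_2", "app_gone", State.RUNNING),
        new Report("container_3", "app_gone", State.COMPLETE));
    Set<String> knownApps = new HashSet<>(Arrays.asList("app_known"));
    for (String line : recover(reports, knownApps)) {
      System.out.println(line);
    }
  }
}
```

In this sketch, three reported containers yield a single warning, for `container_2`. On a large cluster where most orphan containers have already finished, the log volume therefore tracks the number of containers actually killed rather than the number skipped.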





> Meaningless logs are frequently printed during ResourceManager startup and 
> recover container.
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-11402
>                 URL: https://issues.apache.org/jira/browse/YARN-11402
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Daniel Ma
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> Tens of thousands of meaningless log lines are printed during 
> ResourceManager startup and container recovery.
> As we know, the ResourceManager keeps information for 10k applications by 
> default. In our very large cluster, it is common for the ResourceManager to 
> try to recover containers that have already finished and no longer exist in 
> the ResourceManager but are still reported by NodeManagers.
> In this case, the log below is printed frequently. More importantly, this 
> log is meaningless: in real production setups, maintainers care more about 
> which containers are properly recovered or killed than about the ones that 
> are skipped.
> The related code is as follows:
>  !screenshot-1.png! 
> So we move the log into the function killOrphanContainerOnNode().
>  !screenshot-2.png! 
> Only the containers to be killed need to be logged, which is vital when 
> troubleshooting to distinguish whether containers were killed by Hadoop's 
> internal mechanism or by the users themselves.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
