[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589432#comment-16589432
 ] 

Jason Lowe commented on YARN-8649:
----------------------------------

Thanks for updating the patch!  Logic looks good overall, but I have some 
concerns on the logging that was added.

I think it's misleading to assume the NM is shutting down when this situation 
occurs.  As I understand it, the main trigger for this scenario is a container 
getting killed while it is still localizing.  That can happen when the NM shuts 
down, but it can also happen without the NM shutting down.  Therefore it seems 
inappropriate to assume this scenario means the NM is shutting down.  There are 
already separate logs when the NM decides to shut down, so it is probably best 
to limit this log to the fact that the resource was removed before we got 
around to localizing it and therefore will no longer be localized.

The warning log should show the source resource, similar to what is done in the 
public localization debug code that was added, rather than the local path.  The 
local path won't mean as much as the resource that was requested, as that 
source resource path was logged when it was initially requested by the 
container.

There is debug logging in the public localizer case but not in the private 
case, which is inconsistent.  Arguably, if it's useful for the public case it 
would be useful for the private case too.  Given there's already a loud warning 
log in the common getPathForLocalization code, I'm not sure the debug log in 
the public path adds any value, especially if we change that warning to show 
the source path.
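
To make the suggestions above concrete, here is a rough, self-contained sketch 
of the kind of warning I have in mind.  It is not the patch and does not use 
the real NM classes: the class name, the String-keyed map, and the simplified 
getPathForLocalization signature are all assumptions for illustration, and it 
uses java.util.logging only so that it compiles on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical stand-in for the resource tracker; NOT the actual Hadoop code.
final class LocalizationLogSketch {
  private static final Logger LOG =
      Logger.getLogger(LocalizationLogSketch.class.getName());

  // Requested source resource (for example an HDFS URI) -> assigned local path.
  private final Map<String, String> tracked = new ConcurrentHashMap<>();

  void request(String sourceResource, String localPath) {
    tracked.put(sourceResource, localPath);
  }

  // Models a container being killed while its resource is still localizing.
  void remove(String sourceResource) {
    tracked.remove(sourceResource);
  }

  // States only what is known: the resource is gone and will not be localized.
  // It names the source resource and makes no claim about an NM shutdown.
  static String removedMessage(String sourceResource) {
    return "Resource " + sourceResource
        + " was removed before it could be localized;"
        + " it will no longer be localized";
  }

  // Simplified signature (assumption). Returns null when the resource was
  // removed, after emitting a single warning from this common code path.
  String getPathForLocalization(String sourceResource) {
    String localPath = tracked.get(sourceResource);
    if (localPath == null) {
      LOG.warning(removedMessage(sourceResource));
      return null;
    }
    return localPath;
  }
}
```

Because the warning is emitted once in the common path and keyed on the source 
resource, both the public and private localizers get the same message without 
any extra debug logging.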


> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
>                 Key: YARN-8649
>                 URL: https://issues.apache.org/jira/browse/YARN-8649
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Major
>         Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
