[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589432#comment-16589432 ]

Jason Lowe commented on YARN-8649:
----------------------------------
Thanks for updating the patch! The logic looks good overall, but I have some
concerns about the logging that was added.

I think it's misleading to assume the NM is shutting down when this situation
occurs. As I understand it, the main trigger for this scenario is a container
getting killed while it is still localizing. That can happen when the NM shuts
down, but it can also happen without the NM shutting down, so the log should
not claim that a shutdown is in progress. There are already separate logs when
the NM decides to shut down, so it is probably best to limit this logging to
the fact that the resource was removed before we got around to localizing it
and therefore will no longer be localized.

The warning log should show the source resource rather than the local path,
similar to what is done in the public localization debug code that was added.
The local path won't mean as much as the resource that was requested, because
the source resource path was already logged when the container initially
requested it.

There is debug logging in the public localizer case but not in the private
case, which is inconsistent. Arguably, if it's useful for the public case it
would be useful for the private case too. Given there's already a loud warning
log in the common getPathForLocalization code, I'm not sure the debug log in
the public path adds any value, especially if we change the loud warning log to
show the source path.
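
To make the suggestion concrete, here is a minimal sketch of the kind of
warning described above. It is not the patch and not the actual NodeManager
code: the class LocalizationLogSketch, its methods, the map-based tracker, and
the example URI are all hypothetical stand-ins, and java.util.logging is used
only to keep the example self-contained. The point is just that the message
names the source resource and makes no claim about the NM shutting down.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical stand-in for a resource tracker; not the real NodeManager classes.
class LocalizationLogSketch {
    private static final Logger LOG = Logger.getLogger("LocalizationLogSketch");

    // Maps the requested source resource (e.g. an HDFS URI) to its planned local path.
    private final Map<String, String> localPathBySource = new ConcurrentHashMap<>();

    // Record that a container requested this resource.
    void request(String sourceResource, String localPath) {
        localPathBySource.put(sourceResource, localPath);
    }

    // Drop the resource, e.g. because its container was killed while localizing.
    void remove(String sourceResource) {
        localPathBySource.remove(sourceResource);
    }

    // The warning names the source resource and does not guess at the cause.
    static String removedMessage(String sourceResource) {
        return "Resource " + sourceResource
            + " was removed before it could be localized; it will not be localized";
    }

    // Returns the local path, or empty (after one warning) if the resource is gone.
    Optional<String> pathForLocalization(String sourceResource) {
        String localPath = localPathBySource.get(sourceResource);
        if (localPath == null) {
            LOG.warning(removedMessage(sourceResource));
            return Optional.empty();
        }
        return Optional.of(localPath);
    }

    public static void main(String[] args) {
        LocalizationLogSketch tracker = new LocalizationLogSketch();
        String source = "hdfs://example/app/job.jar"; // illustrative URI only
        tracker.request(source, "/tmp/filecache/10");
        tracker.remove(source);
        // Logs the warning with the source resource and returns Optional.empty().
        System.out.println(tracker.pathForLocalization(source).isPresent());
    }
}
```

Because the warning lives in the one common lookup, both the public and the
private localizer paths would get the same message without a separate debug
log in either of them.
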
> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch,
> YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by Jason Lowe.