[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941310#comment-13941310 ]
Vinod Kumar Vavilapalli commented on YARN-1849:
-----------------------------------------------
Haven't looked at the patch, but in general there is a constant tussle between
keeping things up and failing fast so that bugs surface and can be fixed.
I would in general avoid null checks unless I am sure they are warranted.
Failing the RM/NM at least uncovers the bug, instead of limping along with it
until something else breaks, at which point it becomes hard to root-cause. If
possible, let's fix what is actually broken here instead of adding a lot of
null checks (if that is what the above comments are proposing). Sure, we may
run into one more issue that we haven't foreseen, but we can at least take
comfort in knowing that we are addressing the right corner cases.
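To make the trade-off concrete, here is a minimal, hypothetical Java sketch. It is not code from the YARN-1849 patch; the class FailFastExample, its methods, and the token map are invented for illustration. The lenient lookup hides a missing entry behind a null check and returns a placeholder, so the failure shows up later and elsewhere. The strict lookup uses java.util.Objects.requireNonNull to throw at the point where the invariant is actually broken.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only: names are invented, not taken from YARN.
public class FailFastExample {

    // "Limping" style: the null check swallows the missing entry and returns
    // a bogus placeholder, so the real bug surfaces later, somewhere else.
    static String lenientLookup(Map<String, String> tokens, String appId) {
        String token = tokens.get(appId);
        if (token == null) {
            return ""; // silently continue with an empty token
        }
        return token;
    }

    // Fail-fast style: a missing entry throws immediately, with a message
    // that names the broken invariant, which makes root-causing easier.
    static String strictLookup(Map<String, String> tokens, String appId) {
        return Objects.requireNonNull(tokens.get(appId),
                "no token registered for " + appId);
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("app_1", "tok-1");

        // The lenient path hides the problem: prints "true".
        System.out.println(lenientLookup(tokens, "app_2").isEmpty());

        // The strict path reports it where it happens.
        try {
            strictLookup(tokens, "app_2");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // no token registered for app_2
        }
    }
}
```

The strict version is what the comment argues for: the crash is louder, but it points directly at the missing registration instead of at whatever code later trips over the empty token.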
> NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
> ----------------------------------------------------------------------------
>
> Key: YARN-1849
> URL: https://issues.apache.org/jira/browse/YARN-1849
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch,
> yarn-1849-3.patch
>
>
> While running an UnmanagedAM on a secure cluster, ran into an NPE on
> failover/restart. This is similar to YARN-1821.