[
https://issues.apache.org/jira/browse/YARN-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341565#comment-16341565
]
Rohith Sharma K S commented on YARN-7765:
-----------------------------------------
I see a couple of problems in the NodeManager.
# The HBase connection is created in serviceInit of TimelineWriter, but at
that point the NM has not yet done kinit.
# NMTimelinePublisher has the same problem: in serviceInit, nmLoginUgi is
copied to a local variable and used while creating the TimelineClient. So the
TimelineClient is created with the current user rather than the logged-in
user. As a result, the NM throws the above exception while publishing as well.
# The same logged-in user problem also affects the NM recovery flow. All the
recovered applications would also fail, since applications are recovered in
the serviceInit phase.
To fix all 3 issues, we need to do the secure login before initializing
services in the NodeManager. Otherwise, we need to fix the above 3 issues one
by one in different places.
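To illustrate why the ordering matters, here is a minimal toy sketch in plain Java. It is not NodeManager code: LoginContext and Publisher are hypothetical stand-ins (for the process login state and for a service such as NMTimelinePublisher that captures its user in serviceInit), and the principal string is made up. It only shows that a service which captures the user before login keeps the unauthenticated identity, while logging in first avoids that.

```java
/** Toy model of the init-ordering problem; none of these are Hadoop classes. */
public class LoginOrderSketch {

    /** Hypothetical stand-in for the process-wide login state. */
    static final class LoginContext {
        private String user = "unauthenticated";

        void login(String principal) {
            user = principal;
        }

        String currentUser() {
            return user;
        }
    }

    /** Hypothetical stand-in for a service that captures its user during init. */
    static final class Publisher {
        private String capturedUser;

        void serviceInit(LoginContext ctx) {
            // The user is copied once here and never refreshed afterwards.
            capturedUser = ctx.currentUser();
        }

        String publish(String event) {
            if ("unauthenticated".equals(capturedUser)) {
                return "FAILED " + event + " (no valid credentials)";
            }
            return "OK " + event + " as " + capturedUser;
        }
    }

    /** Problematic order: services are initialized before the login. */
    static String initThenLogin(String principal) {
        LoginContext ctx = new LoginContext();
        Publisher publisher = new Publisher();
        publisher.serviceInit(ctx);
        ctx.login(principal); // too late, the publisher already captured its user
        return publisher.publish("container-start");
    }

    /** Proposed order: login first, then initialize services. */
    static String loginThenInit(String principal) {
        LoginContext ctx = new LoginContext();
        ctx.login(principal);
        Publisher publisher = new Publisher();
        publisher.serviceInit(ctx);
        return publisher.publish("container-start");
    }

    public static void main(String[] args) {
        String principal = "nm/host@EXAMPLE.COM"; // made-up principal
        System.out.println(initThenLogin(principal));
        System.out.println(loginThenInit(principal));
    }
}
```

In this sketch, moving the login ahead of serviceInit fixes every service that captures the user at init time in one place, which is the motivation for doing the secure login before initializing services rather than patching each service separately.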
> [Atsv2] GSSException: No valid credentials provided - Failed to find any
> Kerberos tgt thrown by HBaseClient in NM and HDFSClient in HBase daemons
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-7765
> URL: https://issues.apache.org/jira/browse/YARN-7765
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
>
> A secure cluster is deployed and all YARN services are started successfully.
> When an application is submitted, the app collectors, which are started as an
> aux-service, throw the exception below. But this exception is *NOT* observed
> from the RM TimelineCollector.
> The cluster is deployed with Hadoop-3.0 and HBase-1.2.6 in secure mode. All
> the YARN and HBase services are started and working perfectly fine. After 24
> hours, i.e. when the token lifetime has expired, the HBaseClient in the NM and
> the HDFSClient in HMaster and HRegionServer started getting this error. After
> some time, the HBase daemons shut down. In the NM, the JVM didn't shut down,
> but none of the events got published.
> {noformat}
> 2018-01-17 11:04:48,017 FATAL ipc.RpcClientImpl (RpcClientImpl.java:run(684))
> - SASL authentication failed. The most likely cause is missing or invalid
> credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]
> {noformat}
> cc :/ [~vrushalic] [~varun_saxena]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)