[ 
https://issues.apache.org/jira/browse/YARN-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341565#comment-16341565
 ] 

Rohith Sharma K S commented on YARN-7765:
-----------------------------------------

I see a couple of problems in the NodeManager.
# The HBase connection is created in serviceInit of TimelineWriter, but at 
that point the NM has not yet done kinit. 
# NMTimelinePublisher, also in serviceInit, copies nmLoginUgi to a local 
variable and uses it while creating the TimelineClient. So the TimelineClient 
is created with the current user rather than the logged-in user. As a result, 
the NM throws the above exception while publishing as well. 
# The same logged-in user problem also affects the NM recovery flow. All 
recovered applications would fail as well, since applications are recovered in 
the serviceInit phase. 

To fix all 3 issues, we need to do the secure login before initializing 
services in the NodeManager. Otherwise, we need to fix the above 3 issues one 
by one in different places.
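
To illustrate why the ordering matters, here is a minimal, self-contained sketch. It does not use Hadoop classes: LoginContext, PublisherService, startNode and the principal nm/host@EXAMPLE.COM are all hypothetical stand-ins, assumed only for illustration. It models a child service that snapshots the current user during init, the way the comment describes NMTimelinePublisher copying nmLoginUgi in serviceInit.

```java
public class LoginOrderSketch {

    /** Hypothetical stand-in for process-wide login state (not Hadoop's UserGroupInformation). */
    static final class LoginContext {
        private String user = "unauthenticated";

        /** Simulates a successful keytab login (kinit). */
        void kinit(String principal) {
            user = principal;
        }

        String currentUser() {
            return user;
        }
    }

    /** Hypothetical child service that captures the user once, at init time. */
    static final class PublisherService {
        private String capturedUser;

        void init(LoginContext ctx) {
            // The snapshot is taken here; a later login does not update it.
            capturedUser = ctx.currentUser();
        }

        String publishAs() {
            return capturedUser;
        }
    }

    /** Runs the simulated startup and returns the user the publisher ends up with. */
    static String startNode(boolean loginBeforeInit) {
        LoginContext ctx = new LoginContext();
        PublisherService publisher = new PublisherService();

        if (loginBeforeInit) {
            ctx.kinit("nm/host@EXAMPLE.COM"); // proposed order: login first
        }
        publisher.init(ctx);
        if (!loginBeforeInit) {
            ctx.kinit("nm/host@EXAMPLE.COM"); // problematic order: login after init
        }
        return publisher.publishAs();
    }

    public static void main(String[] args) {
        System.out.println("login after init  -> " + startNode(false));
        System.out.println("login before init -> " + startNode(true));
    }
}
```

With the login done after init, the simulated publisher keeps the stale "unauthenticated" identity. With the login moved ahead of init, every service initialized afterwards sees the logged-in principal, which is why a single change in the NodeManager could cover all three cases instead of three separate fixes.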

> [Atsv2] GSSException: No valid credentials provided - Failed to find any 
> Kerberos tgt thrown by HBaseClient in NM and HDFSClient in HBase daemons
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7765
>                 URL: https://issues.apache.org/jira/browse/YARN-7765
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>
> A secure cluster is deployed and all YARN services are started successfully. 
> When an application is submitted, the app collectors, which are started as an 
> aux-service, throw the below exception. But this exception is *NOT* observed 
> from the RM TimelineCollector. 
> The cluster is deployed with Hadoop-3.0 and a secure HBase-1.2.6 cluster. All 
> the YARN and HBase services are started and working perfectly fine. After 24 
> hours, i.e. when the token lifetime has expired, the HBaseClient in the NM and 
> the HDFSClient in HMaster and HRegionServer started getting this error. After 
> some time, the HBase daemons shut down. In the NM, the JVM didn't shut down 
> but none of the events got published.
> {noformat}
> 2018-01-17 11:04:48,017 FATAL ipc.RpcClientImpl (RpcClientImpl.java:run(684)) 
> - SASL authentication failed. The most likely cause is missing or invalid 
> credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> {noformat}
> cc :/ [~vrushalic] [~varun_saxena] 


