[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187137#comment-15187137 ]

Jason Lowe commented on YARN-4783:
----------------------------------

From the exception it appears the HDFS token is being cancelled before the
nodemanager gets around to aggregating the logs.  Without a valid HDFS token
the NM cannot perform log aggregation.  I recall the RM-NM protocol has some
token keepalive semantics so that nodemanagers can ask for tokens to be kept
alive after the application completes, in order to perform cleanup tasks like
log aggregation.  However, if the nodemanager is down for too long, it misses
that window and the RM cancels the HDFS token.
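To make the timing argument concrete, here is a toy model of the keepalive window, written as a sketch under stated assumptions. `TokenState`, `keepalive_window`, `record_keepalive` and `can_aggregate_logs` are hypothetical names, not RM classes or configuration properties, and the rule that each accepted keepalive restarts a fixed window is an assumption about the RM's behavior rather than a description of its code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenState:
    """Toy model of one application's HDFS delegation token as seen by the RM."""

    app_finished_at: float  # time (s) at which the application completed
    keepalive_window: float  # hypothetical grace period (s) before cancellation
    last_keepalive_at: Optional[float] = None  # last accepted NM keepalive request

    def cancel_deadline(self) -> float:
        # The token survives for one window after the later of app completion
        # and the most recent accepted keepalive.
        start = self.app_finished_at
        if self.last_keepalive_at is not None:
            start = max(start, self.last_keepalive_at)
        return start + self.keepalive_window

    def is_cancelled(self, now: float) -> bool:
        return now >= self.cancel_deadline()

    def record_keepalive(self, now: float) -> None:
        # A keepalive only helps if it arrives before the token is cancelled;
        # a nodemanager that was down past the deadline cannot revive it.
        if not self.is_cancelled(now):
            self.last_keepalive_at = now


def can_aggregate_logs(state: TokenState, nm_reconnect_at: float) -> bool:
    """Log aggregation needs a token that is still valid when the NM comes back."""
    return not state.is_cancelled(nm_reconnect_at)


if __name__ == "__main__":
    state = TokenState(app_finished_at=100.0, keepalive_window=600.0)
    print(can_aggregate_logs(state, nm_reconnect_at=300.0))  # True
    print(can_aggregate_logs(state, nm_reconnect_at=900.0))  # False
```

With the application finishing at t=100 s and a 600 s window, an NM that reconnects at t=300 s can still aggregate, while one that stays down until t=900 s finds the token already cancelled, which is the failure described in this report.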

The RM logs should shed some light on exactly what happened.  In particular,
it would be useful to know the relative timing of the following events:
# When the application completed
# When the HDFS token was cancelled by the RM (check for 'Cancelling 
HDFS_DELEGATION_TOKEN token 9 for yarn')
# When the nodemanager reconnected to the RM (and presumably started log 
aggregation shortly afterwards)
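A small script can pull that timeline out of an RM log. This is a hedged sketch: it assumes log4j-style `yyyy-MM-dd HH:mm:ss,SSS` timestamps at the start of each line, and only the `Cancelling HDFS_DELEGATION_TOKEN token 9 for yarn` text comes from this comment; the application-completion and reconnect patterns are hypothetical placeholders that should be checked against the actual RM messages.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumed log4j-style timestamp prefix, e.g. "2016-03-09 10:15:30,123 INFO ...".
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")

# Only "token_cancelled" is quoted from the comment; the other two patterns
# are hypothetical and must be adapted to the real RM log messages.
PATTERNS: Dict[str, str] = {
    "app_completed": r"application_\d+_\d+ State change from RUNNING to FINISHED",
    "token_cancelled": r"Cancelling HDFS_DELEGATION_TOKEN token 9 for yarn",
    "nm_reconnected": r"Reconnect from the node",
}


def first_occurrences(
    lines: Iterable[str], patterns: Dict[str, str]
) -> List[Tuple[datetime, str]]:
    """Return (timestamp, event) for the first line matching each pattern, sorted by time."""
    compiled = {name: re.compile(p) for name, p in patterns.items()}
    found: Dict[str, datetime] = {}
    for line in lines:
        ts_match = TIMESTAMP_RE.match(line)
        if ts_match is None:
            continue  # continuation lines (e.g. stack traces) carry no timestamp
        for name, regex in compiled.items():
            if name not in found and regex.search(line):
                ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S")
                found[name] = ts.replace(microsecond=int(ts_match.group(2)) * 1000)
    return sorted((ts, name) for name, ts in found.items())


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:  # path(s) to RM log file(s)
        with open(path) as f:
            for ts, name in first_occurrences(f, PATTERNS):
                print(ts.isoformat(sep=" "), name)
```

Run it as `python rm_timeline.py <rm-log-file>` (the script name is hypothetical). The events print in chronological order, so a token cancellation listed before the NM reconnect would support the explanation above.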


> Log aggregation failure for application when Nodemanager is restarted 
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>
> Scenario:
> =========
> 1. Start the NM as user dsperf:hadoop
> 2. Configure the linux-execute user as dsperf
> 3. Submit an application as user yarn
> 4. Wait until a few containers are allocated on NM 1
> 5. Stop NodeManager 1 (wait for it to expire)
> 6. Start the NodeManager again after the application has completed
> 7. Check whether log aggregation happens for the container logs in the NM
> local directory
> Expected Output:
> ================
> Log aggregation should succeed
> Actual Output:
> ==============
> Log aggregation does not succeed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
