[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189434#comment-15189434 ]
Jason Lowe commented on YARN-4783:
----------------------------------
Thanks for posting the details from the logs! The problem is as I suspected --
the RM cancelled the delegation token before log aggregation had started on
the nodemanager. In this case the cancellation happened well before the
nodemanager had any chance to aggregate, because the nodemanager wasn't
recovered until 13.5 hours after the application completed.
I'm not sure what YARN can do to fix this scenario. It's a security risk to
leave the delegation token around too long after the application completed, and
in the general case we can't leave it around forever because it will eventually
expire on its own. Therefore we can't support arbitrary delays between the
application completing and the log aggregation starting.
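To make the timing argument concrete, here is a minimal Python sketch of the race. It assumes a token becomes unusable either once the RM cancels it or once it reaches a maximum lifetime. `DelegationToken`, `aggregation_can_start`, and the hour values (including the 168-hour lifetime and the 0.5-hour cancellation delay) are hypothetical illustrations, not YARN or Hadoop APIs. Only the 13.5-hour nodemanager delay comes from the logs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelegationToken:
    """Hypothetical model of a delegation token's usable window (times in hours)."""
    issued_at_h: float
    max_lifetime_h: float                  # unusable after issued_at_h + max_lifetime_h
    cancelled_at_h: Optional[float] = None  # set once the RM cancels the token

    def is_valid(self, t_h: float) -> bool:
        # A cancelled token is rejected from the cancellation time onward.
        if self.cancelled_at_h is not None and t_h >= self.cancelled_at_h:
            return False
        # Even an uncancelled token eventually expires on its own.
        return t_h < self.issued_at_h + self.max_lifetime_h


def aggregation_can_start(token: DelegationToken, app_finished_h: float,
                          nm_delay_h: float) -> bool:
    """Aggregation needs a valid token at the moment the NM finally starts it."""
    return token.is_valid(app_finished_h + nm_delay_h)


# The scenario from the logs: the token is cancelled shortly after the app
# finishes, and the NM only comes back 13.5 hours later.
cancelled = DelegationToken(issued_at_h=0.0, max_lifetime_h=168.0, cancelled_at_h=2.5)
print(aggregation_can_start(cancelled, app_finished_h=2.0, nm_delay_h=13.5))

# Never cancelling only moves the limit: a long enough delay still fails.
uncancelled = DelegationToken(issued_at_h=0.0, max_lifetime_h=168.0)
print(aggregation_can_start(uncancelled, app_finished_h=2.0, nm_delay_h=13.5))
print(aggregation_can_start(uncancelled, app_finished_h=2.0, nm_delay_h=200.0))
```

Whatever cancellation policy the RM uses, the token's own lifetime still bounds how long the nodemanager can be absent, so some finite delay will always break aggregation.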
> Log aggregation failure for application when Nodemanager is restarted
> ----------------------------------------------------------------------
>
> Key: YARN-4783
> URL: https://issues.apache.org/jira/browse/YARN-4783
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: Surendra Singh Lilhore
>
> Scenario :
> =========
> 1. Start NM with user dsperf:hadoop
> 2. Configure the linux-execute user as dsperf
> 3. Submit an application with the yarn user
> 4. Wait until a few containers are allocated to NM 1
> 5. Stop NodeManager 1 and wait for it to expire
> 6. Start NodeManager 1 again after the application has completed
> 7. Check whether log aggregation happens for the container logs in the NM
> local directory
> Expected Output :
> =================
> Log aggregation should be successful
> Actual Output :
> ===============
> Log aggregation is not successful