[jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted

Jason Lowe (JIRA) Thu, 10 Mar 2016 07:32:25 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189434#comment-15189434
 ]


Jason Lowe commented on YARN-4783:
----------------------------------

Thanks for posting the details from the logs!  The problem is as I suspected -- 
the RM cancelled the delegation token before log aggregation had started from 
the nodemanager.  In this case it was well before the nodemanager had a chance 
to aggregate, as the nodemanager wasn't recovered until 13.5 hours after the 
application completed.

I'm not sure what YARN can do to fix this scenario.  It's a security risk to 
leave the delegation token around too long after the application completed, and 
in the general case we can't leave it around forever because it will eventually 
expire on its own.  Therefore we can't support arbitrary delays between the 
application completing and the log aggregation starting.
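
To make the timing constraint concrete, here is a rough sketch in Python. It 
is a toy model, not YARN code: the function name, its parameters, the 
timestamps, the 10-minute cancellation delay and the 7-day maximum lifetime 
are all assumptions for illustration. Only the 13.5 hour gap comes from the 
logs discussed above.

```python
from datetime import datetime, timedelta


def token_usable_at(aggregation_start, app_finished, token_issued,
                    cancel_delay, max_lifetime):
    """Toy model: is the delegation token still valid when aggregation starts?

    The token stops working at whichever comes first:
      - the RM cancelling it some time after the application completes
      - the token reaching its maximum lifetime and expiring on its own
    All parameter names are hypothetical, not YARN configuration keys.
    """
    cancelled_at = app_finished + cancel_delay
    expires_at = token_issued + max_lifetime
    return aggregation_start < min(cancelled_at, expires_at)


# Illustrative values only (assumptions, not taken from the logs).
issued = datetime(2016, 3, 9, 12, 0)
finished = issued + timedelta(hours=1)
cancel_delay = timedelta(minutes=10)
max_lifetime = timedelta(days=7)

# NM is up and starts aggregating one minute after the app completes.
prompt_start = token_usable_at(finished + timedelta(minutes=1),
                               finished, issued, cancel_delay, max_lifetime)

# NM is recovered 13.5 hours after the app completed, as in this report.
late_start = token_usable_at(finished + timedelta(hours=13, minutes=30),
                             finished, issued, cancel_delay, max_lifetime)

print(prompt_start, late_start)
```

Lengthening the cancellation delay in this model only moves the cutoff; the 
expiry term still bounds how late aggregation can start.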

> Log aggregation failure for application when Nodemanager is restarted 
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>
> Scenario :
> =========
> 1. Start the NM as user dsperf:hadoop
> 2. Configure the linux-execute user as dsperf
> 3. Submit an application as the yarn user
> 4. Wait until a few containers are allocated to NM 1
> 5. Stop Nodemanager 1 (wait for expiry)
> 6. Start the Nodemanager after the application has completed
> 7. Check whether log aggregation happens for the container logs in the 
> NMLocal directory
> Expected Output :
> =================
> Log aggregation should be successful
> Actual Output :
> ===============
> Log aggregation is not successful



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
