[ 
https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706495#comment-13706495
 ] 

Omkar Vinit Joshi commented on YARN-592:
----------------------------------------

I may be wrong here, but I am a bit skeptical about the .tmp file. Are you 
sure it contains all the logs? My understanding is that aggregation was still 
in progress and had not finished writing all of them. Even for logs that were 
fully aggregated, the NM only enqueues them with the deletion service for 
future deletion, which may or may not happen even on a graceful shutdown, 
since we kill the NM after some time, right? Thoughts?

bq. This patch is trying to upload logs for the applications which run before 
and after NM restart. If the application completes after the NM crashes and 
before the NM starts again, at least the logs for the containers that ran on 
that node can be retrieved from the NM local log dirs.

This seems problematic. The gap between the AM finishing and the NM starting 
can be as short as a few seconds or as long as hours. We need a definite 
policy for handling these logs, because if we don't handle this case, the 
logs will sit on the NM waiting for an already finished app to finish, 
right? Thoughts?
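
To make the question concrete, here is a rough sketch of what such a policy 
could look like. It is only an illustration, not what the attached patch does 
and not the NodeManager API: the class RestartLogPolicy, the enums AppState 
and Action, and the maxRetention parameter are all hypothetical names. The 
assumption is that after a restart the NM can learn, for each application 
with local logs, whether it is still running, already finished, or unknown.

```java
import java.time.Duration;

// Hypothetical sketch: decides what a restarted NM does with leftover local logs.
public class RestartLogPolicy {

    // Assumed application states as seen by the NM after restart.
    public enum AppState { RUNNING, FINISHED, UNKNOWN }

    // Possible actions for an application's local log directory.
    public enum Action { WAIT_FOR_FINISH, AGGREGATE_NOW, RETAIN, DELETE }

    public static Action decide(AppState state, Duration logAge, Duration maxRetention) {
        switch (state) {
            case RUNNING:
                // App is still alive: keep logs and aggregate when it finishes.
                return Action.WAIT_FOR_FINISH;
            case FINISHED:
                // App finished while the NM was down: do not wait, aggregate now.
                return Action.AGGREGATE_NOW;
            default:
                // State unknown: keep logs for a bounded time so they never linger forever.
                return logAge.compareTo(maxRetention) > 0 ? Action.DELETE : Action.RETAIN;
        }
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofHours(24); // assumed retention window
        System.out.println(decide(AppState.FINISHED, Duration.ofSeconds(5), retention));
        System.out.println(decide(AppState.UNKNOWN, Duration.ofHours(30), retention));
    }
}
```

The point of the bounded retention branch is that the time between the AM 
finishing and the NM starting no longer matters: a finished app is handled 
immediately, and anything the NM cannot classify is cleaned up after a fixed 
window.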
                
> Container logs lost for the application when NM gets restarted
> --------------------------------------------------------------
>
>                 Key: YARN-592
>                 URL: https://issues.apache.org/jira/browse/YARN-592
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.1-alpha, 2.0.3-alpha
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>            Priority: Critical
>         Attachments: YARN-592.patch
>
>
> While running a big job, if the NM goes down for some reason and comes 
> back, it does the log aggregation for the newly launched containers and 
> deletes all the containers for the application. In this case we don't get 
> the container logs from HDFS or from local disk for the containers which 
> were launched before the restart and completed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
