[
https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706495#comment-13706495
]
Omkar Vinit Joshi commented on YARN-592:
----------------------------------------
I might be wrong here, but I am a bit skeptical about the .tmp file. Are you
sure it contains all the logs? My understanding is that aggregation was still
in progress and had not finished with all of them. And even for completed
logs, it will only enqueue them into the deletion service for future deletion,
which may or may not happen even on a graceful shutdown, since we kill the NM
after some time, right? Thoughts?
bq. This patch is trying to upload logs for the applications which run before
and after NM restart. If the application completes after the NM crash and
before the NM starts again, at least the logs for the containers that ran on
that node can be obtained from the NM local log dirs.
This seems problematic. The time between the AM finishing and the NM starting
can be as short as seconds or as long as hours. We need a definite policy for
handling these logs, because if we don't handle this case, the logs will be
left lying on the NM waiting for an already finished app to finish, right?
Thoughts?
> Container logs lost for the application when NM gets restarted
> --------------------------------------------------------------
>
> Key: YARN-592
> URL: https://issues.apache.org/jira/browse/YARN-592
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.0.1-alpha, 2.0.3-alpha
> Reporter: Devaraj K
> Assignee: Devaraj K
> Priority: Critical
> Attachments: YARN-592.patch
>
>
> While running a big job, if the NM goes down for some reason and comes
> back, it does the log aggregation for the newly launched containers and
> deletes all the containers for the application. In this case we don't get
> the container logs, from HDFS or locally, for the containers which were
> launched before the restart and completed.