Xuan Gong commented on YARN-2468:

bq. Xuan, I looked through the code changes and have a question about uploading 
logs for unfinished containers. Let's say we have already uploaded syslog for a 
container at time T1. At time T2, the container is still running and when the 
log aggregation is triggered again, will it re-upload the same syslog file? 
That seems to be the case.

It will not. Each time log aggregation runs, we record every aggregated log 
file under the key (containerId.toString() + "_" + file.getName() + "_" + 
file.lastModified()). On the next run, before uploading anything, we check 
whether each log file's key already exists in savedAggregatedLogFileCache 
(uploadedFileMeta in AppLogAggregatorImpl). If it does, we skip that file; 
otherwise, we upload it.
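To make the skip check concrete, here is a minimal, self-contained sketch of the de-duplication described above. It is an illustration only, not the actual AppLogAggregatorImpl code. The class LogUploadDedup, the LogFile holder, the aggregate() method and the container ID string are all hypothetical. Only the key format (containerId + "_" + fileName + "_" + lastModified) and the role of uploadedFileMeta come from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the per-cycle skip check; not the real YARN classes. */
public class LogUploadDedup {

    /** Hypothetical stand-in for a container log file: name plus lastModified timestamp. */
    static final class LogFile {
        final String name;
        final long lastModified;

        LogFile(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    // Plays the role of uploadedFileMeta: keys of every file already aggregated.
    private final Set<String> uploadedFileMeta = new HashSet<>();

    /** Builds the key from the comment: containerId_fileName_lastModified. */
    static String metaKey(String containerId, LogFile file) {
        return containerId + "_" + file.name + "_" + file.lastModified;
    }

    /**
     * Runs one aggregation cycle for a container. Returns the names of the files
     * "uploaded" in this cycle (here they are only collected) and records their
     * keys so that later cycles skip them.
     */
    public List<String> aggregate(String containerId, List<LogFile> files) {
        List<String> uploaded = new ArrayList<>();
        for (LogFile file : files) {
            String key = metaKey(containerId, file);
            if (uploadedFileMeta.contains(key)) {
                continue; // same name and timestamp as an earlier cycle: skip it
            }
            uploaded.add(file.name); // a real aggregator would write the file out here
            uploadedFileMeta.add(key);
        }
        return uploaded;
    }

    public static void main(String[] args) {
        LogUploadDedup dedup = new LogUploadDedup();
        String cid = "container_1_0001_01_000002"; // hypothetical container ID

        // Cycle at T1: both files are new, so both are uploaded.
        System.out.println(dedup.aggregate(cid, Arrays.asList(
                new LogFile("syslog", 1000L), new LogFile("stderr", 1000L)))); // [syslog, stderr]

        // Cycle at T2: syslog is unchanged, stderr has a new lastModified.
        System.out.println(dedup.aggregate(cid, Arrays.asList(
                new LogFile("syslog", 1000L), new LogFile("stderr", 2000L)))); // [stderr]
    }
}
```

Because lastModified is part of the key, a file that changes between cycles produces a new key, so this sketch uploads it again. Only files that are byte-for-byte untouched since the previous cycle (same name and same timestamp) are skipped.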

> Log handling for LRS
> --------------------
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>             Fix For: 2.6.0
>         Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
> YARN-2468.9.1.patch, YARN-2468.9.patch
> Currently, the NM starts log aggregation only when an application finishes. 
> For long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file, which keeps growing larger and larger.

This message was sent by Atlassian JIRA
