[
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152599#comment-14152599
]
Xuan Gong commented on YARN-2468:
---------------------------------
bq. Why is the test in TestAggregatedLogsBlock ignored?
YARN-2583 will cover the web UI related changes. Until then this test would
fail, so I added @Ignore.
bq. pendingUploadFiles is really not needed to be a class field. Rename
getNumOfLogFilesToUpload() to be getPendingLogFilesToUploadForThisContainer()
and return the set of pending files. LogValue.write() can then take Set<File>
pendingLogFilesToUpload as one of the arguments.
I would like to check how many log files we can upload in this cycle. If the
number is 0, we can skip the cycle. This check also has to happen before
LogKey.write(); otherwise, we would write a key without a value.
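The ordering described above can be sketched roughly as follows. This is only
an illustration, not code from the patch: SkipEmptyCycleSketch,
RecordingWriter, writeKey(), writeValue() and uploadIfAnyPending() are
hypothetical names standing in for the real writer and LogKey/LogValue calls.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names are taken from the YARN-2468 patch.
class SkipEmptyCycleSketch {

    /** Stand-in for the aggregated log writer: it only records what was written. */
    static final class RecordingWriter {
        final List<String> records = new ArrayList<>();

        void writeKey(String containerId) {
            records.add("KEY " + containerId);
        }

        void writeValue(Set<File> files) {
            records.add("VALUE " + files.size() + " file(s)");
        }
    }

    /**
     * Writes a key/value pair only when there is something to upload.
     * Returns true if an entry was written, false if the cycle was skipped.
     */
    static boolean uploadIfAnyPending(String containerId, Set<File> pendingFiles,
                                      RecordingWriter writer) {
        if (pendingFiles.isEmpty()) {
            // The check runs before the key is written, so a skipped cycle
            // never leaves a key without a value behind.
            return false;
        }
        writer.writeKey(containerId);
        writer.writeValue(pendingFiles);
        return true;
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();

        // Cycle with nothing pending: skipped, nothing is recorded.
        boolean first = uploadIfAnyPending("container_1",
                Collections.<File>emptySet(), writer);

        // Cycle with one pending file: key and value are recorded together.
        Set<File> pending = new LinkedHashSet<>();
        pending.add(new File("stdout"));
        boolean second = uploadIfAnyPending("container_1", pending, writer);

        System.out.println(first + " " + second + " " + writer.records);
    }
}
```

Keeping the count check in the caller, ahead of the key write, is the reason the
number of pending files is computed before LogValue.write() is reached.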
bq. If deletion of previously uploaded file takes a while and the file remains
by the time of the next cycle, we will upload it again? It seems to be, let's
validate this via a test-case.
No, it will not. That is why I keep track of information such as
allExistingFiles and alreadyUploadedFiles. We will use those to check whether
the logs have been uploaded before.
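A minimal sketch of that bookkeeping, under some loud assumptions: files are
identified by name only, and the class and method names
(UploadedFileTrackerSketch, runCycle) are hypothetical. Only the two set names
come from the comment above; the actual patch may track files differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; only the set names echo the review comment.
class UploadedFileTrackerSketch {

    // Names of files uploaded in earlier cycles that may still be on local disk.
    private final Set<String> alreadyUploadedFiles = new HashSet<>();

    /**
     * Runs one aggregation cycle over the files currently on disk and
     * returns the names that are uploaded in this cycle.
     */
    Set<String> runCycle(Set<String> allExistingFiles) {
        // Anything uploaded before is excluded, even if its deletion is still pending.
        Set<String> toUpload = new TreeSet<>(allExistingFiles);
        toUpload.removeAll(alreadyUploadedFiles);

        alreadyUploadedFiles.addAll(toUpload);

        // Assumption: once a file has disappeared from disk its entry can be dropped,
        // so the bookkeeping does not grow for the lifetime of a long running service.
        alreadyUploadedFiles.retainAll(allExistingFiles);
        return toUpload;
    }

    public static void main(String[] args) {
        UploadedFileTrackerSketch tracker = new UploadedFileTrackerSketch();

        // Cycle 1: both files are new.
        System.out.println(tracker.runCycle(
                new HashSet<>(Arrays.asList("stdout", "stderr"))));

        // Cycle 2: deletion of the old files has not finished yet, one new file appeared.
        System.out.println(tracker.runCycle(
                new HashSet<>(Arrays.asList("stdout", "stderr", "syslog"))));
    }
}
```

In the second cycle only the new file is selected, which is the behaviour a
test case for the slow-deletion scenario would need to assert.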
bq. testLogAggregationServiceWithInterval: doLogAggregationOutOfBand +
Thread.sleep() is unreliable. Use a clock and refactor AppLogAggregatorImpl to
have the cyclic aggregation directly callable via a method.
The Thread.sleep() is not used to trigger the log aggregation. It is used to
make sure the logs have been uploaded into the remote directory. But I have
deleted those Thread.sleep() calls from the test cases.
> Log handling for LRS
> --------------------
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: log-aggregation, nodemanager, resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch,
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch,
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch,
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch,
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch,
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts to do the log
> aggregation. But for long running service (LRS) applications, this is not ideal.
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a
> single file. The files could become larger and larger.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)