Xuan Gong commented on YARN-2468:

bq. Why is the test in TestAggregatedLogsBlock ignored?

YARN-2583 will cover the web UI related changes. Until then this test fails, so 
I added @Ignore to it.

bq. pendingUploadFiles really does not need to be a class field. Rename 
getNumOfLogFilesToUpload() to be getPendingLogFilesToUploadForThisContainer() 
and return the set of pending files. LogValue.write() can then take Set<File> 
pendingLogFilesToUpload as one of the arguments.

I would like to check how many log files we can upload in this cycle. If the 
number is 0, we can skip the cycle. This check also has to happen before 
LogKey.write(); otherwise we would write a key without a value.
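
A minimal sketch of that ordering, not the code from the patch: the pending set 
is checked first, and the key is written only when a value will follow it. The 
class SkipEmptyUploadSketch, the RecordingWriter stand-in and the method 
uploadIfPending() are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SkipEmptyUploadSketch {
    /** Hypothetical stand-in for the aggregated-log writer; it only records calls. */
    static class RecordingWriter {
        final List<String> records = new ArrayList<>();

        void writeKey(String containerId) {
            records.add("KEY:" + containerId);
        }

        void writeValue(Set<String> files) {
            records.add("VALUE:" + files.size());
        }
    }

    /**
     * Writes one key/value pair for the container, or nothing at all.
     * Returns false when the cycle was skipped because no file is pending.
     */
    static boolean uploadIfPending(String containerId, Set<String> pending,
                                   RecordingWriter writer) {
        if (pending.isEmpty()) {
            // Checked before the key is written, so a key never appears without a value.
            return false;
        }
        writer.writeKey(containerId);
        writer.writeValue(pending);
        return true;
    }
}
```

With an empty pending set the writer is never touched, which is the reason the 
count has to be known before the key is written.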

bq. If deletion of previously uploaded file takes a while and the file remains 
by the time of the next cycle, we will upload it again? It seems to be, let's 
validate this via a test-case.

No, it will not. That is why I keep track of information such as 
allExistingFiles and alreadyUploadedFiles. We use those to check whether the 
logs have been uploaded before.
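
A rough sketch of that bookkeeping, under simplifying assumptions: a file is 
identified by a plain name string, and the class UploadedFileTrackerSketch and 
its method selectForUpload() are hypothetical. Only the set names 
allExistingFiles and alreadyUploadedFiles come from the patch discussion; how 
the patch itself identifies a file is not shown here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class UploadedFileTrackerSketch {
    // Files uploaded in an earlier cycle that may still be on disk.
    private final Set<String> alreadyUploadedFiles = new HashSet<>();

    /** Returns the files to upload in this cycle and remembers them as uploaded. */
    Set<String> selectForUpload(Set<String> allExistingFiles) {
        // Forget files that have since been deleted so the set does not grow forever.
        alreadyUploadedFiles.retainAll(allExistingFiles);

        // Anything still on disk but already uploaded is left out of this cycle.
        Set<String> pending = new TreeSet<>(allExistingFiles);
        pending.removeAll(alreadyUploadedFiles);

        alreadyUploadedFiles.addAll(pending);
        return pending;
    }
}
```

If the deletion of a previously uploaded file is slow and the file is still 
present in the next cycle, it is found in alreadyUploadedFiles and is not 
selected again.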

bq. testLogAggregationServiceWithInterval: doLogAggregationOutOfBand + 
Thread.sleep() is unreliable. Use a clock and refactor AppLogAggregatorImpl to 
have the cyclic aggregation directly callable via a method.

The Thread.sleep() is not used to trigger the log aggregation. It is used to 
make sure the logs have been uploaded into the remote directory. Still, I have 
deleted those Thread.sleep() calls from the test cases.
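
For reference, one way the clock-based suggestion could look, as a sketch only 
and not the AppLogAggregatorImpl code: the cyclic pass is an ordinary method 
that a test calls directly, and time comes from an injected clock. The class 
CyclicAggregatorSketch, the method maybeAggregate() and the upload counter are 
hypothetical.

```java
import java.util.function.LongSupplier;

class CyclicAggregatorSketch {
    private final LongSupplier clockMillis; // injected so a test can control time
    private final long intervalMillis;
    private long lastRunMillis;
    private int uploads;

    CyclicAggregatorSketch(LongSupplier clockMillis, long intervalMillis) {
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
        this.lastRunMillis = clockMillis.getAsLong();
    }

    /** One aggregation pass, callable directly; no background thread or sleep needed. */
    boolean maybeAggregate() {
        long now = clockMillis.getAsLong();
        if (now - lastRunMillis < intervalMillis) {
            return false; // interval has not elapsed yet
        }
        lastRunMillis = now;
        uploads++; // stand-in for the real upload work
        return true;
    }

    int uploadCount() {
        return uploads;
    }
}
```

A test can then advance the fake clock and call maybeAggregate() synchronously, 
so the upload has finished by the time the assertion runs.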

> Log handling for LRS
> --------------------
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.patch
> Currently, when an application finishes, the NM starts log aggregation. 
> But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.
