[
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163116#comment-14163116
]
Zhijie Shen commented on YARN-2582:
---
Thanks for the patch, Xuan! Here're some comments and thoughts.
1. This may not be sufficient. For example, "tmp_\[timestamp\].log" will become
a false positive case here. We need to check whether the file name ends with
TMP_FILE_SUFFIX or not.
{code}
&& !fileName.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
{code}
if (!thisNodeFile.getPath().getName()
.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
2. I'm thinking what the better way should be to handle the case that it fails
at fetching one log file, but not the others. Thoughts?
{code}
AggregatedLogFormat.LogReader reader = null;
try {
reader =
new AggregatedLogFormat.LogReader(getConf(),
thisNodeFile.getPath());
if (dumpAContainerLogs(containerId, reader, System.out) > -1) {
foundContainerLogs = true;
}
} finally {
if (reader != null) {
reader.close();
}
}
{code}
3. In fact, in the case of either single container or all containers, when the
container log is not found, it is possible that 1) the container log hasn't
been aggregated yet or 2) the log aggregation is not enabled. The the former
case it is also possible to have the third situation: the user are providing an
invalid nodeId. It's good to uniform the warning message.
{code}
if (!foundContainerLogs) {
System.out.println("Logs for container " + containerId
+ " are not present in this log-file.");
return -1;
}
{code}
{code}
if (! foundAnyLogs) {
System.out.println("Logs not available at " + remoteAppLogDir.toString());
System.out
.println("Log aggregation has not completed or is not enabled.");
return -1;
}
{code}
4. It seems that we don't have unit tests to cover the successful path of
dumping the container log(s) currently. I'm not sure if it is going to be a big
addition. If it is, let's handle it separately.
There're two general issues:
1. Currently we don't have the web page on RM web app to show the aggregated
logs. One the other hand, LRS is supposed to be always in the running state,
such that the timeline server will not present it. Therefore, we still haven't
a web entrance to view the aggregated logs.
2. For LRS, the latest logs are not aggregated in the HDFS, but in the local
dir of NM. When a user says he wants to check the logs of this LRS, it doesn't
make sense that he can just view yesterday's log, because today's is still not
uploaded to HDFS. It's good to have a combined view of both latest log on local
dir of NM and aggregated logs on HDFS. Thoughts?
> Log related CLI and Web UI changes for LRS
> --
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch
>
>
> After YARN-2468, we have change the log layout to support log aggregation for
> Long Running Service. Log CLI and related Web UI should be modified
> accordingly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)