Zhijie Shen commented on YARN-2468:

[~xgong], thanks for the patch. I agree with the solution in general. Here are 
some comments on the patch so far; I may post more after a second pass.

1. LogContext doesn't need to be in ApplicationSubmissionContext, because 
ApplicationSubmissionContext contains ContainerLaunchContext. LogContext is 
container-related, so ContainerLaunchContext should be the best place for it. 
Currently, we can have one context for all containers; maybe in the future we 
can think of setting a different LogContext for each individual container.

2. Shouldn't these be "exclude" patterns?
+ *     <li>logExcludePattern. It defines include patterns which is used to
+ *     filter the log files. The log files which match the include
+ *     patterns will not be uploaded.</li>

3. Is the "day" unit too coarse?
+ *     how often the logAggregationSerivce uploads container logs in days
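On the granularity point: a finer-grained interval could still accept a plain day count while allowing smaller units. A minimal, self-contained sketch (the method name and suffix set are illustrative, not from the patch):

```java
// Hypothetical sketch: accept a roll interval with an optional unit suffix
// ("s", "m", "h", "d") instead of whole days only. A bare number is treated
// as seconds.
public class RollInterval {
  public static long parseRollIntervalSeconds(String value) {
    char unit = value.charAt(value.length() - 1);
    switch (unit) {
      case 's': return Long.parseLong(value.substring(0, value.length() - 1));
      case 'm': return Long.parseLong(value.substring(0, value.length() - 1)) * 60;
      case 'h': return Long.parseLong(value.substring(0, value.length() - 1)) * 3600;
      case 'd': return Long.parseLong(value.substring(0, value.length() - 1)) * 86400;
      default:  // no suffix: the whole string is a second count
        return Long.parseLong(value);
    }
  }
}
```

With something like this, "1d" keeps the current behavior while "30m" or "3600s" become possible for services that want tighter upload cycles.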

4. The message is a bit confusing. The application is neither running nor 
completed? Then what is the app state?
+        System.out.println("Application is not running or completed."
+            + " Logs are only available when an application is running"
+            + " or completes")

5. Why not logContext = p.getLogContext() directly?
+    LogContext logContext = null;
+    if (p.getLogContext() != null) {
+      logContext = new LogContextPBImpl(p.getLogContext());
+    }
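For reference, the null check plus wrapping follows the usual PBImpl pattern: the proto getter returns the wire type, while the field holds the record type, so a direct assignment would not type-check. A self-contained analogue (all class names below are illustrative stand-ins, not YARN classes):

```java
// Stand-in for a generated protobuf message type.
class FakeProto {
  final String data;
  FakeProto(String data) { this.data = data; }
}

// Stand-in for the record wrapper (the PBImpl) around the proto.
class FakeRecordPBImpl {
  private final FakeProto proto;
  FakeRecordPBImpl(FakeProto proto) { this.proto = proto; }
  String getData() { return proto.data; }
}

class Converter {
  // Mirrors the reviewed snippet: wrap only when the proto field is present.
  static FakeRecordPBImpl fromProto(FakeProto p) {
    return (p == null) ? null : new FakeRecordPBImpl(p);
  }
}
```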

6. Should the method be public?
    public int numOflogFiles() {

7. In getFilteredLogFiles, the logic is that if a log file matches the include 
pattern, it is added first, and then, if it also matches the exclude pattern, 
it is removed. Shall we do a sanity check to make sure the same pattern cannot 
be both included and excluded? Otherwise, the semantics is a bit ambiguous.

8. The following code doesn't need to be executed by the doAs ugi?
      this.remoteAppLogFile = remoteAppLogFile;
      fc = FileContext.getFileContext(conf);

9. The changes in dumpAllContainersLogs are just reformatting? If so, shall we 
undo them?

10. Is it okay that we have removed the following code to close the writer?
    if (this.writer != null) {
      LOG.info("Finished aggregate log-file for app " + this.applicationId);

And I have two general questions:

1. If LogContext is not specified, we fall back to the traditional log 
handling case, right? Will we still have a single combined log file identified 
by the node id? Or will the node id always be a directory, with only one file 
under it?

2. Let's say a work-preserving NM restart happens: the NM is going to forget 
all the already-uploaded log files and redo everything, right?

> Log handling for LRS
> --------------------
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch
> Currently, when application is finished, NM will start to do the log 
> aggregation. But for Long running service applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.

This message was sent by Atlassian JIRA