[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138647#comment-14138647
 ] 

Zhijie Shen commented on YARN-2468:
-----------------------------------

1. Shall we say "if the log file name matches both the include and the exclude 
patterns, the file will be excluded eventually"?
{code}
+ *     filter the log files. The log files which match the include
+ *     patterns will not be uploaded. If any patterns are defined in
+ *     both logIncludePatterns and logExcludePatterns, those files matched
+ *     with these patterns will not be uploaded.</li>
{code}

2. Except this method, other new APIs are marked \@Stable. Is it a bit 
aggressive as it is not a simple new feature?
{code}
+public abstract class LogContext {
+
+  @Public
+  @Unstable
+  public static LogContext newInstance(Set<String> logIncludePatterns,
{code}

3. Is ACCEPTED or SUBMITTED to early to say the log is available? The AM 
container my not have been launched yet.
{code}
       switch (appReport.getYarnApplicationState()) {
       case NEW:
       case NEW_SAVING:
+        return -1;
       case ACCEPTED:
       case SUBMITTED:
       case RUNNING:
-        return -1;
{code}

4. numOflogFiles -> getNumOfLogFilesToUpload?

5. The following code always executes no matter the include patterns are 
applied or not. getFilteredLogFiles log seem to be improvable.
{code}
      Set<File> candiates =
          new HashSet<File>(Arrays.asList(containerLogDir.listFiles()));
{code}

6. Why not just tmp suffix?
{code}
  public static final String TMP_FILE = "TEMP.tmp";
{code}

7. Should you call getRemoteNodeLogDirForApp here, though both methods seem to 
do the same thing?
{code}
+    Path remoteAppNodeLogDir = LogAggregationUtils.getRemoteNodeLogFileForApp(
{code}

8. In this case should we return -1?
{code}
    if (!foundContainerLogs) {
      System.out.println("Logs for container " + containerId
          + " are not present in this log-file.");
    }
{code}

9. Do you want to remove getRemoteNodeLogFileForApp (in LogAggregationUtils and 
LogAggregationService), which is not used by production code any more, and 
getRemoteNodeLogDirForApp is doing the identical stuff.

10. Not right variable name convention.
{code}
            Path node_appDir = LogAggregationUtils.getRemoteNodeLogDirForApp(
{code}
{code}
+    long time_stamp = 0;
{code}
{code}
+    while (count <= max_attempt) {
{code}

11. Change appLogAggregators to public directly?
{code}
  @Private
  @VisibleForTesting
  public ConcurrentMap<ApplicationId, AppLogAggregator> getAppLogAggregators() {
    return this.appLogAggregators;
  }
{code}

12. In testLogAggregationServiceWithInterval, round 1, 2, 3's logic seems to be 
similar. It is possible to have iteration of i = 1 -> 3?

bq. 3. Is the "day" unit too coarse?

If we want to use "day" as the unit, shall we allow "double" interval, such 
that I can say 0.1 day. I think it would be useful for testing purpose (not 
unit test, but verifying the feature is working on purpose, and the app is 
using the feature correctly). Thoughts?

bq. I think that we can solve this problem in separate ticket, because this 
ticket is the first step to solve Log handling for LRS.

Sounds good. It shouldn't be a common case, unless we config to delay the log 
deletion for debugging purpose. Please file a Jira about it.

> Log handling for LRS
> --------------------
>
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch
>
>
> Currently, when application is finished, NM will start to do the log 
> aggregation. But for Long running service applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to