Zhijie Shen commented on YARN-2468:

1. Shall we say "if the log file name matches both the include and the exclude 
patterns, the file will be excluded eventually"?
+ *     filter the log files. The log files which match the include
+ *     patterns will not be uploaded. If any patterns are defined in
+ *     both logIncludePatterns and logExcludePatterns, those files matched
+ *     with these patterns will not be uploaded.</li>
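The "exclude wins" rule suggested above could be sketched as follows. This is a hypothetical illustration, not the patch's actual code; `shouldUpload` and the single-pattern signature are assumptions for brevity (the patch uses pattern sets):

```java
import java.util.regex.Pattern;

// Sketch of the proposed semantics: a file matching both the include
// and the exclude pattern is ultimately excluded.
class LogFilterSketch {
  static boolean shouldUpload(String fileName,
      String includePattern, String excludePattern) {
    boolean included = includePattern == null
        || Pattern.matches(includePattern, fileName);
    boolean excluded = excludePattern != null
        && Pattern.matches(excludePattern, fileName);
    // Exclusion takes precedence over inclusion.
    return included && !excluded;
  }

  public static void main(String[] args) {
    // Matches the include pattern only: uploaded.
    System.out.println(shouldUpload("syslog", "sys.*", ".*\\.tmp"));
    // Matches both patterns: exclude wins, not uploaded.
    System.out.println(shouldUpload("syslog.tmp", "sys.*", ".*\\.tmp"));
  }
}
```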

2. Apart from this method, the other new APIs are marked \@Stable. Is that a 
bit aggressive, given that this is not a simple new feature?
+public abstract class LogContext {
+  @Public
+  @Unstable
+  public static LogContext newInstance(Set<String> logIncludePatterns,

3. Are ACCEPTED and SUBMITTED too early to say the log is available? The AM 
container may not have been launched yet.
       switch (appReport.getYarnApplicationState()) {
       case NEW:
       case NEW_SAVING:
+        return -1;
       case ACCEPTED:
       case SUBMITTED:
       case RUNNING:
-        return -1;

4. numOflogFiles -> getNumOfLogFilesToUpload?

5. The following code always executes regardless of whether any include 
patterns are applied. getFilteredLogFiles seems improvable.
      Set<File> candiates =
          new HashSet<File>(Arrays.asList(containerLogDir.listFiles()));
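One way the method could be restructured, as a hypothetical sketch only (the signature, the pattern-set parameters, and `matchesAny` are assumed names, not the patch's): skip the filtering work entirely when no patterns were supplied, and guard against `listFiles()` returning null.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class FilteredLogFilesSketch {
  static Set<File> getFilteredLogFiles(File containerLogDir,
      Set<String> includePatterns, Set<String> excludePatterns) {
    File[] files = containerLogDir.listFiles();
    if (files == null) {                       // missing dir or I/O error
      return Collections.emptySet();
    }
    Set<File> candidates = new HashSet<File>(Arrays.asList(files));
    boolean noPatterns =
        (includePatterns == null || includePatterns.isEmpty())
        && (excludePatterns == null || excludePatterns.isEmpty());
    if (noPatterns) {
      return candidates;                       // nothing to filter
    }
    Set<File> result = new HashSet<File>();
    for (File f : candidates) {
      boolean included = includePatterns == null
          || includePatterns.isEmpty()
          || matchesAny(includePatterns, f.getName());
      boolean excluded = excludePatterns != null
          && matchesAny(excludePatterns, f.getName());
      if (included && !excluded) {             // exclude takes precedence
        result.add(f);
      }
    }
    return result;
  }

  private static boolean matchesAny(Set<String> patterns, String name) {
    for (String p : patterns) {
      if (Pattern.matches(p, name)) {
        return true;
      }
    }
    return false;
  }
}
```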

6. Why not just use a ".tmp" suffix (instead of the fixed "TEMP.tmp" name)?
  public static final String TMP_FILE = "TEMP.tmp";

7. Should you call getRemoteNodeLogDirForApp here, even though both methods 
seem to do the same thing?
+    Path remoteAppNodeLogDir = LogAggregationUtils.getRemoteNodeLogFileForApp(

8. In this case should we return -1?
    if (!foundContainerLogs) {
      System.out.println("Logs for container " + containerId
          + " are not present in this log-file.");

9. Do you want to remove getRemoteNodeLogFileForApp (in LogAggregationUtils 
and LogAggregationService)? It is not used by production code any more, and 
getRemoteNodeLogDirForApp does the identical work.

10. These variable names don't follow the camelCase convention (e.g. 
node_appDir, time_stamp, max_attempt).
            Path node_appDir = LogAggregationUtils.getRemoteNodeLogDirForApp(
+    long time_stamp = 0;
+    while (count <= max_attempt) {

11. Should we just change appLogAggregators to public directly instead of 
adding this getter?
  public ConcurrentMap<ApplicationId, AppLogAggregator> getAppLogAggregators() {
    return this.appLogAggregators;

12. In testLogAggregationServiceWithInterval, rounds 1, 2, and 3 seem to 
share the same logic. Is it possible to iterate i = 1 -> 3 instead?
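Roughly the shape I have in mind, as a sketch only; `doOneRound` is an assumed helper standing in for the shared per-round setup and verification in the test:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical refactor of the three similar rounds into one loop.
class IntervalTestLoopSketch {
  static final List<Integer> verifiedRounds = new ArrayList<Integer>();

  static void doOneRound(int round) {
    // ... write logs, advance the roll interval, verify aggregation ...
    verifiedRounds.add(round);
  }

  public static void main(String[] args) {
    for (int round = 1; round <= 3; round++) {
      doOneRound(round);
    }
    System.out.println(verifiedRounds);
  }
}
```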

bq. 3. Is the "day" unit too coarse?

If we want to use "day" as the unit, shall we allow a "double" interval, such 
that I can say 0.1 day? I think it would be useful for testing purposes (not 
unit tests, but deliberately verifying that the feature is working and that 
the app is using it correctly). Thoughts?
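To illustrate the "double interval" idea: a fractional day still converts cleanly to a whole number of milliseconds for the rolling timer. The method name here is an assumption for the sketch, not anything from the patch:

```java
import java.util.concurrent.TimeUnit;

// Sketch: convert a (possibly fractional) day interval to milliseconds.
class IntervalSketch {
  static long intervalDaysToMillis(double days) {
    // 0.1 day -> 8,640,000 ms (2.4 hours); 1.0 day -> 86,400,000 ms.
    return Math.round(days * TimeUnit.DAYS.toMillis(1));
  }

  public static void main(String[] args) {
    System.out.println(intervalDaysToMillis(0.1)); // 8640000
    System.out.println(intervalDaysToMillis(1.0)); // 86400000
  }
}
```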

bq. I think that we can solve this problem in separate ticket, because this 
ticket is the first step to solve Log handling for LRS.

Sounds good. It shouldn't be a common case, unless we configure delayed log 
deletion for debugging purposes. Please file a JIRA about it.

> Log handling for LRS
> --------------------
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch
> Currently, when application is finished, NM will start to do the log 
> aggregation. But for Long running service applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.
