[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Lu updated YARN-4265:
------------------------
    Attachment: YARN-4265-trunk.004.patch

Thanks [~djp] for the review! I updated my patch according to your comments. Some quick comments:

bq. I am a bit confused with the logic here: if appLogs is not done yet, but its detail logs are empty, do we need to scanForLogs? If not, we should document the reason at the least.
Yes, we only update summary logs when the app is running. Updated comments for this.

bq. If we have two groupIds: 114859476_01_1 and 114859476_01_11, the latter one's log file name can match with the previous groupId as well? If so, we may consider matching the file name with the cache id more exactly? The same case with the code below {{if (log.getFilename().contains(groupId.toString())) }}
Nice catch! What I'm trying to handle here is file names made up of an entity group id and a sequence number. I've updated the related logic here.

bq. For cleanLogs(Path dirpath), it seems like the execution result of the log cleanup depends on the order of files/directories returned. Say an app dir includes: file A, dir B, file A is a fresh one and all files in dir B are older than logRetainMillis. If file A gets returned first, cleanLogs() does nothing, but if dir B gets returned first, cleanLogs() will clean up dir B. Given that fs.listStatusIterator(dirpath) could return file A, dir B in random order, is this random behavior expected?
This cannot happen, because in the first part of cleanLogs() we're only doing a DFS to decide if we need to remove this dir. If any file in the directory is new, we will not remove it. The actual removal happens after the DFS process.

bq. Is it a common case for an AppLogs to have many summaryLogs (and detail logs)?
Right now we're not facing this kind of use case. We can certainly optimize this logic in the future though.

bq. Can we directly return appDirPath's modification time instead of going through all sub directories?
I believe we cannot.
We're trying to return the latest time at which any file within a directory was changed, so that parseSummaryLogs can decide whether the app has been in the UNKNOWN state for long enough.

> Provide new timeline plugin storage to support fine-grained entity caching
> --------------------------------------------------------------------------
>
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch,
> YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch,
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new
> plugin timeline store. The store may have similar behavior to the
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache id
> granularity, instead of application id granularity. Let's have this storage
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the
> existing store (EntityFileTimelineStore) stable.
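
To illustrate the group id matching problem raised in the review comment above, here is a minimal sketch. It is not the code from the patch. The class name GroupIdMatch, the "entitylog-" file name prefix, and the choice of '_' and '.' as delimiters are assumptions made only for this example. The idea is that a plain contains() check lets 114859476_01_1 match a log file that belongs to 114859476_01_11, while a check that also requires a delimiter (or the end of the name) right after the group id does not.

```java
/** Hypothetical helper, for illustration only; not taken from the patch. */
final class GroupIdMatch {
  private GroupIdMatch() {}

  /** Naive check quoted in the review: any substring hit counts as a match. */
  static boolean containsMatch(String fileName, String groupId) {
    return fileName.contains(groupId);
  }

  /**
   * Stricter check: the group id must be followed by the end of the file
   * name or by a delimiter, so that a following sequence number is allowed
   * but a longer group id is not mistaken for a shorter one.
   */
  static boolean boundaryMatch(String fileName, String groupId) {
    if (groupId.isEmpty()) {
      return false;
    }
    int from = 0;
    while (true) {
      int pos = fileName.indexOf(groupId, from);
      if (pos < 0) {
        return false; // no (further) occurrence of the group id
      }
      int end = pos + groupId.length();
      if (end == fileName.length() || isDelimiter(fileName.charAt(end))) {
        return true;
      }
      from = pos + 1; // keep looking for a later occurrence
    }
  }

  /** Assumed delimiters between the group id and a sequence number. */
  private static boolean isDelimiter(char c) {
    return c == '_' || c == '.';
  }

  public static void main(String[] args) {
    String log = "entitylog-114859476_01_11"; // hypothetical file name
    System.out.println(containsMatch(log, "114859476_01_1"));  // true (false positive)
    System.out.println(boundaryMatch(log, "114859476_01_1"));  // false
    System.out.println(boundaryMatch(log, "114859476_01_11")); // true
  }
}
```

Note that a boundary check like this cannot separate a group id from a shorter group id that ends exactly at a delimiter (for example 114859476_01 against 114859476_01_1). Ruling that out needs a fixed file name layout, which this sketch does not assume.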