[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089413#comment-15089413
 ] 

Junping Du commented on YARN-4265:
----------------------------------

Thanks [~gtCarrera9] for updating the patch. 
trunk.003 patch is getting closer, more comments: 
In yarn-default.xml,
{code}
+      How long the ATS v1.5 entity group file system storage systemwill keep an
+      application's data in the done directory.
{code}
"file system storage systemwill" should be "file system storage will"

In EntityCacheItem.java, inside of refreshCache():
{code}
+    if (needRefresh()) {
+      if (!appLogs.isDone()) {
+        appLogs.parseSummaryLogs();
+      } else if (appLogs.getDetailLogs().isEmpty()) {
+        appLogs.scanForLogs();
+      }
+      if (!appLogs.getDetailLogs().isEmpty()) {
{code}
I am a bit confused with logic here: if appLogs is not done yet, but its detail 
logs is empty, do we need to scanForLogs? If not, we should document the reason 
at the least.

{code}
+        if (appLogs.getDetailLogs().isEmpty()) {
+          LOG.debug("cache id {}'s detail log is empty! ", groupId);
+        }
{code}
This is inside of case {{ if (!appLogs.getDetailLogs().isEmpty()) }}, does it 
possible for detailLogs to become empty again? If not, we should remove it?

{code}
+          // Only refresh the log that matches the cache id
+          if (log.getFilename().contains(groupId.toString())) {
{code}
If we have two groupIds: 114859476_01_1 and 114859476_01_11, the later one's 
log file name can match with previous groupId as well? If so, we may consider 
to match file name with cache id more exactly? The same case with code below 
{{if (log.getFilename().contains(groupId.toString())) }}

{{appLogs.getDetailLogs().removeAll(removeList);}} should be move out of for 
each, or removeList will be useless here.

In EntityGroupFSTimelineStore.java,
{code}
+          + "%04d" + Path.SEPARATOR // app num / 10000000
{code}
I think it should be app num / 1000,000 (1 million) but not 10 millions.

{code}
+  private final Map<TimelineEntityGroupId, EntityCacheItem> cachedLogs
+      = Collections.synchronizedMap(
+      new LinkedHashMap<TimelineEntityGroupId, EntityCacheItem>(
+          appCacheMaxSize + 1, 0.75f, true) {
{code}
It should move cachedLogs initiaization into serviceInit after appCacheMaxSize 
is setup already.

{code}
+  @SuppressWarnings("serial")
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
{code}
{{SuppressWarnings("serial")}} seems to be unnecessary?

In loadPlugIns(),
{code}
+      } catch (Exception e) {
+        LOG.info("Error loading plugin " + name, e);
+      }
{code}
It should use warn instead of info level for log message.

{code}
+      LOG.debug("Load plugin class {}", cacheIdPlugin.getClass().getName());
{code}
Better to use info instead of debug to follow the same practice with other 
messages in serviceInit.

For cleanLogs(Path dirpath), it seems like the execution result of cleanup log 
depends on the order of files/directories returned. Say an app dir include: 
file A, dir B, file A is a fresh one and all files in dir B are older than 
logRetainMillis. If file A get return first, the cleanLogs() do nothing, but if 
dir B get return first, cleanLogs() will clenup dir B. Give 
{{fs.listStatusIterator(dirpath)}} could return file A, dir B in randomly 
order, is this randomly behavior expected?

{code}
    // scans for new logs and returns the modification timestamp of the
    // most recently modified log
    @InterfaceAudience.Private
    @VisibleForTesting
    long scanForLogs() throws IOException {
{code}
Can we directly return appDirPath's modification time instead of go through all 
sub directories? Besides domain, summary, entity logs, anything else could 
exist under this directory?

{code}
      for (LogInfo log : summaryLogs) {
        if (log.getFilename().equals(filename)
            && log.getAttemptDirName().equals(attemptDirName)) {
          return;
        }
      }
{code}
Is it a common case for a AppLogs have many summaryLogs (and detail logs)? I 
guess not, but if my guess is wrong, we may should to use hashmap instead of 
list which costs to much time in search.

In LogInfo.parsePath(),
{code}
    try {
      in.seek(offset);
      try {
        parser = jsonFactory.createJsonParser(in);
        parser.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
      } catch (IOException e) {
        // if app hasn't completed then there may be errors due to the
        // incomplete file which are treated as EOF until app completes
        if (appCompleted) {
          throw e;
        } else {
          LOG.debug("Exception in parse path: {}", e.getMessage());
        }
      }
      return doParse(tdm, parser, objMapper, ugi, appCompleted);
{code}
Even we get rid of throw exception for app incompleted case, doParse() below 
will throw a NPE exception. Is this expected?

In PluginStoreTestUtils.java,
{code}
  static TimelineEntities generateTestEntities() {
        TimelineEntities entities = new TimelineEntities();
{code}
Indentation issue here.

> Provide new timeline plugin storage to support fine-grained entity caching
> --------------------------------------------------------------------------
>
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265.YARN-4234.001.patch, 
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to