[ 
https://issues.apache.org/jira/browse/YARN-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437953#comment-16437953
 ] 

Kanwaljeet Sachdev commented on YARN-7088:
------------------------------------------

Thanks [~haibochen] for the comments. My responses are below.
{quote}We are generating the timestamp when RMAppImpl is notified of the 
launch. But the correct timestamp should come from 
RMAppAttemptEventType.LAUNCHED that is generated by AMLauncher. If the event 
dispatch thread is falling back, there could be a large gap between when 
RMAppImpl is notified and when the attempt is actually launched.
{quote}
I have addressed this by adding a new constructor to the _RMAppEvent_ and 
_RMAppAttemptEvent_ classes that sets their timestamp data member.
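
As a rough illustration of the idea (not the code in the patch), the sketch below shows an event that carries a caller-supplied timestamp, so a handler that runs late still sees the time of the actual launch. The class names _LaunchEvent_ and _LaunchEventDemo_ are hypothetical stand-ins and are not YARN classes.

```java
// Hypothetical stand-in for an RM event; NOT the actual YARN class.
class LaunchEvent {
  private final String appId;
  private final long timestamp;

  // Constructor that stamps the event with the time it was created.
  LaunchEvent(String appId) {
    this(appId, System.currentTimeMillis());
  }

  // Additional constructor: the producer supplies the time of the launch.
  LaunchEvent(String appId, long timestamp) {
    this.appId = appId;
    this.timestamp = timestamp;
  }

  String getAppId() {
    return appId;
  }

  long getTimestamp() {
    return timestamp;
  }
}

class LaunchEventDemo {
  // The handler reads the timestamp carried by the event instead of
  // sampling the clock when it finally runs.
  static long handle(LaunchEvent event) {
    return event.getTimestamp();
  }

  public static void main(String[] args) {
    long launchedAt = 1_000L; // assumed launch time, for illustration only
    LaunchEvent event = new LaunchEvent("app_1", launchedAt);
    System.out.println(handle(event)); // prints 1000
  }
}
```

However late the dispatcher delivers the event, the recorded launch time is the one the producer supplied.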

 
{quote}I see two transitions are added to RMAPPImpl state machine, namely 
ACCEPTED --> ACCEPTED and RUNNING --> RUNNING upon the new 
RMAppEventType.ATTEMPT_LAUNCHED....
{quote}
I have addressed this by removing the unwanted transition that handled the 
RUNNING --> RUNNING state change.

 
{quote}Some minor comments: there is an unused variable you added in 
AMLaunchedTransition; the log in AppRunningOnAppAttemptLaunchTransition  is too 
generic. Can we add application id, attempt id and the timestamp?

The change in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
 is unnecessary as you have decided not to change for ATS (all versions). 
{quote}
Done as suggested.

 
{quote}The schedulingWaitTime in AppInfo is misleading if one application has 
multiple attempts. Because the launch time is updated every time a new attempt 
is launched, we can rename it to the launch time of the latest attempt. If 
there are multiple attempts, the running time of previous attempts would be 
counted as scheduling wait time, which is not correct.
{quote}
I update the launchTime only if it is non-zero, so I guess we are fine here?
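
To make the concern in the quote concrete, here is a small sketch with made-up numbers. It is not the logic in the patch; _LaunchTimeTracker_ and its methods are hypothetical. It keeps both the first and the latest attempt launch time and shows how the computed wait differs between them.

```java
// Hypothetical helper for illustration only; NOT part of YARN or the patch.
class LaunchTimeTracker {
  private final long submitTime;
  private long firstLaunchTime;  // 0 means "no attempt launched yet"
  private long latestLaunchTime; // overwritten on every attempt launch

  LaunchTimeTracker(long submitTime) {
    this.submitTime = submitTime;
  }

  void onAttemptLaunched(long timestamp) {
    if (firstLaunchTime == 0) {
      firstLaunchTime = timestamp; // recorded once, on the first attempt
    }
    latestLaunchTime = timestamp;
  }

  // Wait measured up to the first attempt's launch.
  long waitUntilFirstLaunch() {
    return firstLaunchTime - submitTime;
  }

  // Wait measured up to the latest attempt's launch; this also includes
  // the time earlier attempts spent running.
  long waitUntilLatestLaunch() {
    return latestLaunchTime - submitTime;
  }

  public static void main(String[] args) {
    LaunchTimeTracker tracker = new LaunchTimeTracker(100L);
    tracker.onAttemptLaunched(150L); // first attempt
    tracker.onAttemptLaunched(900L); // second attempt, after the first failed
    System.out.println(tracker.waitUntilFirstLaunch());  // prints 50
    System.out.println(tracker.waitUntilLatestLaunch()); // prints 800
  }
}
```

With these assumed timestamps, the latest-attempt figure (800) includes the 750 time units between the two launches, which is the effect the quote describes.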

 

Four tests failed in the last run. Three are from 
*TestMRTimelineEventHandling*. I verified that they fail even without my 
changes, and that they pass if I increase the timeout in the snippet below 
beyond 30 seconds, so my guess is that they are flaky and unrelated to my 
changes.

 
{code:java}
public static RunningJob runJobSucceed(JobConf conf, Path inDir, Path outDir)
       throws IOException {
  conf.setJobName("test-job-succeed");
  conf.setMapperClass(IdentityMapper.class);
  conf.setReducerClass(IdentityReducer.class);
  
  RunningJob job = UtilsForTests.runJob(conf, inDir, outDir);
  long sleepCount = 0;
  while (!job.isComplete()) {
    try {
      if (sleepCount > 300) { // 30 seconds
        throw new IOException("Job didn't finish in 30 seconds");
      }
      Thread.sleep(100);
      sleepCount++;
    } catch (InterruptedException e) {
      break;
    }
  }
  return job;
}
{code}
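
For comparison, a minimal sketch of the same polling pattern with the timeout passed in as a parameter, which makes this kind of experiment easier to repeat. The _PollUtil_ class and its _waitFor_ method are hypothetical names used only for illustration and are not part of the Hadoop test code above.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; NOT an existing Hadoop class.
class PollUtil {
  // Polls 'done' every intervalMs until it returns true, and throws an
  // IOException once timeoutMs has elapsed without success.
  static void waitFor(BooleanSupplier done, long timeoutMs, long intervalMs)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!done.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new IOException("Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // Example condition: becomes true after roughly 50 ms.
    waitFor(() -> System.currentTimeMillis() - start >= 50, 1_000, 10);
    System.out.println("condition met");
  }
}
```

Unlike the loop above, this sketch measures elapsed time against a deadline instead of counting sleeps, and it propagates InterruptedException instead of breaking out of the loop.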
The other test 
*org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange*
 shows the same trace as reported by [~rchiang] in 
https://issues.apache.org/jira/browse/YARN-6272

> Fix application start time and add submit time to UIs
> -----------------------------------------------------
>
>                 Key: YARN-7088
>                 URL: https://issues.apache.org/jira/browse/YARN-7088
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Abdullah Yousufi
>            Assignee: Kanwaljeet Sachdev
>            Priority: Major
>         Attachments: YARN-7088.001.patch, YARN-7088.002.patch, 
> YARN-7088.003.patch, YARN-7088.004.patch, YARN-7088.005.patch, 
> YARN-7088.006.patch, YARN-7088.007.patch, YARN-7088.008.patch, 
> YARN-7088.009.patch, YARN-7088.010.patch, YARN-7088.011.patch, 
> YARN-7088.012.patch, YARN-7088.013.patch, YARN-7088.014.patch, 
> YARN-7088.015.patch
>
>
> Currently, the start time in the old and new UI actually shows the app 
> submission time. There should actually be two different fields; one for the 
> app's submission and one for its start, as well as the elapsed pending time 
> between the two.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
