[ 
https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252048#comment-17252048
 ] 

Szilard Nemeth commented on YARN-7200:
--------------------------------------

Hi [~akshink]

Thanks for working on this.

Some comments:

1. SchedulerMetrics#tearDown: Can you add a comment why the metricsLogBW is 
nullified? I mean I got it from the comments here but I think it should have 
been documented in the code as well.

2. SchedulerMetrics#tearDown: Can you add a LOG.info call to the beginning of 
the method, just printing that the teardown has been started?

3. I can see that you added the new teardown + exit logic to 
SLSCapacityScheduler / SLSFairScheduler, so when an APP_ATTEMPT_REMOVED event 
is received and there are no remaining apps the schedulermetrics will tear down 
and SLSRunner will exit.
It's a bit suspicious for me. What if there's a testcase that submits say 2 
apps, waits some time (e.g. 10 seconds) then submits a new app.
In that 10 seconds time range, there will be no apps running (remanining) for 
the SLSRunner, but we can't really say that if there's any point in time when 
we have no apps that we immediately want to exit the SLS system and do the 
teardown.
Can you make a testcase like this to prove that your change is able to cope 
with this scenario?

4. Looking at your change again, I realized all the exit logic was centralized 
in SLSRunner#decreaseRemainingApps before your patch:
{code}

public static void decreaseRemainingApps() {
    remainingApps--;

    if (remainingApps == 0) {
      LOG.info("SLSRunner tears down.");
      if (exitAtTheFinish) {
        System.exit(0);
      }
    }
 }
{code}
So in theory the case described at 3. can't happen, but I'm not yet convinced.

> SLS generates a realtimetrack.json file but that file is missing the closing 
> ']'
> --------------------------------------------------------------------------------
>
>                 Key: YARN-7200
>                 URL: https://issues.apache.org/jira/browse/YARN-7200
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>            Reporter: Grant Sohn
>            Assignee: Agshin Kazimli
>            Priority: Minor
>              Labels: newbie, newbie++
>         Attachments: YARN-7200-branch-trunk.patch, YARN-7200.002.patch, 
> YARN-7200.003.patch, snemeth-testing-20201113.zip
>
>
> File 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
>  shows:
> {noformat}
>   void tearDown() throws Exception {
>     if (metricsLogBW != null)  {
>       metricsLogBW.write("]");
>       metricsLogBW.close();
>     }
>     ....
> {noformat}
> So the exit logic is flawed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to