[
https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252048#comment-17252048
]
Szilard Nemeth commented on YARN-7200:
--------------------------------------
Hi [~akshink]
Thanks for working on this.
Some comments:
1. SchedulerMetrics#tearDown: Can you add a comment why the metricsLogBW is
nullified? I mean I got it from the comments here but I think it should have
been documented in the code as well.
2. SchedulerMetrics#tearDown: Can you add a LOG.info call to the beginning of
the method, just printing that the teardown has been started?
3. I can see that you added the new teardown + exit logic to
SLSCapacityScheduler / SLSFairScheduler, so when an APP_ATTEMPT_REMOVED event
is received and there are no remaining apps the schedulermetrics will tear down
and SLSRunner will exit.
It's a bit suspicious for me. What if there's a testcase that submits say 2
apps, waits some time (e.g. 10 seconds) then submits a new app.
In that 10 seconds time range, there will be no apps running (remanining) for
the SLSRunner, but we can't really say that if there's any point in time when
we have no apps that we immediately want to exit the SLS system and do the
teardown.
Can you make a testcase like this to prove that your change is able to cope
with this scenario?
4. Looking at your change again, I realized all the exit logic was centralized
in SLSRunner#decreaseRemainingApps before your patch:
{code}
public static void decreaseRemainingApps() {
remainingApps--;
if (remainingApps == 0) {
LOG.info("SLSRunner tears down.");
if (exitAtTheFinish) {
System.exit(0);
}
}
}
{code}
So in theory the case described at 3. can't happen, but I'm not yet convinced.
> SLS generates a realtimetrack.json file but that file is missing the closing
> ']'
> --------------------------------------------------------------------------------
>
> Key: YARN-7200
> URL: https://issues.apache.org/jira/browse/YARN-7200
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler-load-simulator
> Reporter: Grant Sohn
> Assignee: Agshin Kazimli
> Priority: Minor
> Labels: newbie, newbie++
> Attachments: YARN-7200-branch-trunk.patch, YARN-7200.002.patch,
> YARN-7200.003.patch, snemeth-testing-20201113.zip
>
>
> File
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
> shows:
> {noformat}
> void tearDown() throws Exception {
> if (metricsLogBW != null) {
> metricsLogBW.write("]");
> metricsLogBW.close();
> }
> ....
> {noformat}
> So the exit logic is flawed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]