[
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253589#comment-17253589
]
Szilard Nemeth commented on YARN-10427:
---------------------------------------
Hi [~werd.up],
Thanks for reporting this issue and congratulations on your first reported
Hadoop YARN jira.
{quote}In the process of attempting to verify and validate the SLS output, I've
encountered a number of issues including runtime exceptions and bad output.
{quote}
I read through your observations and spent some time playing around with SLS.
If you encountered other issues, please file separate jiras for them if you
have some time.
As the process of running SLS involved some repetitive tasks like uploading
configs to the remote machine, launching SLS, and saving the resulting logs, I
created some scripts in my public GitHub repo here:
[https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427]
Let me briefly summarize what these scripts do:
1. [config
dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/config]:
This is the exact same configuration file set that you attached to this jira,
with one exception: the log4j.properties file, which turns on DEBUG logging
for SLS.
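For reference, turning on DEBUG logging for SLS is just a logger-level
override; a hypothetical snippet (the exact logger names in the attached file
may differ):
{code}
# Hypothetical log4j.properties override; see the config dir for the real file
log4j.logger.org.apache.hadoop.yarn.sls=DEBUG
{code}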
2. [upstream-patches
dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/upstream-patches]:
This directory contains the logging patch that helped me see the issues more
clearly.
My code changes are also pushed to my Hadoop fork:
[https://github.com/szilard-nemeth/hadoop/tree/YARN-10427-investigation]
3. [scripts
dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts]:
This directory contains all my scripts to build Hadoop, launch SLS, and save
the produced logs to the local machine.
As I have been working on a remote cluster, there's a script called
[setup-vars-upstream.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/setup-vars-upstream.sh]
that contains some configuration values for the remote cluster plus some local
directories. If you want to use the scripts, all you need to do is replace
the configs in this file according to your environment.
3.1
[build-and-launch.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh]:
This is the script that builds Hadoop according to the environment variables
and launches the SLS suite on the remote cluster.
3.2
[start-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/start-sls.sh]:
This is the most important script, as it is the one executed on the remote
machine.
I think the script itself is straightforward enough, but let me briefly list
what it does:
- This script assumes that the Hadoop dist package is copied to the remote
machine (this was done by
[build-and-launch.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh])
- Cleans up all Hadoop-related directories and extracts the Hadoop dist tar.gz
- Copies the config to Hadoop's config dirs so SLS will use these particular
configs
- Launches SLS by starting slsrun.sh with the appropriate CLI switches
- Greps for some useful data in the resulting SLS log file.
3.3
[launch-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/launch-sls.sh]:
This script is executed by
[build-and-launch.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh]
as its last step. Once start-sls.sh is finished, the
[save-latest-sls-logs.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/save-latest-sls-logs.sh]
script is started. As the name implies, it saves the latest SLS log dir and
SCPs it to the local machine. The target directory on the local machine is
determined by the config
([setup-vars-upstream.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/setup-vars-upstream.sh]).
*The latest logs and grepped logs for the SLS run are saved to my repo
[here.|https://github.com/szilard-nemeth/linux-env/tree/96ed3d8af9f4677866652bb57153713b29f24a98/workplace-specific/cloudera/investigations/YARN-10427/latest-logs/slsrun-out-20201222_040513]*
h2. What causes the duplicate Job IDs
1. The jobruntime.csv file is written by the SchedulerMetrics class; you
can see the init part
[here|https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java#L180-L186].
2. The jobruntime records (lines of CSV file) are written with method
[SchedulerMetrics#addAMRuntime|https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java#L661-L674].
We only need to check the call hierarchy of this method to reveal the reason
for the duplicate application IDs.
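Before looking at the call hierarchies, note what addAMRuntime does: based on
the linked code, it unconditionally appends one CSV row per call, roughly like
this simplified sketch:
{code:java}
// Simplified sketch of SchedulerMetrics#addAMRuntime based on the linked code;
// each call appends one row, so two calls for the same app produce two rows.
void addAMRuntime(ApplicationId appId, long traceStartTimeMS,
    long traceEndTimeMS, long simulateStartTimeMS, long simulateEndTimeMS) {
  try {
    String line = appId + "," + traceStartTimeMS + "," + traceEndTimeMS
        + "," + simulateStartTimeMS + "," + simulateEndTimeMS;
    jobRuntimeLogBW.write(line + EOL);  // jobruntime.csv writer
    jobRuntimeLogBW.flush();
  } catch (IOException e) {
    e.printStackTrace();
  }
}
{code}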
*2.1 Call hierarchy #1 (From bottom to top):*
{code:java}
org.apache.hadoop.yarn.sls.scheduler.SchedulerMetrics#addAMRuntime
org.apache.hadoop.yarn.sls.appmaster.AMSimulator#lastStep
org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator#lastStep
org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator#processResponseQueue{code}
*2.2 Call hierarchy #2 (From bottom to top):*
{code:java}
org.apache.hadoop.yarn.sls.scheduler.SchedulerMetrics#addAMRuntime
org.apache.hadoop.yarn.sls.appmaster.AMSimulator#lastStep
org.apache.hadoop.yarn.sls.scheduler.TaskRunner.Task#run
{code}
3. These duplicate calls of MRAMSimulator#lastStep can easily be confirmed
from the logs as well:
[apps-shuttingdown.log|https://github.com/szilard-nemeth/linux-env/blob/0d41e4dbda5e3a22105c4fe27f540ae8004857fe/workplace-specific/cloudera/investigations/YARN-10427/latest-logs/slsrun-out-20201222_040513/grepped/apps-shuttingdown.log]
In this logfile, it's clearly visible that 9 apps
(application_1608638719822_0001 - application_1608638719822_0009) are "shutting
down" twice.
This is because MRAMSimulator#lastStep is called twice.
As MRAMSimulator#lastStep calls
org.apache.hadoop.yarn.sls.appmaster.AMSimulator#lastStep (the super method),
I added some logging that prints the stacktrace of lastStep method calls:
[AMSimulator#lastStep|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L223-L225].
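The logging itself just prints a throwaway exception to capture the call site;
a sketch of the idea (the real change is on the linked branch):
{code:java}
// Sketch: pass a fresh Exception so the logger prints the current stacktrace.
// With SLF4J, a trailing Throwable argument is printed with its stacktrace.
LOG.info("Application {} is shutting down. lastStep Stacktrace",
    appId, new Exception());
{code}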
Let's take application_1608638719822_0001 as an example with this file:
[laststep-calls-for-app0001.log|https://github.com/szilard-nemeth/linux-env/blob/96ed3d8af9f4677866652bb57153713b29f24a98/workplace-specific/cloudera/investigations/YARN-10427/latest-logs/slsrun-out-20201222_040513/laststep-calls-for-app0001.log]
4. Checking the 2 stacktraces:
*4.1 Stacktrace #1: Call to lastStep from MRAMSimulator#processResponseQueue,
when all mappers/reducers are finished:*
{code:java}
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.lastStep(AMSimulator.java:224)
at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.lastStep(MRAMSimulator.java:401)
at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.processResponseQueue(MRAMSimulator.java:195)
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:101)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
[TaskRunner$Task.run|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L101]
calls AMSimulator#middleStep.
Then, in
[MRAMSimulator.processResponseQueue|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java#L194-L196],
there's a code piece that checks for completed mappers and reducers.
If the number of finished mappers is greater than or equal to the total number
of mappers, and the same holds for reducers, lastStep will be called.
{code:java}
if (mapFinished >= mapTotal && reduceFinished >= reduceTotal) {
  lastStep();
}
{code}
*4.2 Stacktrace #2: Call to lastStep from
[TaskRunner$Task.run|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L89-L113]*
{code:java}
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.lastStep(AMSimulator.java:224)
at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.lastStep(MRAMSimulator.java:401)
at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:106)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
According to my code inspection, all NMs and AMs are scheduled with this
TaskRunner from SLSRunner.
The call hierarchy of launching an AM is this (from bottom to top):
{code:java}
TaskRunner.schedule(Task) (org.apache.hadoop.yarn.sls.scheduler)
SLSRunner.runNewAM(String, String, String, String, long, long, List<ContainerSimulator>, ...) (org.apache.hadoop.yarn.sls)
SLSRunner.runNewAM(String, String, String, String, long, long, List<ContainerSimulator>, ...) (org.apache.hadoop.yarn.sls)
SLSRunner.createAMForJob(Map) (org.apache.hadoop.yarn.sls)
SLSRunner.startAMFromSLSTrace(String) (org.apache.hadoop.yarn.sls)
SLSRunner.startAM() (org.apache.hadoop.yarn.sls)
SLSRunner.start() (org.apache.hadoop.yarn.sls)
SLSRunner.run(String[]) (org.apache.hadoop.yarn.sls){code}
As the AM implementation is the
[AMSimulator|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java]
class, which extends TaskRunner.Task (which in turn implements the Runnable
interface), all the interesting things happen in
[org.apache.hadoop.yarn.sls.scheduler.TaskRunner.Task#run|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L89-L113].
Initially, the field _nextRun_ is equal to _startTime_, so the firstStep
method is invoked.
For subsequent calls of run, while _nextRun_ < _endTime_, middleStep is
executed.
The _nextRun_ field is always incremented by the value of _repeatInterval_
(which is 1000ms with the default config).
This means that all AMSimulator tasks are scheduled every second.
Once _nextRun_ exceeds _endTime_, lastStep is called.
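To make the stepping logic concrete, here is a condensed sketch of the
run-method logic as I read the linked source (error handling omitted; not the
verbatim code):
{code:java}
// Condensed sketch of TaskRunner.Task#run; queue is the TaskRunner's delay
// queue that re-schedules the task after repeatInterval milliseconds.
public void run() {
  if (nextRun == startTime) {
    firstStep();                // very first invocation of the task
    nextRun += repeatInterval;
    queue.add(this);            // re-schedule
  } else if (nextRun < endTime) {
    middleStep();               // periodic step, every repeatInterval ms
    nextRun += repeatInterval;
    queue.add(this);            // re-schedule
  } else {
    lastStep();                 // nextRun reached endTime: final step
  }
}
{code}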
h2. Conclusion for duplicate Job IDs
These 2 calls to lastStep are the main reason for the duplicate application
IDs in the jobruntime.csv file.
It's not clear to me why this lastStep method is invoked both through
[AMSimulator#middleStep|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L209]
(and ultimately through
[AMSimulator#processResponseQueue|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L212])
and from the main loop of TaskRunner$Task.
*I suppose this method should be invoked only once per AM!*
What is even more interesting is that 9 out of 10 apps had this method called
twice, according to this log file:
[apps-shuttingdown.log|https://github.com/szilard-nemeth/linux-env/blob/0d41e4dbda5e3a22105c4fe27f540ae8004857fe/workplace-specific/cloudera/investigations/YARN-10427/latest-logs/slsrun-out-20201222_040513/grepped/apps-shuttingdown.log].
But for the last application it is only called once:
{code:java}
2020-12-22 04:09:47,892 INFO appmaster.AMSimulator: Application application_1608638719822_0010 is shutting down. lastStep Stacktrace
{code}
All I can see is that the only call to lastStep for app 0010 is this:
(This is from [log
file|https://raw.githubusercontent.com/szilard-nemeth/linux-env/master/workplace-specific/cloudera/investigations/YARN-10427/latest-logs/slsrun-out-20201222_040513/output.log])
{code:java}
2020-12-22 04:09:47,892 INFO appmaster.AMSimulator: Application application_1608638719822_0010 is shutting down. lastStep Stacktrace
java.lang.Exception
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.lastStep(AMSimulator.java:224)
at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.lastStep(MRAMSimulator.java:401)
at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.processResponseQueue(MRAMSimulator.java:195)
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:101)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
_*This is the call from MRAMSimulator.processResponseQueue that verifies the
number of completed mappers/reducers.*_
_*The other call, which checks the timestamps in TaskRunner$Task.run, never
happens, meaning that the last application never reaches its intended running
time.*_
_*This could be counted as "another bug", but unfortunately I wasn't able
to find out why this anomaly happens.*_
h2. Other observations
If I grep for any container ID that belongs to any of the 9 applications that
had duplicate Job IDs in the jobruntime.csv file, each of the apps has a log
record like this in the output.log:
{code:java}
2020-12-22 04:07:11,980 INFO scheduler.AbstractYarnScheduler: Container container_1608638719822_0001_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
{code}
[See an example
here|https://github.com/szilard-nemeth/linux-env/blob/96ed3d8af9f4677866652bb57153713b29f24a98/workplace-specific/cloudera/investigations/YARN-10427/latest-logs/slsrun-out-20201222_040513/grepped/container_1608638719822_0001_01_000001.log#L32].
I think this is also happening because of the duplicate call to the lastStep
method.
h2. Possible fix for duplicate Job IDs
The task is to prevent lastStep from being called twice.
Without fully understanding the reason for the two calls above, or the
potential side-effects of removing either of them, let's check what lastStep
does.
The implementation of lastStep for MRAMSimulator delegates to the superclass:
[AMSimulator#lastStep|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L222-L273].
*There are several things happening in this method:*
- App is unregistered / untracked.
- If the amContainer is not null, the NM of the AM will be notified and the AM
container will be marked as completed
[here|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L231-L238]
- The AM is unregistered from the RM
[here|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L246-L263].
- The finish time of the AM is set; this is the only write access to this
field:
[here|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L265].
- The job's runtime information will be persisted to the jobruntime.csv file
[here|https://github.com/szilard-nemeth/hadoop/blob/10d9d9ff3446583b3b2b6e4518ad0c3ea335da48/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java#L266-L272].
*I think all of these actions must be prevented from running more than once!*
As there is only one field update in the lastStep method, a quick and dirty
solution that avoids introducing a new boolean flag to track whether lastStep
was already called is to check whether the
_org.apache.hadoop.yarn.sls.appmaster.AMSimulator#simulateFinishTimeMS_ field
has been modified (i.e. is greater than zero, zero being the default value of
long fields). As the only writer of this field is the single write in the
lastStep method, it's safe to check: if it is greater than zero, lastStep was
called before.
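A minimal sketch of the guard, assuming the field and method names described
above (the attached patch is the authoritative version):
{code:java}
// Sketch of the proposed guard at the top of AMSimulator#lastStep.
// simulateFinishTimeMS is only ever written near the end of lastStep, so a
// value greater than zero means lastStep has already run for this AM.
public synchronized void lastStep() throws Exception {
  if (simulateFinishTimeMS > 0) {
    // lastStep already executed for this AM: skip the duplicate call
    return;
  }
  // ... original body: unregister the app, release the AM container,
  // unregister the AM from the RM, set simulateFinishTimeMS,
  // write the jobruntime.csv record ...
}
{code}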
h2. Test run with the fix
The fix patch is added
[here|https://github.com/szilard-nemeth/linux-env/blob/9bd94311a900b79764d2ee26db16aed312a7fff7/workplace-specific/cloudera/investigations/YARN-10427/upstream-patches/0002-YARN-10427-Prevent-second-call-of-AMSimulator-lastSt.patch]
It is also uploaded as an attachment to this jira as a candidate for commit,
as I think it's a proper fix.
The logs of the "fixed run" can be found here:
[https://github.com/szilard-nemeth/linux-env/tree/9bd94311a900b79764d2ee26db16aed312a7fff7/workplace-specific/cloudera/investigations/YARN-10427/fixed-logs]
1. The shutting down messages for applications look way better: there are
only 10 messages for 10 apps, which is correct:
[apps-shuttingdown.log|https://github.com/szilard-nemeth/linux-env/blob/master/workplace-specific/cloudera/investigations/YARN-10427/fixed-logs/grepped/apps-shuttingdown.log]
2. The
[jobruntime.csv|https://github.com/szilard-nemeth/linux-env/blob/9bd94311a900b79764d2ee26db16aed312a7fff7/workplace-specific/cloudera/investigations/YARN-10427/fixed-logs/jobruntime.csv]
file also looks good. There's one entry per application now.
3. In the
[output.log|https://github.com/szilard-nemeth/linux-env/blob/9bd94311a900b79764d2ee26db16aed312a7fff7/workplace-specific/cloudera/investigations/YARN-10427/fixed-logs/output.log]
file, there are still weird messages when the AM container is finished, for
all the apps:
{code:java}
[root@snemeth-fips2-1 slsrun-out-20201222_063242]# grep "but corresponding RMContainer doesn't exist" output.log
2020-12-22 06:34:40,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0002_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:34:41,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0001_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:35:05,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0003_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:35:10,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0005_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:35:30,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0006_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:36:04,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0009_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:36:04,373 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0008_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:36:20,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0004_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2020-12-22 06:36:26,315 INFO scheduler.AbstractYarnScheduler: Container container_1608647568797_0007_01_000001 completed with event FINISHED, but corresponding RMContainer doesn't exist.
{code}
So, contrary to my expectations, this is not caused by the double call of
lastStep.
> Duplicate Job IDs in SLS output
> -------------------------------
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler-load-simulator
> Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also
> tested against 3.2.1 and 3.3.0 release branches.
>
> Reporter: Drew Merrill
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: fair-scheduler.xml, inputsls.json, jobruntime.csv,
> jobruntime.csv, mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>
> Hello, I'm hoping someone can help me resolve or understand some issues I've
> been having with the YARN Scheduler Load Simulator (SLS). I've been
> experimenting with SLS for several months now at work as we're trying to
> build a simulation model to characterize our enterprise Hadoop infrastructure
> for purposes of future capacity planning. In the process of attempting to
> verify and validate the SLS output, I've encountered a number of issues
> including runtime exceptions and bad output. The focus of this issue is the
> bad output. In all my simulation runs, the jobruntime.csv output seems to
> have one or more of the following problems: no output, duplicate job ids,
> and/or missing job ids.
>
> Because of where I work, I'm unable to provide the exact inputs I typically
> use, but I'm able to reproduce the problem of the duplicate Job IDs using
> some simplified inputs and configuration files, which I've attached, along
> with the output I obtained.
>
> The command I used to run the simulation:
> {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json
> --output-dir=sls-run-1 --print-simulation
> --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}}
>
> Can anyone help me understand what would cause the duplicate Job IDs in the
> output? Is this a bug in Hadoop or a problem with my inputs? Thanks in
> advance.
>
> PS: This is my first issue I've ever opened so please be kind if I've missed
> something or am not understanding something obvious about the way Hadoop
> works. I'll gladly follow-up with more info as requested.