[
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elen Chatikyan updated YARN-11666:
----------------------------------
Attachment: add_test_cases.patch
reproduce.sh
> NullPointerException in TestSLSRunner.testSimulatorRunning
> ----------------------------------------------------------
>
> Key: YARN-11666
> URL: https://issues.apache.org/jira/browse/YARN-11666
> Project: Hadoop YARN
> Issue Type: Bug
> Environment: {*}Operating System{*}: macOS (Sonoma 14.2.1 (23C71))
> {*}Hardware{*}: MacBook Air 2023
> {*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))
> {*}Java Version{*}: OpenJDK version "1.8.0_292"
> Reporter: Elen Chatikyan
> Priority: Major
> Attachments: add_test_cases.patch, reproduce.sh
>
>
> *What happened:*
> In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load
> Scheduler) framework, a *NullPointerException* is thrown during the teardown
> process of parameterized tests. This exception is thrown when the stop method
> is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This
> issue occurs under test conditions that involve mismatches between trace
> types (RUMEN, SLS, SYNTH) and their corresponding trace files, leading to
> scenarios where the rm object may not be properly initialized before the stop
> method is invoked.
>
> *Buggy code:*
> The issue is located in the
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
> file within the *{{stop}}* method:
> {code:java}
> public void stop() {
>   rm.stop();
> }
> {code}
> The root cause of the *{{NullPointerException}}* is the lack of a null check
> for the {{rm}} object before calling its {{stop}} method. Under any condition
> where the *{{ResourceManager}}* fails to initialize correctly, attempting to
> stop the *{{ResourceManager}}* leads to a null pointer dereference.
>
> After the fix in
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}},
> the {{stop}} method in
> [TaskRunner.java|https://github.com/apache/hadoop/blob/12a26d8b1987e883efab00c25a0594512527bd1f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L169]
> should be fixed in the same way:
> {code:java}
> public void stop() throws InterruptedException {
>   executor.shutdownNow();
>   executor.awaitTermination(20, TimeUnit.SECONDS);
> }
> {code}
>
> *How to trigger this bug:*
> {color:#00875a}*You can use the attachments (reproduce.sh and
> add_test_cases.patch) to easily reproduce the bug.*{color}
> * Change the data method of the parameterized unit test (TestSLSRunner.java)
> to include one or both of the following test cases:
> * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
> * {capScheduler, "SYNTH", slsTraceFile, nodeFile }
> * Execute the *TestSLSRunner* test suite, particularly the
> *testSimulatorRunning* method.
> * Observe the resulting *NullPointerException* in the test output (triggered
> in RMRunner.java).
>
> {panel:title=Example stack trace from the test output:}
> [ERROR] testSimulatorRunning[Testing with: SYNTH,
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
> (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed:
> 3.027 s <<< ERROR!
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
> at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
> at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
> ...
> {panel}
>
>
> *How To Fix*
> _{color:#172b4d}The bug can be fixed by adding a null check for the
> {{rm}} object in the
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
> {{stop}} method before calling any methods on it. The same applies to the
> executor object in TaskRunner.java.{color}_
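
The proposed null checks might look roughly like the sketch below. This is not the actual patch: NullSafeStopSketch, StubResourceManager, RMRunnerSketch and TaskRunnerSketch are hypothetical stand-ins so that the example compiles without Hadoop on the classpath. Only the guarded stop() bodies correspond to the change suggested for RMRunner.java and TaskRunner.java.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NullSafeStopSketch {

  /** Hypothetical stand-in for ResourceManager; only stop() matters here. */
  static class StubResourceManager {
    boolean stopped = false;

    void stop() {
      stopped = true;
    }
  }

  /** Mirrors the shape of RMRunner.stop() with a null guard added. */
  static class RMRunnerSketch {
    StubResourceManager rm; // stays null if initialization never completed

    void stop() {
      if (rm != null) {
        rm.stop();
      }
    }
  }

  /** Mirrors the shape of TaskRunner.stop() with the same guard. */
  static class TaskRunnerSketch {
    ExecutorService executor; // stays null if the runner was never started

    void stop() throws InterruptedException {
      if (executor != null) {
        executor.shutdownNow();
        executor.awaitTermination(20, TimeUnit.SECONDS);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Teardown on a runner that was never initialized no longer throws.
    RMRunnerSketch rmRunner = new RMRunnerSketch();
    rmRunner.stop();

    // Teardown on an initialized runner still stops the manager.
    rmRunner.rm = new StubResourceManager();
    rmRunner.stop();
    System.out.println("rm stopped: " + rmRunner.rm.stopped);

    TaskRunnerSketch taskRunner = new TaskRunnerSketch();
    taskRunner.stop();
    taskRunner.executor = Executors.newSingleThreadExecutor();
    taskRunner.stop();
    System.out.println("executor shut down: " + taskRunner.executor.isShutdown());
  }
}
```

A guard of this kind only keeps teardown from masking the original failure. It does not address why the ResourceManager was left uninitialized for the mismatched trace type and trace file in the first place.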