[ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
----------------------------------
    Attachment: add_test_cases.patch
                reproduce.sh

> NullPointerException in TestSLSRunner.testSimulatorRunning
> ----------------------------------------------------------
>
>                 Key: YARN-11666
>                 URL: https://issues.apache.org/jira/browse/YARN-11666
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: {*}Operating System{*}: macOS (Sonoma 14.2.1 (23C71))
> {*}Hardware{*}: MacBook Air 2023
> {*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))
> {*}Java Version{*}: OpenJDK version "1.8.0_292"
>            Reporter: Elen Chatikyan
>            Priority: Major
>         Attachments: add_test_cases.patch, reproduce.sh
>
>
> *What happened:* 
> In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Scheduler Load 
> Simulator) framework, a *NullPointerException* is thrown during the teardown 
> process of parameterized tests. This exception is thrown when the stop method 
> is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This 
> issue occurs under test conditions that involve mismatches between trace 
> types (RUMEN, SLS, SYNTH) and their corresponding trace files, leading to 
> scenarios where the rm object may not be properly initialized before the stop 
> method is invoked.
>  
> *Buggy code:*
> The issue is located in the 
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
>  file within the *{{stop}}* method:
> {code:java}
> public void stop() {
>   rm.stop();
> }
> {code}
> The root cause of the *{{NullPointerException}}* is the lack of a null check 
> for the {{rm}} object before calling its {{stop}} method. Under any condition 
> where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
> stop the *{{ResourceManager}}* leads to a null pointer dereference.
>  
> After the fix in 
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
> , the {{stop}} method in 
> [TaskRunner.java|https://github.com/apache/hadoop/blob/12a26d8b1987e883efab00c25a0594512527bd1f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L169]
>  should also be fixed, since it calls methods on {{executor}} without a null check:
> {code:java}
> public void stop() throws InterruptedException {
>   executor.shutdownNow();
>   executor.awaitTermination(20, TimeUnit.SECONDS);
> }
> {code}
>  
> *How to trigger this bug:*
> {color:#00875a}*You can use the attachments (reproduce.sh and 
> add_test_cases.patch) to easily reproduce the bug.*{color}
>  * Change the data method of the parameterized unit test (TestSLSRunner.java) 
> to include one or both of the following test cases:
>  ** {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
>  ** {capScheduler, "SYNTH", slsTraceFile, nodeFile }
>  * Execute the *TestSLSRunner* test suite, particularly the 
> *testSimulatorRunning* method.
>  * Observe the resulting *NullPointerException* in the test output (triggered 
> in RMRunner.java).
>  
> {panel:title=Example stack trace from the test output:}
> [ERROR] testSimulatorRunning[Testing with: SYNTH, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
>  (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 
> 3.027 s <<< ERROR!
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
> at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
> at 
> org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
> ...
> {panel}
>  
>  
> *How to fix:*
> _{color:#172b4d}The bug can be fixed by adding a null check for the 
> {{rm}} object in the {{stop}} method of 
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
>  before calling any methods on it. The same applies to the {{executor}} object 
> in TaskRunner.java.{color}_


