[ https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elen Chatikyan updated YARN-11666:
----------------------------------
    Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Scheduler Load 
Simulator) framework, a *NullPointerException* is thrown during teardown of 
the parameterized tests. The exception is raised when the {{stop}} method is 
called on the ResourceManager ({{rm}}) object in {_}RMRunner.java{_}. It 
occurs when the trace type (RUMEN, SLS, SYNTH) does not match the supplied 
trace file, so the {{rm}} object may not have been initialized before 
{{stop}} is invoked.

 

*Buggy code:*

The issue is located in the 
*{{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}*
 file within the *{{stop}}* method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.
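
The failure mode can be illustrated in isolation. The following is a minimal sketch, not the Hadoop code: {{FakeResourceManager}} and {{FakeRunner}} are hypothetical stand-ins, and it assumes that initialization aborts before {{rm}} is assigned while teardown still runs afterwards.

```java
public class TeardownNpeSketch {

  /** Hypothetical stand-in for the ResourceManager. */
  static class FakeResourceManager {
    boolean stopped = false;

    void stop() {
      stopped = true;
    }
  }

  /** Hypothetical stand-in for RMRunner; only the stop() body mirrors the report. */
  static class FakeRunner {
    FakeResourceManager rm; // stays null if start() fails early

    void start(boolean traceMatchesType) {
      // Assumption: a trace type/file mismatch aborts setup before rm is created.
      if (!traceMatchesType) {
        throw new IllegalArgumentException("trace file does not match trace type");
      }
      rm = new FakeResourceManager();
    }

    void stop() {
      rm.stop(); // same shape as the reported code: no null check
    }
  }

  /** Runs setup and teardown; returns the teardown exception name, or "none". */
  static String runAndTearDown(boolean traceMatchesType) {
    FakeRunner runner = new FakeRunner();
    try {
      runner.start(traceMatchesType);
    } catch (IllegalArgumentException e) {
      // Setup failed; a test framework would still run teardown.
    }
    try {
      runner.stop();
      return "none";
    } catch (NullPointerException e) {
      return "NullPointerException";
    }
  }

  public static void main(String[] args) {
    System.out.println(runAndTearDown(true));  // prints "none"
    System.out.println(runAndTearDown(false)); // prints "NullPointerException"
  }
}
```

In this sketch the teardown exception hides the original setup failure, which is one reason an unguarded {{stop}} makes such test failures harder to diagnose.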

 

After the fix in {*}RMRunner.java{*}, the {{stop}} method in TaskRunner needs 
the same fix, since it calls methods on {{executor}} without a null check.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 ** {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 ** {capScheduler, "SYNTH", slsTraceFile, nodeFile }
 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
*How To Fix*

_{color:#172b4d}The bug can be fixed by adding a null check for the {{rm}} 
object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same check is needed for the {{executor}} object in 
TaskRunner.java.{color}_
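
A minimal sketch of the proposed null checks follows. It is not the committed patch: {{GuardedRMRunner}}, {{GuardedTaskRunner}} and {{FakeResourceManager}} are hypothetical stand-ins for the real classes, and only the bodies of the two {{stop}} methods are taken from the snippets in this report.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NullSafeStopSketch {

  /** Hypothetical stand-in for the ResourceManager. */
  static class FakeResourceManager {
    boolean stopped = false;

    void stop() {
      stopped = true;
    }
  }

  /** Stand-in for RMRunner with the null check added to stop(). */
  static class GuardedRMRunner {
    FakeResourceManager rm; // may still be null if setup failed

    void stop() {
      if (rm != null) {
        rm.stop();
      }
    }
  }

  /** Stand-in for TaskRunner with the same guard around the executor. */
  static class GuardedTaskRunner {
    ExecutorService executor; // may still be null if setup failed

    void stop() throws InterruptedException {
      if (executor != null) {
        executor.shutdownNow();
        executor.awaitTermination(20, TimeUnit.SECONDS);
      }
    }
  }

  /** Returns true if stopping never-initialized runners throws nothing. */
  static boolean uninitializedStopSucceeds() {
    try {
      new GuardedRMRunner().stop();
      new GuardedTaskRunner().stop();
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(uninitializedStopSucceeds()); // prints "true"

    // An initialized runner is still stopped as before.
    GuardedTaskRunner taskRunner = new GuardedTaskRunner();
    taskRunner.executor = Executors.newSingleThreadExecutor();
    taskRunner.stop();
    System.out.println(taskRunner.executor.isShutdown()); // prints "true"
  }
}
```

The guards only make teardown tolerant of a failed setup; they do not address why initialization fails for mismatched trace inputs.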


> NullPointerException in TestSLSRunner.testSimulatorRunning
> ----------------------------------------------------------
>
>                 Key: YARN-11666
>                 URL: https://issues.apache.org/jira/browse/YARN-11666
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: {*}Operating System{*}: macOS (Sonoma 14.2.1 (23C71))
> {*}Hardware{*}: MacBook Air 2023
> {*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))
> {*}Java Version{*}: OpenJDK version "1.8.0_292"
>            Reporter: Elen Chatikyan
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
