Elen Chatikyan created YARN-11666:
-------------------------------------

             Summary: NullPointerException in TestSLSRunner.testSimulatorRunning
                 Key: YARN-11666
                 URL: https://issues.apache.org/jira/browse/YARN-11666
             Project: Hadoop YARN
          Issue Type: Bug
         Environment: {*}Operating System{*}: macOS Sonoma 14.2.1 (23C71)

{*}Hardware{*}: MacBook Air 2023

{*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))

{*}Java Version{*}: OpenJDK version "1.8.0_292"
            Reporter: Elen Chatikyan


*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Scheduler Load 
Simulator) framework, a *NullPointerException* is thrown during teardown of 
the parameterized tests. The exception is raised when the stop method is 
called on the ResourceManager (rm) object in {_}RMRunner.java{_}. It occurs 
when a test case pairs a trace type (RUMEN, SLS, SYNTH) with a trace file of 
a different type, so that the rm object may not have been initialized by the 
time the stop method is invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:
{code:java}
public void stop() {
  rm.stop();
}

{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.
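
The missing guard can be illustrated with a minimal, self-contained sketch. It is not the actual Hadoop code: {{FakeResourceManager}} and {{GuardedRmRunner}} are hypothetical stand-ins for {{ResourceManager}} and {{RMRunner}}, reduced to the one field and one method discussed above, and the sketch assumes that {{rm}} simply stays null when startup does not complete.

```java
class GuardedRmRunnerDemo {
  public static void main(String[] args) {
    // Teardown without a successful start: rm is still null.
    GuardedRmRunner neverStarted = new GuardedRmRunner();
    neverStarted.stop(); // returns quietly instead of throwing

    // Normal lifecycle: start, then stop.
    GuardedRmRunner started = new GuardedRmRunner();
    started.start();
    started.stop();
    System.out.println("rm stopped: " + started.getRm().isStopped());
  }
}

// Hypothetical stand-in for ResourceManager; only stop() matters here.
class FakeResourceManager {
  private boolean stopped = false;

  void stop() {
    stopped = true;
  }

  boolean isStopped() {
    return stopped;
  }
}

// Hypothetical stand-in for RMRunner (assumption: rm is assigned only in start()).
class GuardedRmRunner {
  private FakeResourceManager rm; // remains null if start() never runs

  void start() {
    rm = new FakeResourceManager();
  }

  void stop() {
    // Guard: skip the call when initialization never completed.
    if (rm != null) {
      rm.stop();
    }
  }

  FakeResourceManager getRm() {
    return rm;
  }
}
```

With the guard in place, teardown no longer fails on a runner that was never started. A real patch might also want to log that {{rm}} was null, since a silent skip can hide the underlying startup failure.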

 

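After the fix in {*}RMRunner.java{*}, the {{stop}} method in TaskRunner.java should be fixed in the same way, since it calls methods on {{executor}} without a null check.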

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}

{code}
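
A guarded version of this method could look like the sketch below. It is an illustration under stated assumptions, not the project's patch: {{GuardedTaskRunner}} is a hypothetical stand-in for {{TaskRunner}}, the field is typed as {{ExecutorService}} for brevity, and the pool created in {{start()}} is arbitrary. Only the {{shutdownNow}} and {{awaitTermination}} calls are taken from the code above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class GuardedTaskRunnerDemo {
  public static void main(String[] args) throws InterruptedException {
    // Teardown without a successful start: executor is still null.
    GuardedTaskRunner neverStarted = new GuardedTaskRunner();
    neverStarted.stop(); // returns quietly instead of throwing

    // Normal lifecycle: start, then stop.
    GuardedTaskRunner started = new GuardedTaskRunner();
    started.start();
    started.stop();
    System.out.println("executor terminated: " + started.isTerminated());
  }
}

// Hypothetical stand-in for TaskRunner (assumption: executor is assigned only in start()).
class GuardedTaskRunner {
  private ExecutorService executor; // remains null if start() never runs

  void start() {
    // Pool size is arbitrary for this sketch.
    executor = Executors.newFixedThreadPool(2);
  }

  void stop() throws InterruptedException {
    // Guard: nothing to shut down if the executor was never created.
    if (executor == null) {
      return;
    }
    executor.shutdownNow();
    executor.awaitTermination(20, TimeUnit.SECONDS);
  }

  boolean isTerminated() {
    return executor != null && executor.isTerminated();
  }
}
```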
 

{*}How to trigger this bug:{*}
 * Change the data method of the parameterized unit test (TestSLSRunner.java) 
to include one or both of the following test cases:
 ** {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 ** {capScheduler, "SYNTH", slsTraceFile, nodeFile }
 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered 
in RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
_______________________________________________________________________

_{color:#172b4d}The bug can be fixed by adding a null check for the {{rm}} 
object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in 
TaskRunner.java.{color}_



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

