[ https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elen Chatikyan updated YARN-11666:
----------------------------------
    Attachment: add_test_cases.patch
                reproduce.sh

> NullPointerException in TestSLSRunner.testSimulatorRunning
> ----------------------------------------------------------
>
>                 Key: YARN-11666
>                 URL: https://issues.apache.org/jira/browse/YARN-11666
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: {*}Operating System{*}: macOS (Sonoma 14.2.1 (23C71))
> {*}Hardware{*}: MacBook Air 2023
> {*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))
> {*}Java Version{*}: OpenJDK version "1.8.0_292"
>            Reporter: Elen Chatikyan
>            Priority: Major
>         Attachments: add_test_cases.patch, reproduce.sh
>
>
> *What happened:*
> In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Scheduler Load
> Simulator) framework, a *NullPointerException* is thrown during the teardown
> of parameterized tests, when the stop method is called on the ResourceManager
> (rm) object in {_}RMRunner.java{_}. It occurs when a test pairs a trace type
> (RUMEN, SLS, SYNTH) with a trace file of a different type; in that case the
> rm object may never be initialized before the stop method is invoked.
>
> *Buggy code:*
> The issue is located in the *{{stop}}* method of
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}:
> {code:java}
> public void stop() {
>   rm.stop();
> }
> {code}
> The root cause of the *{{NullPointerException}}* is the missing null check
> on the {{rm}} object before its {{stop}} method is called. Whenever the
> *{{ResourceManager}}* fails to initialize, attempting to stop it
> dereferences a null pointer.
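The failure mode described above can be reproduced without Hadoop. The sketch below is hypothetical: `FakeResourceManager` and `UnguardedRunner` are stand-ins invented for illustration (not the real SLS classes), and the trace mismatch is simulated by making `start()` throw before the field is assigned. A teardown that still calls `stop()` then hits the same unguarded dereference as `RMRunner.stop()`.

```java
/**
 * Hypothetical stand-ins, not real Hadoop classes: the runner's field is
 * assigned only when start() succeeds, so stop() sees null if start() threw.
 */
class UnguardedStopDemo {

    /** Stand-in for ResourceManager (illustration only). */
    static class FakeResourceManager {
        void stop() { }
    }

    /** Stand-in with the same shape as RMRunner: stop() has no null check. */
    static class UnguardedRunner {
        private FakeResourceManager rm; // stays null until start() succeeds

        void start(boolean traceMatchesType) {
            if (!traceMatchesType) {
                // Simulated trace-type/trace-file mismatch aborting setup
                throw new IllegalArgumentException("trace file does not match trace type");
            }
            rm = new FakeResourceManager();
        }

        void stop() {
            rm.stop(); // NullPointerException when start() never completed
        }
    }

    /** Returns true if stop() after a failed start() throws NullPointerException. */
    static boolean stopAfterFailedStartThrowsNpe() {
        UnguardedRunner runner = new UnguardedRunner();
        try {
            runner.start(false);
        } catch (IllegalArgumentException setupFailure) {
            // Setup failed, but a test teardown would still call stop()
        }
        try {
            runner.stop();
            return false;
        } catch (NullPointerException npe) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(stopAfterFailedStartThrowsNpe()); // prints "true"
    }
}
```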
>
> Once
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
> is fixed,
> [TaskRunner.java|https://github.com/apache/hadoop/blob/12a26d8b1987e883efab00c25a0594512527bd1f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L169]
> should also be fixed, because its stop method likewise dereferences the
> {{executor}} field without a null check:
> {code:java}
> public void stop() throws InterruptedException {
>   executor.shutdownNow();
>   executor.awaitTermination(20, TimeUnit.SECONDS);
> }
> {code}
>
> *How to trigger this bug:*
> {color:#00875a}*You can use the attachments (reproduce.sh and add_test_cases.patch) to easily reproduce the bug.*{color}
> * Change the data method of the parameterized unit test (TestSLSRunner.java)
> to include one or both of the following test cases:
> ** {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
> ** {capScheduler, "SYNTH", slsTraceFile, nodeFile }
> * Execute the *TestSLSRunner* test suite, particularly the
> *testSimulatorRunning* method.
> * Observe the resulting *NullPointerException* in the test output (triggered
> in RMRunner.java).
>
> {panel:title=Example stack trace from the test output:}
> [ERROR] testSimulatorRunning[Testing with: SYNTH,
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
> (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner)  Time elapsed:
> 3.027 s  <<< ERROR!
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
>     at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
>     at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
> ...
> {panel}
>
>
> *How To Fix*
> _{color:#172b4d}The bug can be fixed by adding a null check for the
> {{rm}} object in the {{stop}} method of
> {{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
> before calling any methods on it (and likewise for the {{executor}} object
> in TaskRunner.java).{color}_



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
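The proposed fix can be sketched as follows. This is a minimal illustration, not the actual patch: `GuardedRmRunner`, `GuardedTaskRunner`, and `FakeResourceManager` are hypothetical stand-ins, and the task runner uses a plain `Executors.newSingleThreadExecutor()` rather than whatever executor the real TaskRunner builds. Each `stop()` skips the shutdown calls when the field was never initialized, so a teardown after a failed setup becomes a no-op.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-ins illustrating null-guarded stop() methods. */
class NullSafeStopSketch {

    /** Stand-in for ResourceManager (illustration only). */
    static class FakeResourceManager {
        boolean stopped;
        void stop() { stopped = true; }
    }

    /** Mirrors the proposed RMRunner.stop() fix. */
    static class GuardedRmRunner {
        FakeResourceManager rm; // null if start() never completed

        void start() { rm = new FakeResourceManager(); }

        void stop() {
            if (rm != null) { // guard against a failed or skipped start()
                rm.stop();
            }
        }
    }

    /** Mirrors the proposed TaskRunner.stop() fix for the executor field. */
    static class GuardedTaskRunner {
        ExecutorService executor; // null if start() never ran

        void start() { executor = Executors.newSingleThreadExecutor(); }

        void stop() throws InterruptedException {
            if (executor != null) { // same guard for the executor
                executor.shutdownNow();
                executor.awaitTermination(20, TimeUnit.SECONDS);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Teardown without a successful setup: both calls are now no-ops.
        new GuardedRmRunner().stop();
        new GuardedTaskRunner().stop();

        // The normal lifecycle still shuts everything down.
        GuardedRmRunner rmRunner = new GuardedRmRunner();
        rmRunner.start();
        rmRunner.stop();
        GuardedTaskRunner taskRunner = new GuardedTaskRunner();
        taskRunner.start();
        taskRunner.stop();
        System.out.println(rmRunner.rm.stopped + " " + taskRunner.executor.isTerminated());
    }
}
```

Putting the guard inside `stop()` rather than in each test's teardown keeps every caller safe, including `SLSRunner.stop()` as it appears in the stack trace.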