[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Scheduler Load 
Simulator) framework, a *NullPointerException* is thrown during teardown of 
the parameterized tests. The exception is thrown when the {{stop}} method is 
called on the ResourceManager ({{rm}}) object in {_}RMRunner.java{_}. It occurs 
when a test case pairs a trace type (RUMEN, SLS, SYNTH) with a trace file of a 
different type, so the {{rm}} object may not be properly initialized before 
the {{stop}} method is invoked.

 

*Buggy code:*

The issue is located in the 
{{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
 file within the *{{stop}}* method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.
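
A minimal sketch of the guard described above, not the actual Hadoop patch. {{StoppableService}} and {{GuardedRmRunner}} are hypothetical stand-ins for the ResourceManager type and RMRunner, introduced only so the example compiles on its own; the boolean return value is also an illustration choice, since the original {{stop}} returns void.

```java
class RmStopSketch {
  // Hypothetical stand-in for the ResourceManager type; only stop() matters here.
  interface StoppableService {
    void stop();
  }

  // Simplified holder modeled on RMRunner: rm stays null if startup never assigns it.
  static class GuardedRmRunner {
    private StoppableService rm;

    void setRm(StoppableService rm) {
      this.rm = rm;
    }

    // Returns true if a service was stopped, false if there was nothing to stop.
    boolean stop() {
      if (rm == null) {
        return false; // startup failed earlier, so teardown becomes a no-op
      }
      rm.stop();
      return true;
    }
  }

  public static void main(String[] args) {
    GuardedRmRunner runner = new GuardedRmRunner();
    // rm was never assigned: the guarded stop() must not throw.
    System.out.println("stopped without init: " + runner.stop());
    runner.setRm(() -> System.out.println("rm stopped"));
    System.out.println("stopped after init: " + runner.stop());
  }
}
```

With the guard in place, a teardown that runs after a failed startup simply skips the call instead of masking the original failure with a second exception.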

 

Once the {{stop}} method in 
{{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
 is fixed, the {{stop}} method in 
[TaskRunner.java|https://github.com/apache/hadoop/blob/12a26d8b1987e883efab00c25a0594512527bd1f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L169]
 should also be fixed, because it uses {{executor}} without a null check:
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
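
The same guard can be sketched for the executor. This is an illustration under assumptions, not the actual TaskRunner code: {{GuardedTaskRunner}} is a hypothetical class, the field is typed as a plain {{ExecutorService}}, and {{start}} creates a single-thread executor purely for the demo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TaskStopSketch {
  // Simplified holder modeled on TaskRunner: executor is null until start() runs.
  static class GuardedTaskRunner {
    private ExecutorService executor;

    void start() {
      // Assumption for the demo only; the real runner configures its own pool.
      executor = Executors.newSingleThreadExecutor();
    }

    // Returns false if there was no executor, otherwise whether it terminated in time.
    boolean stop() throws InterruptedException {
      if (executor == null) {
        return false; // never started, nothing to shut down
      }
      executor.shutdownNow();
      return executor.awaitTermination(20, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    GuardedTaskRunner runner = new GuardedTaskRunner();
    System.out.println("stopped without start: " + runner.stop());
    runner.start();
    System.out.println("stopped after start: " + runner.stop());
  }
}
```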
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) 
to include one or both of the following test cases:
 ** {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 ** {capScheduler, "SYNTH", slsTraceFile, nodeFile }
 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered 
in RMRunner.java).

{color:#505f79}_Note: the attachments ([^reproduce.sh], which uses the 
[^add_test_cases.patch] patch) can be used to reproduce the bug easily._{color}
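
The two mismatched cases from the steps above can be sketched as plain data rows. This is a hedged illustration and not the contents of the attached patch: the variable names come from the issue text, but the string values assigned to them here are placeholders, and the assumption that {{capScheduler}} holds the CapacityScheduler class name is inferred from the stack trace rather than confirmed.

```java
import java.util.ArrayList;
import java.util.List;

class MismatchedCasesSketch {
  // Assumed value, inferred from the scheduler class named in the stack trace.
  static final String capScheduler =
      "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
  // Placeholder values; the real test supplies its own trace and node files.
  static final String rumenTraceFile = "<rumen trace file>";
  static final String slsTraceFile = "<sls trace file>";
  static final String nodeFile = "<node file>";

  // Rows in the order used in the issue: scheduler, trace type, trace file, node file.
  static List<Object[]> mismatchedCases() {
    List<Object[]> cases = new ArrayList<>();
    // SYNTH trace type paired with a RUMEN trace file.
    cases.add(new Object[] {capScheduler, "SYNTH", rumenTraceFile, nodeFile});
    // SYNTH trace type paired with an SLS trace file.
    cases.add(new Object[] {capScheduler, "SYNTH", slsTraceFile, nodeFile});
    return cases;
  }

  public static void main(String[] args) {
    for (Object[] row : mismatchedCases()) {
      System.out.println(row[1] + " with trace file " + row[2]);
    }
  }
}
```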
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

*How To Fix*

_{color:#172b4d}The bug can be fixed by adding a null check for the {{rm}} 
object in the {{stop}} method of 
{{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}
 before calling any methods on it. The same applies to the {{executor}} object in 
[TaskRunner.java|https://github.com/apache/hadoop/blob/12a26d8b1987e883efab00c25a0594512527bd1f/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L169].{color}_


[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Attachment: add_test_cases.patch
reproduce.sh

> NullPointerException in TestSLSRunner.testSimulatorRunning
> --
>
> Key: YARN-11666
> URL: https://issues.apache.org/jira/browse/YARN-11666
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: {*}Operating System{*}: macOS (Sonoma 14.2.1 (23C71))
> {*}Hardware{*}: MacBook Air 2023
> {*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))
> {*}Java Version{*}: OpenJDK version "1.8.0_292"
>Reporter: Elen Chatikyan
>Priority: Major
> Attachments: add_test_cases.patch, reproduce.sh



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

*Buggy code:*

The issue is located in the 
*{{[RMRunner.java|https://github.com/apache/hadoop/blob/8b2058a4e755b8ebc081ac67b1b582dd2945e3c6/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RMRunner.java#L126]}}*
 file within the *{{stop}}* method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
*How To Fix*

_{color:#172b4d}The bug can be fixed by adding a null check for the 
{{rm}} object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in TaskRunner.java.{color}_
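
The null check described above could look roughly like the following. This is only a minimal, self-contained sketch: {{StubResourceManager}} and {{GuardedRMRunner}} are hypothetical stand-ins invented for illustration, not the actual Hadoop classes, and the real patch may be shaped differently.

```java
// Hypothetical stand-in for ResourceManager; only stop() matters here.
class StubResourceManager {
  boolean stopped = false;

  void stop() {
    stopped = true;
  }
}

// Simplified stand-in for RMRunner (assumption: rm stays null when
// initialization fails before the ResourceManager is created).
class GuardedRMRunner {
  StubResourceManager rm;

  public void stop() {
    // Only stop the ResourceManager if it was actually created.
    if (rm != null) {
      rm.stop();
    }
  }
}
```

With this guard, calling {{stop()}} during teardown becomes a no-op when {{rm}} was never initialized, so the original setup failure is no longer masked by a NullPointerException.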

  was:
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:{+}{{+}}
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the parameterized unit test's(TestSLSRunner.java) data method to 
include one/both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output(triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
 (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 
s <<< ERROR!
java.lang.NullPointerException
at 

[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
*How To Fix*

_{color:#172b4d}The bug can be fixed by adding a null check for the 
{{rm}} object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in TaskRunner.java.{color}_

  was:
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:{+}{{+}}
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

{*}How to trigger this bug:{*}{*}{*}
 * Change the parameterized unit test's(TestSLSRunner.java) data method to 
include one/both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output(triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
 (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 
s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at 

[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
*How To Fix*

_{color:#172b4d}The bug can be fixed by adding a null check for the 
{{rm}} object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in TaskRunner.java.{color}_

  was:
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:{+}{{+}}
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

{*}How to trigger this bug:{*}{*}{{*}}
 * Change the parameterized unit test's(TestSLSRunner.java) data method to 
include one/both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output(triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
 (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 
s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at 

[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
*How To Fix*

_{color:#172b4d}The bug can be fixed by adding a null check for the 
{{rm}} object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in TaskRunner.java.{color}_

  was:
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:{+}{{+}}
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

{*}How to trigger this bug:{*}{*}{{*}}
 * Change the parameterized unit test's(TestSLSRunner.java) data method to 
include one/both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output(triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
 (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 
s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at 

[jira] [Updated] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elen Chatikyan updated YARN-11666:
--
Description: 
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:
{code:java}
public void stop() {
  rm.stop();
}
{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}
{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
___

_{color:#172b4d}The bug can be fixed by adding a null check for the 
{{rm}} object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in TaskRunner.java.{color}_

  was:
*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:{+}{{+}}
{code:java}
public void stop() {
  rm.stop();
}

{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}

{code}
 

{*}How to trigger this bug:{*}{*}{{*}}
 * Change the parameterized unit test's(TestSLSRunner.java) data method to 
include one/both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output(triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler,
 (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 
s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at 

[jira] [Created] (YARN-11666) NullPointerException in TestSLSRunner.testSimulatorRunning

2024-03-19 Thread Elen Chatikyan (Jira)
Elen Chatikyan created YARN-11666:
-

 Summary: NullPointerException in TestSLSRunner.testSimulatorRunning
 Key: YARN-11666
 URL: https://issues.apache.org/jira/browse/YARN-11666
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: {*}Operating System{*}: macOS (Sonoma 14.2.1 (23C71))

{*}Hardware{*}: MacBook Air 2023

{*}IDE{*}: IntelliJ IDEA (2023.3.2 (Ultimate Edition))

{*}Java Version{*}: OpenJDK version "1.8.0_292"
Reporter: Elen Chatikyan


*What happened:* 

In the *TestSLSRunner* class of the Apache Hadoop YARN SLS (Simulated Load 
Scheduler) framework, a *NullPointerException* is thrown during the teardown 
process of parameterized tests. This exception is thrown when the stop method 
is called on the ResourceManager (rm) object in {_}RMRunner.java{_}. This issue 
occurs under test conditions that involve mismatches between trace types 
(RUMEN, SLS, SYNTH) and their corresponding trace files, leading to scenarios 
where the rm object may not be properly initialized before the stop method is 
invoked.

 

 

*Buggy code:*

The issue is located in the *{{RMRunner.java}}* file within the *{{stop}}* 
method:
{code:java}
public void stop() {
  rm.stop();
}

{code}
The root cause of the *{{NullPointerException}}* is the lack of a null check 
for the {{rm}} object before calling its {{stop}} method. Under any condition 
where the *{{ResourceManager}}* fails to initialize correctly, attempting to 
stop the *{{ResourceManager}}* leads to a null pointer dereference.

 

After fixing in {*}RMRunner.java{*}, TaskRunner should also be fixed.

+TaskRunner.java+
{code:java}
public void stop() throws InterruptedException {
  executor.shutdownNow();
  executor.awaitTermination(20, TimeUnit.SECONDS);
}

{code}
 

*How to trigger this bug:*
 * Change the data method of the parameterized unit test (TestSLSRunner.java) to 
include one or both of the following test cases:
 * {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
 * {capScheduler, "SYNTH", slsTraceFile, nodeFile }

 * Execute the *TestSLSRunner* test suite, particularly the 
*testSimulatorRunning* method.
 * Observe the resulting *NullPointerException* in the test output (triggered in 
RMRunner.java).

 
{panel:title=Example stack trace from the test output:}
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
{panel}
 

 
___

_{color:#172b4d}The bug can be fixed by adding a null check for the 
{{rm}} object in the {{stop}} method of *{{RMRunner.java}}* before calling any 
methods on it. The same applies to the {{executor}} object in TaskRunner.java.{color}_
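
For the TaskRunner side, a guard on {{executor}} might be sketched as below. {{GuardedTaskRunner}} and its {{start()}} method are hypothetical names used only for illustration, and the sketch assumes {{executor}} is still null whenever the runner was never started; the real class may initialize it differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified stand-in for TaskRunner; not the actual Hadoop class.
class GuardedTaskRunner {
  // Assumption: remains null until start() has been called.
  ExecutorService executor;

  void start() {
    executor = Executors.newFixedThreadPool(2);
  }

  public void stop() throws InterruptedException {
    // Nothing to shut down if the runner was never started.
    if (executor == null) {
      return;
    }
    executor.shutdownNow();
    executor.awaitTermination(20, TimeUnit.SECONDS);
  }
}
```

The shutdown sequence itself is unchanged from the snippet in the report; only the early return for the uninitialized case is new.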



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5305) Yarn Application Log Aggregation fails due to NM can not get correct HDFS delegation token III

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828512#comment-17828512
 ] 

ASF GitHub Bot commented on YARN-5305:
--

hadoop-yetus commented on PR #6625:
URL: https://github.com/apache/hadoop/pull/6625#issuecomment-2008119232

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 44s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m  9s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 41s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 15s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 14s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 50s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   2m 35s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6625/4/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) |  hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  34m 22s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 48s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 13s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   4m 21s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6625/4/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 1 new + 197 unchanged - 0 fixed = 198 total (was 197)  |
   | +1 :green_heart: |  mvnsite  |   2m 42s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 51s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   4m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 19s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 40s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |  24m 43s |  |  hadoop-yarn-server-nodemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  5s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 268m 12s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6625/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6625 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c8c47b10ec91 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 772720878905eb7caa1ca4ca2936d727d54ee7b9 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 

[jira] [Commented] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828470#comment-17828470
 ] 

ASF GitHub Bot commented on YARN-11664:
---

shameersss1 commented on PR #6631:
URL: https://github.com/apache/hadoop/pull/6631#issuecomment-2007805123

   > -1. Please do not change the following `@Public` and `@Evolving` classes:
   > 
   > * QuotaExceededException.java
   > 
   > * DSQuotaExceededException.java
   > 
   > 
   > > https://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/Compatibility.html
   > > Evolving interfaces must not change between minor releases.
   > 
   > Can we use ClusterStorageCapacityExceededException (hadoop-common) instead of DSQuotaExceededException/QuotaExceededException (hadoop-hdfs) in YARN source code?
   > 
   > IOStreamPair.java is `@Private` and I think we can relocate to hadoop-common.
   
   ClusterStorageCapacityExceededException is a parent exception of DSQuotaExceededException and hence catching it will serve the purpose as well. I will raise a revision of this change.
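
   The point about catching the parent exception can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins declared locally so the example compiles on its own; they only mirror the parent/child relationship described in the comment above (the real hierarchy may have intermediate classes such as QuotaExceededException), and `QuotaCatchSketch`, `write` and `tryWrite` are illustrative names, not code from the PR.

```java
import java.io.IOException;

// Hypothetical stand-in for the hadoop-common parent exception.
class ClusterStorageCapacityExceededException extends IOException {
  ClusterStorageCapacityExceededException(String msg) {
    super(msg);
  }
}

// Hypothetical stand-in for the hadoop-hdfs child exception
// (simplified here to a direct subclass of the parent).
class DSQuotaExceededException extends ClusterStorageCapacityExceededException {
  DSQuotaExceededException(String msg) {
    super(msg);
  }
}

class QuotaCatchSketch {
  // Illustrative writer that fails the way an HDFS-backed write might.
  static void write(boolean overQuota) throws IOException {
    if (overQuota) {
      throw new DSQuotaExceededException("disk space quota exceeded");
    }
  }

  // Catching the parent type also handles the HDFS-specific subclass,
  // so the calling code needs no reference to the child class.
  static String tryWrite(boolean overQuota) {
    try {
      write(overQuota);
      return "ok";
    } catch (ClusterStorageCapacityExceededException e) {
      return "capacity exceeded: " + e.getMessage();
    } catch (IOException e) {
      return "other I/O failure";
    }
  }
}
```

   Because a catch clause matches subclasses as well, the caller only needs the parent type on its compile-time classpath, which is the property the proposed revision relies on.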




> Remove HDFS Binaries/Jars Dependency From YARN
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> In principle, Hadoop Yarn is independent of HDFS and can work with any 
> filesystem. Currently, however, some Yarn code depends on HDFS. 
> This dependency requires Yarn to bring some of the HDFS binaries/jars onto 
> its class path. The idea behind this jira is to remove this dependency so 
> that Yarn can run without HDFS binaries/jars.
> *Scope*
> 1. Non-test classes are considered
> 2. Some test classes which come in as transitive dependencies are considered
> *Out of scope*
> 1. The remaining test classes in the Yarn module are not considered
>  
> 
> A quick search in the Yarn module revealed the following HDFS dependencies
> 1. Constants
> {code:java}
> import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
> import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  // comes in as a transitive dependency of DSQuotaExceededException
> {code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both Yarn and HDFS depend on the *hadoop-common* module. One straightforward 
> approach is to move all these dependencies to the *hadoop-common* module so 
> that both HDFS and Yarn can pick them up from there.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11216) Avoid unnecessary reconstruction of ConfigurationProperties

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828448#comment-17828448
 ] 

ASF GitHub Bot commented on YARN-11216:
---

hadoop-yetus commented on PR #4655:
URL: https://github.com/apache/hadoop/pull/4655#issuecomment-2007703551

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 22s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  19m  1s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  17m 30s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 42s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 53s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   2m 34s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) |  hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  41m 12s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | -1 :x: |  mvninstall  |   0m 32s | [/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) |  hadoop-yarn-server-resourcemanager in the patch failed.  |
   | -1 :x: |  compile  |   8m 13s | [/patch-compile-root-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/patch-compile-root-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) |  root in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javac  |   8m 13s | [/patch-compile-root-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/patch-compile-root-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) |  root in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  compile  |   7m 37s | [/patch-compile-root-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) |  root in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -1 :x: |  javac  |   7m 37s | [/patch-compile-root-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) |  root in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   4m 20s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4655/16/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 6 new + 139 unchanged - 0 fixed = 145 total (was 139)  |
   | -1 :x: |  mvnsite  |   0m 37s | 

[jira] [Commented] (YARN-5305) Yarn Application Log Aggregation fails due to NM can not get correct HDFS delegation token III

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828447#comment-17828447
 ] 

ASF GitHub Bot commented on YARN-5305:
--

K0K0V0K commented on PR #6625:
URL: https://github.com/apache/hadoop/pull/6625#issuecomment-2007689009

   Thanks, @p-szucs for this fix. Nice work!
   
   LGTM!




> Yarn Application Log Aggregation fails due to NM can not get correct HDFS 
> delegation token III
> --
>
> Key: YARN-5305
> URL: https://issues.apache.org/jira/browse/YARN-5305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Unlike YARN-5098 and YARN-5302, this problem happens when the AM submits 
> a startContainer request with a new HDFS token (say, tokenB) which is not 
> managed by YARN, so two tokens exist in the credentials of the user on the NM: 
> one is tokenB, the other is the one renewed on the RM (tokenA). If tokenB is 
> selected when connecting to HDFS and tokenB has expired, an exception occurs.
> Supplementary: this problem happens because the AM didn't use the service name 
> as the token alias in the credentials, so two tokens for the same service can 
> co-exist in one set of credentials. TokenSelector can only select the first 
> matching token; it doesn't care whether the token is valid or not.






[jira] [Commented] (YARN-5305) Yarn Application Log Aggregation fails due to NM can not get correct HDFS delegation token III

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828437#comment-17828437
 ] 

ASF GitHub Bot commented on YARN-5305:
--

p-szucs commented on code in PR #6625:
URL: https://github.com/apache/hadoop/pull/6625#discussion_r1530716191


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java:
##
@@ -286,7 +294,13 @@ private void uploadLogsForContainers(boolean appFinished)
 }
 
 addCredentials();
-
+if (UserGroupInformation.isSecurityEnabled()) {

Review Comment:
   Thanks @K0K0V0K for the review. Sure, I updated the PR with the fix.








[jira] [Commented] (YARN-5305) Yarn Application Log Aggregation fails due to NM can not get correct HDFS delegation token III

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828431#comment-17828431
 ] 

ASF GitHub Bot commented on YARN-5305:
--

K0K0V0K commented on code in PR #6625:
URL: https://github.com/apache/hadoop/pull/6625#discussion_r1530670402


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java:
##
@@ -286,7 +294,13 @@ private void uploadLogsForContainers(boolean appFinished)
 }
 
 addCredentials();
-
+if (UserGroupInformation.isSecurityEnabled()) {

Review Comment:
   Nit: maybe we can move this if into the removeExpiredDelegationTokens method 
as an early return, so it will be more similar to the addCredentials() method
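
   The early-return shape suggested here can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `EarlyReturnSketch` is a hypothetical class, the `securityEnabled` field stands in for `UserGroupInformation.isSecurityEnabled()`, and the token bookkeeping (a list of expiry timestamps) is invented purely to make the example runnable.

```java
import java.util.ArrayList;
import java.util.List;

class EarlyReturnSketch {
  // Stand-in for UserGroupInformation.isSecurityEnabled() (assumption).
  private final boolean securityEnabled;
  // Hypothetical token model: each token is represented by its expiry time.
  private final List<Long> tokenExpiryTimes;

  EarlyReturnSketch(boolean securityEnabled, List<Long> expiryTimes) {
    this.securityEnabled = securityEnabled;
    // Copy into a mutable list so removeIf is always supported.
    this.tokenExpiryTimes = new ArrayList<>(expiryTimes);
  }

  // The guard lives inside the method, so callers can invoke it
  // unconditionally instead of wrapping the call in an if block.
  void removeExpiredDelegationTokens(long now) {
    if (!securityEnabled) {
      return;
    }
    tokenExpiryTimes.removeIf(expiry -> expiry <= now);
  }

  int tokenCount() {
    return tokenExpiryTimes.size();
  }
}
```

   With the guard inside the method, the call site shrinks to a single unconditional call, which matches how `addCredentials()` is invoked in the hunk above.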








[jira] [Commented] (YARN-11656) RMStateStore event queue blocked

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828289#comment-17828289
 ] 

ASF GitHub Bot commented on YARN-11656:
---

p-szucs commented on code in PR #6569:
URL: https://github.com/apache/hadoop/pull/6569#discussion_r1530182598


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/multidispatcher/MultiDispatcherExecutor.java:
##
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.event.multidispatcher;
+
+import java.util.Arrays;
+import java.util.Map;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.stream.Collectors;
+
+import org.slf4j.Logger;
+
+import org.apache.hadoop.yarn.event.Event;
+import org.apache.hadoop.yarn.util.Clock;
+import org.apache.hadoop.yarn.util.MonotonicClock;
+
+/**
+ * This class contains the thread which process the {@link MultiDispatcher}'s events.
+ */
+public class MultiDispatcherExecutor {
+
+  private final Logger log;
+  private final MultiDispatcherConfig config;
+  private final MultiDispatcherExecutorThread[] threads;
+  private final Clock clock = new MonotonicClock();
+
+  public MultiDispatcherExecutor(
+  Logger log,
+  MultiDispatcherConfig config,
+  String dispatcherName
+  ) {
+this.log = log;
+this.config = config;
+this.threads = new MultiDispatcherExecutorThread[config.getDefaultPoolSize()];
+ThreadGroup group = new ThreadGroup(dispatcherName);
+for (int i = 0; i < threads.length; ++i) {
+  threads[i] = new MultiDispatcherExecutorThread(group, i, config.getQueueSize());
+}
+  }
+
+  public void start() {
+for(Thread t : threads) {
+  t.start();
+}
+  }
+
+  public void execute(Event event, Runnable runnable) {
+String lockKey = event.getLockKey();
+// abs of Integer.MIN_VALUE is Integer.MIN_VALUE
+int threadIndex = lockKey == null  || lockKey.hashCode() == Integer.MIN_VALUE ?
+0 : Math.abs(lockKey.hashCode() % threads.length);

Review Comment:
   Based on our discussion, I think a comment or description probably would be 
useful to make the goal of this computation more clear
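
   As a possible starting point for such a description, here is a small sketch of the computation. It assumes the goal is that events sharing a lock key are always routed to the same executor thread; that reading is inferred from the name `getLockKey()` and is not confirmed in the thread. `ThreadIndexSketch` and both method names are hypothetical, and the `Math.floorMod` variant is an alternative formulation, not what the PR does.

```java
class ThreadIndexSketch {
  // Remainder-then-abs, as in the diff (without the extra MIN_VALUE check).
  // For a positive poolSize the remainder lies strictly between -poolSize
  // and poolSize, so Math.abs is never applied to Integer.MIN_VALUE here.
  static int indexWithAbs(int hash, int poolSize) {
    return Math.abs(hash % poolSize);
  }

  // Alternative: floorMod returns a value in [0, poolSize) for any int hash
  // when poolSize is positive, so no sign handling is needed.
  static int indexWithFloorMod(int hash, int poolSize) {
    return Math.floorMod(hash, poolSize);
  }

  public static void main(String[] args) {
    int poolSize = 4;
    int[] hashes = {"app_1".hashCode(), -7, 0, Integer.MIN_VALUE};
    for (int h : hashes) {
      System.out.println(h + " -> abs: " + indexWithAbs(h, poolSize)
          + ", floorMod: " + indexWithFloorMod(h, poolSize));
    }
  }
}
```

   Both variants are deterministic, so a given key always maps to the same index. They can pick different indexes for negative hashes, which only matters if something else depends on the exact mapping.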



##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/multidispatcher/MultiDispatcherConfig.java:
##
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.event.multidispatcher;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * All the config what can be use in the {@link MultiDispatcher}
+ */
+class MultiDispatcherConfig extends Configuration {
+
+  private final String prefix;
+
+  public MultiDispatcherConfig(Configuration configuration, String dispatcherName) {
+super(configuration);
+this.prefix = String.format("yarn.dispatcher.multi-thread.%s.", dispatcherName);
+  }
+
+  /**
+   * How many executor thread should be created to handle the incoming events
+   * @return configured value, or default 4
+   */
+  public int getDefaultPoolSize() {
+return super.getInt(prefix + "default-pool-size", 4);
+  }
+
+  /**
+   * Maximus size of the event queue of the executor threads.

Review Comment:
   Just a typo, if you touch the code again anyways :)
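
   For readers following the config hunk, the per-dispatcher key lookup can be sketched like this. It is a simplified illustration: a plain `Map` stands in for Hadoop's `Configuration`, and `PrefixedConfigSketch` and the dispatcher name used in the usage note are hypothetical. The prefix format, the `default-pool-size` suffix and the default of 4 are taken from the diff above.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixedConfigSketch {
  // Plain map standing in for org.apache.hadoop.conf.Configuration (assumption).
  private final Map<String, String> values;
  private final String prefix;

  PrefixedConfigSketch(Map<String, String> values, String dispatcherName) {
    this.values = new HashMap<>(values);
    // Same prefix format as in the diff.
    this.prefix = String.format("yarn.dispatcher.multi-thread.%s.", dispatcherName);
  }

  // Returns the configured pool size for this dispatcher, or the default 4.
  int getDefaultPoolSize() {
    String raw = values.get(prefix + "default-pool-size");
    return raw == null ? 4 : Integer.parseInt(raw);
  }
}
```

   Under these assumptions, a dispatcher named `example` would read its pool size from `yarn.dispatcher.multi-thread.example.default-pool-size`, and any other dispatcher name would fall back to 4 unless its own key is set.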





> RMStateStore event 

[jira] [Commented] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828288#comment-17828288
 ] 

ASF GitHub Bot commented on YARN-11664:
---

aajisaka commented on PR #6631:
URL: https://github.com/apache/hadoop/pull/6631#issuecomment-2006945704

   -1. Please do not change the following `@Public` and `@Evolving` classes:
   - QuotaExceededException.java
   - DSQuotaExceededException.java
   
   
   > https://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/Compatibility.html
   > Evolving interfaces must not change between minor releases.
   
   Can we use ClusterStorageCapacityExceededException (hadoop-common) instead of DSQuotaExceededException/QuotaExceededException (hadoop-hdfs) in YARN source code?
   
   IOStreamPair.java is `@Private` and I think we can relocate to hadoop-common.



