[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-21 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848119#comment-17848119
 ] 

Syed Shameerur Rahman commented on YARN-11697:
--

[~wilfreds] 
 # IMHO, when the appAttempt is no longer present in the queue, the removal 
should be handled more gracefully than by throwing an IllegalStateException, 
which takes down the RM.
 # Since the appAttempt is in any case not in the queue, we can safely log a 
warning message instead of throwing an exception (see the sketch below).

 

Any thoughts on the above approach?
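
A minimal sketch of what point 2 could look like in FSLeafQueue.removeApp (for 
illustration only, not a reviewed patch; field and method names follow the 
removeApp snippet quoted elsewhere in this thread, and the LOG field is assumed 
to exist in the class):
{code:java}
boolean removeApp(FSAppAttempt app) {
  boolean runnable = false;

  // Remove app from the runnable/non-runnable list while holding the write lock.
  writeLock.lock();
  try {
    runnable = runnableApps.remove(app);
    if (!runnable && !removeNonRunnableApp(app)) {
      // The attempt is already gone (e.g. a concurrent moveApplication removed
      // it); warn and treat the removal as a no-op instead of killing the RM.
      LOG.warn("Given app to remove " + app
          + " does not exist in queue " + this + "; ignoring removal.");
      return false;
    }
  } finally {
    writeLock.unlock();
  }
  // ... rest of the method (AM resource accounting etc.) unchanged ...
  return runnable;
}
{code}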

> Fix fair scheduler race condition in removeApplicationAttempt and 
> moveApplication
> -
>
> Key: YARN-11697
> URL: https://issues.apache.org/jira/browse/YARN-11697
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with 
> the following exception
> {code:java}
> 2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
> (SchedulerEventDispatcher:Event Processor): Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.IllegalStateException: Given app to remove 
> appattempt_1706879498319_86660_01 Alloc:  does not 
> exist in queue [root, demand=, 
> running=, share=, w=1.0]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> The exception seems similar to the one mentioned in YARN-5136, but it looks 
> like there are still some edge cases not covered by YARN-5136.
> 1. On a deeper look, I could see that, as mentioned in the comment here, if a 
> moveApplication and a removeApplicationAttempt for the same attempt are 
> processed in short succession, the application attempt will still hold a 
> queue reference but will already have been removed from the queue's list of 
> applications.
> 2. This can happen when 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
>  removes the appAttempt from the queue and 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
>  also tries to remove the same appAttempt from the queue.
> 3. On further checking, I could see that before 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
>  a writeLock on the appAttempt is taken, whereas for 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
>  I don't see any writeLock being taken, which can result in a race condition 
> if the same appAttempt is being processed.
> 4. Additionally, as mentioned in the comment here, when such a scenario 
> occurs we ideally should not take down the RM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-21 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848086#comment-17848086
 ] 

Syed Shameerur Rahman edited comment on YARN-11697 at 5/21/24 8:00 AM:
---

Additionally, I could specifically see this when the application is being 
killed; it corresponds to the following code:

 
{code:java}
boolean removeApp(FSAppAttempt app) {
  boolean runnable = false;

  // Remove app from runnable/nonRunnable list while holding the write lock
  writeLock.lock();
  try {
    runnable = runnableApps.remove(app);
    if (!runnable) {
      // removeNonRunnableApp acquires the write lock again, which is fine
      if (!removeNonRunnableApp(app)) {
        throw new IllegalStateException("Given app to remove " + app +
            " does not exist in queue " + this);
      }
    }
  } finally {
    writeLock.unlock();
  }
  // ... rest of the method omitted ...
}
{code}


was (Author: srahman):
Additionally i could specifically see this when Application is being killed.

> Fix fair scheduler race condition in removeApplicationAttempt and 
> moveApplication
> -
>
> Key: YARN-11697
> URL: https://issues.apache.org/jira/browse/YARN-11697
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with 
> the following exception
> {code:java}
> 2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
> (SchedulerEventDispatcher:Event Processor): Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.IllegalStateException: Given app to remove 
> appattempt_1706879498319_86660_01 Alloc:  does not 
> exist in queue [root, demand=, 
> running=, share=, w=1.0]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> The exception seems similar to the one mentioned in YARN-5136, but it looks 
> like there are still some edge cases not covered by YARN-5136.
> 1. On a deeper look, I could see that, as mentioned in the comment here, if a 
> moveApplication and a removeApplicationAttempt for the same attempt are 
> processed in short succession, the application attempt will still hold a 
> queue reference but will already have been removed from the queue's list of 
> applications.
> 2. This can happen when 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
>  removes the appAttempt from the queue and 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
>  also tries to remove the same appAttempt from the queue.
> 3. On further checking, I could see that before 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
>  a writeLock on the appAttempt is taken, whereas for 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
>  I don't see any writeLock being taken, which can result in a race condition 
> if the same appAttempt is being processed.
> 4. Additionally, as mentioned in the comment here, when such a scenario 
> occurs we ideally should not take down the RM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-21 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848086#comment-17848086
 ] 

Syed Shameerur Rahman commented on YARN-11697:
--

Additionally, I could specifically see this when the application is being killed.

> Fix fair scheduler race condition in removeApplicationAttempt and 
> moveApplication
> -
>
> Key: YARN-11697
> URL: https://issues.apache.org/jira/browse/YARN-11697
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with 
> the following exception
> {code:java}
> 2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
> (SchedulerEventDispatcher:Event Processor): Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.IllegalStateException: Given app to remove 
> appattempt_1706879498319_86660_01 Alloc:  does not 
> exist in queue [root, demand=, 
> running=, share=, w=1.0]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> The exception seems similar to the one mentioned in YARN-5136, but it looks 
> like there are still some edge cases not covered by YARN-5136.
> 1. On a deeper look, I could see that, as mentioned in the comment here, if a 
> moveApplication and a removeApplicationAttempt for the same attempt are 
> processed in short succession, the application attempt will still hold a 
> queue reference but will already have been removed from the queue's list of 
> applications.
> 2. This can happen when 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
>  removes the appAttempt from the queue and 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
>  also tries to remove the same appAttempt from the queue.
> 3. On further checking, I could see that before 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
>  a writeLock on the appAttempt is taken, whereas for 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
>  I don't see any writeLock being taken, which can result in a race condition 
> if the same appAttempt is being processed.
> 4. Additionally, as mentioned in the comment here, when such a scenario 
> occurs we ideally should not take down the RM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-21 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848085#comment-17848085
 ] 

Syed Shameerur Rahman commented on YARN-11697:
--

[~wilfreds] 

I had some custom code/backports from a higher version, hence the code line 
numbers might differ from the OSS Hadoop code base. I could see the following 
exception though:
{code:java}
java.lang.IllegalStateException: Given app to remove 
appattempt_1706879498319_86660_01 Alloc:  does not 
exist in queue [root, demand=, 
running=, share=, w=1.0]
{code}

So this exception is thrown only when the appAttempt has already been removed 
from the queue and we try to remove it again. Throwing the 
IllegalStateException causes the RM to shut down. Can you think of any scenario 
in which this can happen?
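
For what it's worth, the failure mode itself is easy to reproduce outside 
Hadoop. The toy model below is plain Java, not Hadoop code (all names are made 
up for illustration): it removes the same "attempt" twice under a 
removeApp-style check and trips the same kind of IllegalStateException on the 
second removal:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleRemoveDemo {
  private static final List<String> runnableApps = new ArrayList<>();
  private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Mirrors the shape of FSLeafQueue.removeApp: throw if the attempt is absent.
  static void removeApp(String attempt) {
    lock.writeLock().lock();
    try {
      if (!runnableApps.remove(attempt)) {
        throw new IllegalStateException(
            "Given app to remove " + attempt + " does not exist in queue");
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    runnableApps.add("appattempt_demo_0001_000001");
    removeApp("appattempt_demo_0001_000001"); // e.g. the moveApplication path
    removeApp("appattempt_demo_0001_000001"); // e.g. the APP_ATTEMPT_REMOVED path -> throws
  }
}
{code}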

> Fix fair scheduler race condition in removeApplicationAttempt and 
> moveApplication
> -
>
> Key: YARN-11697
> URL: https://issues.apache.org/jira/browse/YARN-11697
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with 
> the following exception
> {code:java}
> 2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
> (SchedulerEventDispatcher:Event Processor): Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.IllegalStateException: Given app to remove 
> appattempt_1706879498319_86660_01 Alloc:  does not 
> exist in queue [root, demand=, 
> running=, share=, w=1.0]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> The exception seems similar to the one mentioned in YARN-5136, but it looks 
> like there are still some edge cases not covered by YARN-5136.
> 1. On a deeper look, I could see that, as mentioned in the comment here, if a 
> moveApplication and a removeApplicationAttempt for the same attempt are 
> processed in short succession, the application attempt will still hold a 
> queue reference but will already have been removed from the queue's list of 
> applications.
> 2. This can happen when 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
>  removes the appAttempt from the queue and 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
>  also tries to remove the same appAttempt from the queue.
> 3. On further checking, I could see that before 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
>  a writeLock on the appAttempt is taken, whereas for 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
>  I don't see any writeLock being taken, which can result in a race condition 
> if the same appAttempt is being processed.
> 4. Additionally, as mentioned in the comment here, when such a scenario 
> occurs we ideally should not take down the RM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-20 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11697:
-
Description: 
For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the 
following exception
{code:java}
2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
(SchedulerEventDispatcher:Event Processor): Error in handling event type 
APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.IllegalStateException: Given app to remove 
appattempt_1706879498319_86660_01 Alloc:  does not 
exist in queue [root, demand=, 
running=, share=, w=1.0]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:750)
{code}
The exception seems similar to the one mentioned in YARN-5136, but it looks 
like there are still some edge cases not covered by YARN-5136.

1. On a deeper look, I could see that, as mentioned in the comment here, if a 
moveApplication and a removeApplicationAttempt for the same attempt are 
processed in short succession, the application attempt will still hold a queue 
reference but will already have been removed from the queue's list of 
applications.

2. This can happen when 
[moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
 removes the appAttempt from the queue and 
[removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
 also tries to remove the same appAttempt from the queue.

3. On further checking, I could see that before 
[moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
 a writeLock on the appAttempt is taken, whereas for 
[removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
 I don't see any writeLock being taken, which can result in a race condition if 
the same appAttempt is being processed (a rough sketch of making the two paths 
symmetric follows point 4 below).

4. Additionally, as mentioned in the comment here, when such a scenario occurs 
we ideally should not take down the RM.
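
A rough sketch of the kind of change points 3 and 4 suggest, for illustration 
only (not a patch): the attempt lookup and lock accessor names are assumptions, 
the surrounding container release and metrics handling is elided, and 
FSLeafQueue.removeApp is assumed to be relaxed to return false rather than 
throw when the attempt is absent.
{code:java}
// Inside FairScheduler.removeApplicationAttempt(...), sketched:
writeLock.lock();
try {
  FSAppAttempt attempt = getApplicationAttempt(applicationAttemptId);
  if (attempt == null) {
    LOG.info("Unknown attempt " + applicationAttemptId + " has completed.");
    return;
  }
  // Serialize with moveApplication, which already locks the attempt.
  attempt.getWriteLock().lock();
  try {
    FSLeafQueue queue = attempt.getQueue();
    // Tolerate an attempt that a concurrent move has already detached.
    if (!queue.removeApp(attempt)) {
      LOG.warn("Attempt " + applicationAttemptId + " was not found in queue "
          + queue.getName() + "; skipping queue removal.");
    }
    // ... release live containers, update metrics, etc. (unchanged) ...
  } finally {
    attempt.getWriteLock().unlock();
  }
} finally {
  writeLock.unlock();
}
{code}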

  was:
For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the 
following exception
{code:java}
2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
(SchedulerEventDispatcher:Event Processor): Error in handling event type 
APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.IllegalStateException: Given app to remove 
appattempt_1706879498319_86660_01 Alloc:  does not 
exist in queue [root.tier2.livy, demand=, 
running=, share=, w=1.0]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:750)
{code}
The exception seems similar to the one mentioned in YARN-5136, but it looks 
like there is still some edge cases not covered by YARN-5136.

1. On deeper look, i could see that as mentioned in the comment here. if a call 
for a moveApplication and removeApplicationAttempt for the same attempt are 
processed in short succession the application attempt will still contain a 
queue reference but is already removed from the list of applications for the 
queue.

2. This can happen when 

[jira] [Created] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-20 Thread Syed Shameerur Rahman (Jira)
Syed Shameerur Rahman created YARN-11697:


 Summary: Fix fair scheduler race condition in 
removeApplicationAttempt and moveApplication
 Key: YARN-11697
 URL: https://issues.apache.org/jira/browse/YARN-11697
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.2.1
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman


For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the 
following exception
{code:java}
2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
(SchedulerEventDispatcher:Event Processor): Error in handling event type 
APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.IllegalStateException: Given app to remove 
appattempt_1706879498319_86660_01 Alloc:  does not 
exist in queue [root.tier2.livy, demand=, 
running=, share=, w=1.0]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:750)
{code}
The exception seems similar to the one mentioned in YARN-5136, but it looks 
like there are still some edge cases not covered by YARN-5136.

1. On a deeper look, I could see that, as mentioned in the comment here, if a 
moveApplication and a removeApplicationAttempt for the same attempt are 
processed in short succession, the application attempt will still hold a queue 
reference but will already have been removed from the queue's list of 
applications.

2. This can happen when 
[moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
 removes the appAttempt from the queue and 
[removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
 also tries to remove the same appAttempt from the queue.

3. On further checking, I could see that before 
[moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
 a writeLock on the appAttempt is taken, whereas for 
[removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
 I don't see any writeLock being taken, which can result in a race condition if 
the same appAttempt is being processed.

4. Additionally, as mentioned in the comment here, when such a scenario occurs 
we ideally should not take down the RM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication

2024-05-20 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848046#comment-17848046
 ] 

Syed Shameerur Rahman commented on YARN-11697:
--

[~wilfreds] Any thoughts on this?

> Fix fair scheduler race condition in removeApplicationAttempt and 
> moveApplication
> -
>
> Key: YARN-11697
> URL: https://issues.apache.org/jira/browse/YARN-11697
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with 
> the following exception
> {code:java}
> 2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher 
> (SchedulerEventDispatcher:Event Processor): Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.IllegalStateException: Given app to remove 
> appattempt_1706879498319_86660_01 Alloc:  does not 
> exist in queue [root.tier2.livy, demand=, 
> running=, share=, w=1.0]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> The exception seems similar to the one mentioned in YARN-5136, but it looks 
> like there are still some edge cases not covered by YARN-5136.
> 1. On a deeper look, I could see that, as mentioned in the comment here, if a 
> moveApplication and a removeApplicationAttempt for the same attempt are 
> processed in short succession, the application attempt will still hold a 
> queue reference but will already have been removed from the queue's list of 
> applications.
> 2. This can happen when 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908]
>  removes the appAttempt from the queue and 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707]
>  also tries to remove the same appAttempt from the queue.
> 3. On further checking, I could see that before 
> [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779]
>  a writeLock on the appAttempt is taken, whereas for 
> [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665]
>  I don't see any writeLock being taken, which can result in a race condition 
> if the same appAttempt is being processed.
> 4. Additionally, as mentioned in the comment here, when such a scenario 
> occurs we ideally should not take down the RM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-22 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11664:
-
Description: 
In principle, Hadoop YARN is independent of HDFS and can work with any 
filesystem. Currently, however, YARN has some code dependencies on HDFS, which 
require YARN to bring some of the HDFS binaries/jars onto its class path. The 
idea behind this Jira is to remove those dependencies so that YARN can run 
without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered.
2. Some test classes that come in as transitive dependencies are considered.


*Out of scope*
1. The remaining test classes in the YARN module are not considered.

 

A quick search in the YARN module revealed the following HDFS dependencies:


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both YARN and HDFS depend on the *hadoop-common* module, so:

* Constant variables and utility classes can be moved to *hadoop-common*.
* Instead of DSQuotaExceededException, use the parent exception 
ClusterStorageCapacityExceededException (see the sketch below).
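
To make the second bullet concrete, here is a hedged sketch of a hypothetical 
call site (the class, method and path below are made up for illustration; the 
assumption is that the parent type referred to above is 
org.apache.hadoop.fs.ClusterStorageCapacityExceededException in hadoop-common):
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.ClusterStorageCapacityExceededException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuotaAwareWriter {
  /**
   * Returns false if the target filesystem rejected the write because a
   * cluster storage capacity/quota limit was hit; any other IO failure is
   * rethrown to the caller.
   */
  static boolean tryWrite(FileSystem fs, Path target, byte[] payload)
      throws IOException {
    try (FSDataOutputStream out = fs.create(target, true)) {
      out.write(payload);
      return true;
    } catch (ClusterStorageCapacityExceededException e) {
      // Previously this would have been: catch (DSQuotaExceededException e),
      // an HDFS-only class. Catching the hadoop-common parent removes the
      // need for hadoop-hdfs jars on the YARN class path.
      return false;
    }
  }
}
{code}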

  was:
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on *hadoop-common* module,

* Constants variables and Utility classes can be moved to *hadoop-common*
* Instead of DSQuotaExceededException, Use the parent exception 
ClusterStoragrCapacityExceeded


> Remove HDFS Binaries/Jars Dependency From YARN
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> In principle, Hadoop YARN is independent of HDFS and can work with any 
> filesystem. Currently, however, YARN has some code dependencies on HDFS, 
> which require YARN to bring some of the HDFS binaries/jars onto its class 
> path. The idea behind this Jira is to remove those dependencies so that YARN 
> can run without the HDFS binaries/jars.
> *Scope*
> 1. Non-test classes are considered.
> 2. Some test classes that come in as transitive dependencies are considered.
> *Out of scope*
> 1. The remaining test classes in the YARN module are not considered.
>  
> 
> A quick search in the YARN module revealed the following HDFS dependencies:
> 1. Constants
> {code:java}
> import 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both YARN and HDFS depend on the *hadoop-common* module, so:
> * Constant variables and utility classes can be moved to *hadoop-common*.
> * Instead of DSQuotaExceededException, use the parent exception 
> ClusterStorageCapacityExceededException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For 

[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-20 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11664:
-
Description: 
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both YARN and HDFS depend on the *hadoop-common* module, so:

* Constant variables and utility classes can be moved to *hadoop-common*.
* Instead of DSQuotaExceededException, use the parent exception 
ClusterStorageCapacityExceededException.

  was:
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
transitive dependency from DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on *hadoop-common* module, One straight forward 
approach is to move all these dependencies to *hadoop-common* module and both 
HDFS and Yarn can pick these dependencies.


> Remove HDFS Binaries/Jars Dependency From YARN
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> In principle Hadoop Yarn is independent of HDFS. It can work with any 
> filesystem. Currently there exists some code dependency for Yarn with HDFS. 
> This dependency requires Yarn to bring in some of the HDFS binaries/jars to 
> its class path. The idea behind this jira is to remove this dependency so 
> that Yarn can run without HDFS binaries/jars
> *Scope*
> 1. Non test classes are considered
> 2. Some test classes which comes as transitive dependency are considered
> *Out of scope*
> 1. All test classes in Yarn module is not considered
>  
> 
> A quick search in Yarn module revealed following HDFS dependencies
> 1. Constants
> {code:java}
> import 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both YARN and HDFS depend on the *hadoop-common* module, so:
> * Constant variables and utility classes can be moved to *hadoop-common*.
> * Instead of DSQuotaExceededException, use the parent exception 
> ClusterStorageCapacityExceededException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-17 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11664:
-
Summary: Remove HDFS Binaries/Jars Dependency From YARN  (was: Remove HDFS 
Binaries/Jars Dependency From Yarn)

> Remove HDFS Binaries/Jars Dependency From YARN
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> In principle Hadoop Yarn is independent of HDFS. It can work with any 
> filesystem. Currently there exists some code dependency for Yarn with HDFS. 
> This dependency requires Yarn to bring in some of the HDFS binaries/jars to 
> its class path. The idea behind this jira is to remove this dependency so 
> that Yarn can run without HDFS binaries/jars
> *Scope*
> 1. Non test classes are considered
> 2. Some test classes which comes as transitive dependency are considered
> *Out of scope*
> 1. All test classes in Yarn module is not considered
>  
> 
> A quick search in Yarn module revealed following HDFS dependencies
> 1. Constants
> {code:java}
> import 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
> import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
> transitive dependency from DSQuotaExceededException){code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both Yarn and HDFS depends on *hadoop-common* module, One straight forward 
> approach is to move all these dependencies to *hadoop-common* module and both 
> HDFS and Yarn can pick these dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn

2024-03-16 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827682#comment-17827682
 ] 

Syed Shameerur Rahman commented on YARN-11664:
--

[~zuston] [~ste...@apache.org] Any thoughts on this?

> Remove HDFS Binaries/Jars Dependency From Yarn
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> In principle Hadoop Yarn is independent of HDFS. It can work with any 
> filesystem. Currently there exists some code dependency for Yarn with HDFS. 
> This dependency requires Yarn to bring in some of the HDFS binaries/jars to 
> its class path. The idea behind this jira is to remove this dependency so 
> that Yarn can run without HDFS binaries/jars
> *Scope*
> 1. Non test classes are considered
> 2. Some test classes which comes as transitive dependency are considered
> *Out of scope*
> 1. All test classes in Yarn module is not considered
>  
> 
> A quick search in Yarn module revealed following HDFS dependencies
> 1. Constants
> {code:java}
> import 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
> import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
> transitive dependency from DSQuotaExceededException){code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both Yarn and HDFS depends on *hadoop-common* module, One straight forward 
> approach is to move all these dependencies to *hadoop-common* module and both 
> HDFS and Yarn can pick these dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn

2024-03-16 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11664:
-
Description: 
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
transitive dependency from DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on *hadoop-common* module, One straight forward 
approach is to move all these dependencies to *hadoop-common* module and both 
HDFS and Yarn can pick these dependencies.

  was:
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
transitive dependency from DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on hadoop-common module, One straight forward 
approach is to move all these dependencies to hadoop-common module and both 
HDFS and Yarn can pick these imports.


> Remove HDFS Binaries/Jars Dependency From Yarn
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>
> In principle Hadoop Yarn is independent of HDFS. It can work with any 
> filesystem. Currently there exists some code dependency for Yarn with HDFS. 
> This dependency requires Yarn to bring in some of the HDFS binaries/jars to 
> its class path. The idea behind this jira is to remove this dependency so 
> that Yarn can run without HDFS binaries/jars
> *Scope*
> 1. Non test classes are considered
> 2. Some test classes which comes as transitive dependency are considered
> *Out of scope*
> 1. All test classes in Yarn module is not considered
>  
> 
> A quick search in Yarn module revealed following HDFS dependencies
> 1. Constants
> {code:java}
> import 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
> import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
> transitive dependency from DSQuotaExceededException){code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both Yarn and HDFS depends on *hadoop-common* module, One straight forward 
> approach is to move all these dependencies to *hadoop-common* 

[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn

2024-03-16 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11664:
-
Description: 
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
transitive dependency from DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on hadoop-common module, One straight forward 
approach is to move all these dependencies to hadoop-common module and both 
HDFS and Yarn can pick these imports.

  was:
In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
transitive dependency from DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on hadoop-common module, One straight forward 
approach is to move all these dependencies to hadoop-common module and both 
HDFS and Yarn can pick these imports.


> Remove HDFS Binaries/Jars Dependency From Yarn
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>
> In principle Hadoop Yarn is independent of HDFS. It can work with any 
> filesystem. Currently there exists some code dependency for Yarn with HDFS. 
> This dependency requires Yarn to bring in some of the HDFS binaries/jars to 
> its class path. The idea behind this jira is to remove this dependency so 
> that Yarn can run without HDFS binaries/jars
> *Scope*
> 1. Non test classes are considered
> 2. Some test classes which comes as transitive dependency are considered
> *Out of scope*
> 1. All test classes in Yarn module is not considered
>  
> 
> A quick search in Yarn module revealed following HDFS dependencies
> 1. Constants
> {code:java}
> import 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
> import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
> transitive dependency from DSQuotaExceededException){code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both Yarn and HDFS depends on hadoop-common module, One straight forward 
> approach is to move all these dependencies to hadoop-common module and both 
> HDFS and Yarn can pick these imports.



--
This message was sent by 

[jira] [Created] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn

2024-03-16 Thread Syed Shameerur Rahman (Jira)
Syed Shameerur Rahman created YARN-11664:


 Summary: Remove HDFS Binaries/Jars Dependency From Yarn
 Key: YARN-11664
 URL: https://issues.apache.org/jira/browse/YARN-11664
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Syed Shameerur Rahman


In principle Hadoop Yarn is independent of HDFS. It can work with any 
filesystem. Currently there exists some code dependency for Yarn with HDFS. 
This dependency requires Yarn to bring in some of the HDFS binaries/jars to its 
class path. The idea behind this jira is to remove this dependency so that Yarn 
can run without HDFS binaries/jars

*Scope*
1. Non test classes are considered
2. Some test classes which comes as transitive dependency are considered


*Out of scope*
1. All test classes in Yarn module is not considered

 




A quick search in Yarn module revealed following HDFS dependencies


1. Constants
{code:java}
import 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (Comes as a 
transitive dependency from DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both Yarn and HDFS depends on hadoop-common module, One straight forward 
approach is to move all these dependencies to hadoop-common module and both 
HDFS and Yarn can pick these imports.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org