[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
[ https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848119#comment-17848119 ] Syed Shameerur Rahman commented on YARN-11697: --

[~wilfreds]
# IMHO, when the appAttempt is not present in the queue to be removed, this should be handled more gracefully than by throwing an IllegalStateException, which takes down the RM.
# Since the appAttempt is in any case no longer in the queue, we can safely log a warning message instead of throwing an exception.

Any thoughts on the above approach?

> Fix fair scheduler race condition in removeApplicationAttempt and
> moveApplication
> -
>
> Key: YARN-11697
> URL: https://issues.apache.org/jira/browse/YARN-11697
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
>
> For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the following exception
> {code:java}
> 2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.IllegalStateException: Given app to remove appattempt_1706879498319_86660_01 Alloc: does not exist in queue [root, demand=, running=, share=, w=1.0]
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
>   at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>   at java.lang.Thread.run(Thread.java:750)
> {code}
> The exception looks similar to the one mentioned in YARN-5136, but there are still some edge cases not covered by YARN-5136.
> 1. On a deeper look, I could see that, as mentioned in the comment here, if a call to moveApplication and a call to removeApplicationAttempt for the same attempt are processed in short succession, the application attempt still holds a queue reference but has already been removed from the queue's list of applications.
> 2. This can happen when [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908] removes the appAttempt from the queue and [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707] also tries to remove the same appAttempt from the queue.
> 3. On further checking, I could see that before [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779] the writeLock on the appAttempt is taken, whereas for [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665] I don't see any writeLock being taken, which can result in a race condition if the same appAttempt is being processed.
> 4. Additionally, as mentioned in the comment here, when such a scenario occurs we ideally should not take down the RM.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
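The graceful handling proposed in the comment above could look roughly like the following. This is a minimal, self-contained sketch, not the real FSLeafQueue: the class name, the plain-String stand-in for FSAppAttempt, and the stub lists are all assumptions made so the example compiles on its own; only the shape of the change (warn and return false instead of throwing IllegalStateException) reflects the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for FSLeafQueue, illustrating the proposed change:
// when the attempt is absent from both lists, log a warning and return false
// instead of throwing IllegalStateException (which takes down the RM).
class GracefulQueue {
    private final List<String> runnableApps = new ArrayList<>();
    private final List<String> nonRunnableApps = new ArrayList<>();
    private final ReentrantReadWriteLock.WriteLock writeLock =
        new ReentrantReadWriteLock().writeLock();

    void addRunnable(String app) {
        writeLock.lock();
        try {
            runnableApps.add(app);
        } finally {
            writeLock.unlock();
        }
    }

    boolean removeApp(String app) {
        writeLock.lock();
        try {
            if (runnableApps.remove(app) || nonRunnableApps.remove(app)) {
                return true;
            }
            // Previously: throw new IllegalStateException(...).
            // Proposed: the attempt was likely already removed by a
            // concurrent moveApplication, so this is benign.
            System.err.println("WARN: app " + app
                + " not found in queue; already removed. Ignoring.");
            return false;
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        GracefulQueue q = new GracefulQueue();
        q.addRunnable("appattempt_1");
        System.out.println(q.removeApp("appattempt_1")); // true
        System.out.println(q.removeApp("appattempt_1")); // false, no exception
    }
}
```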
[jira] [Comment Edited] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
[ https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848086#comment-17848086 ] Syed Shameerur Rahman edited comment on YARN-11697 at 5/21/24 8:00 AM: ---

Additionally, I could see this specifically when the application is being killed; it corresponds to the following code:

{code:java}
boolean removeApp(FSAppAttempt app) {
  boolean runnable = false;

  // Remove app from the runnable/non-runnable list while holding the write lock
  writeLock.lock();
  try {
    runnable = runnableApps.remove(app);
    if (!runnable) {
      // removeNonRunnableApp acquires the write lock again, which is fine
      if (!removeNonRunnableApp(app)) {
        throw new IllegalStateException("Given app to remove " + app
            + " does not exist in queue " + this);
      }
    }
  } finally {
    writeLock.unlock();
  }
  // ... rest of the method elided ...
}
{code}

was (Author: srahman): Additionally, I could see this specifically when the application is being killed.
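Point 3 of the description notes that moveApplication takes the appAttempt's writeLock while removeApplicationAttempt does not. A sketch of the locking fix that follows from that observation is below. All names (AttemptRecord, SchedulerSketch, queueApps) are hypothetical stand-ins for FSAppAttempt/FSLeafQueue state, made self-contained for illustration; the point is only that both paths serialize on the same per-attempt write lock, so the check-and-remove can no longer interleave.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in for per-attempt state (the real code would use FSAppAttempt's lock).
class AttemptRecord {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile String queue = "root.default";
}

class SchedulerSketch {
    // Stand-in for the source queue's application list.
    final Set<String> queueApps = ConcurrentHashMap.newKeySet();

    void moveApplication(String attemptId, AttemptRecord rec, String target) {
        rec.lock.writeLock().lock(); // the lock moveApplication already takes
        try {
            queueApps.remove(attemptId); // removed from the old queue
            rec.queue = target;
        } finally {
            rec.lock.writeLock().unlock();
        }
    }

    boolean removeApplicationAttempt(String attemptId, AttemptRecord rec) {
        rec.lock.writeLock().lock(); // proposed: mirror moveApplication
        try {
            // With the same lock held, a concurrent move cannot slip in
            // between the membership check and the removal; an attempt that
            // is already gone is treated as benign (returns false).
            return queueApps.remove(attemptId);
        } finally {
            rec.lock.writeLock().unlock();
        }
    }
}
```

When the two calls arrive in short succession, whichever acquires the lock second simply observes the attempt as already removed instead of racing.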
[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
[ https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848086#comment-17848086 ] Syed Shameerur Rahman commented on YARN-11697: --

Additionally, I could see this specifically when the application is being killed.
[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
[ https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848085#comment-17848085 ] Syed Shameerur Rahman commented on YARN-11697: --

[~wilfreds] I have some custom code/backports from higher versions, so the code line numbers may differ from the OSS Hadoop code base. I could see the following exception though:

{code:java}
java.lang.IllegalStateException: Given app to remove appattempt_1706879498319_86660_01 Alloc: does not exist in queue [root, demand=, running=, share=, w=1.0]
{code}

So this exception occurs only when the appAttempt has already been removed from the queue and we try to remove it again. Throwing IllegalStateException causes the RM to shut down. Can you think of any scenario in which this can happen?
[jira] [Updated] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
[ https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated YARN-11697: -

Description:

For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the following exception

{code:java}
2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.IllegalStateException: Given app to remove appattempt_1706879498319_86660_01 Alloc: does not exist in queue [root, demand=, running=, share=, w=1.0]
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
  at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
  at java.lang.Thread.run(Thread.java:750)
{code}

The exception looks similar to the one mentioned in YARN-5136, but there are still some edge cases not covered by YARN-5136.

1. On a deeper look, I could see that, as mentioned in the comment here, if a call to moveApplication and a call to removeApplicationAttempt for the same attempt are processed in short succession, the application attempt still holds a queue reference but has already been removed from the queue's list of applications.

2. This can happen when [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908] removes the appAttempt from the queue and [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707] also tries to remove the same appAttempt from the queue.

3. On further checking, I could see that before [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779] the writeLock on the appAttempt is taken, whereas for [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665] I don't see any writeLock being taken, which can result in a race condition if the same appAttempt is being processed.

4. Additionally, as mentioned in the comment here, when such a scenario occurs we ideally should not take down the RM.

was:

For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the following exception

{code:java}
2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.IllegalStateException: Given app to remove appattempt_1706879498319_86660_01 Alloc: does not exist in queue [root.tier2.livy, demand=, running=, share=, w=1.0]
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
  at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
  at java.lang.Thread.run(Thread.java:750)
{code}

The exception looks similar to the one mentioned in YARN-5136, but there are still some edge cases not covered by YARN-5136.

1. On a deeper look, I could see that, as mentioned in the comment here, if a call to moveApplication and a call to removeApplicationAttempt for the same attempt are processed in short succession, the application attempt still holds a queue reference but has already been removed from the queue's list of applications.

2. This can happen when
[jira] [Created] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
Syed Shameerur Rahman created YARN-11697:

Summary: Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
Key: YARN-11697
URL: https://issues.apache.org/jira/browse/YARN-11697
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.2.1
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman

For Hadoop version 3.2.1, the ResourceManager (RM) restarts frequently with the following exception

{code:java}
2024-03-11 04:41:29,329 FATAL org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.IllegalStateException: Given app to remove appattempt_1706879498319_86660_01 Alloc: does not exist in queue [root.tier2.livy, demand=, running=, share=, w=1.0]
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:121)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:757)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1378)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:139)
  at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
  at java.lang.Thread.run(Thread.java:750)
{code}

The exception looks similar to the one mentioned in YARN-5136, but there are still some edge cases not covered by YARN-5136.

1. On a deeper look, I could see that, as mentioned in the comment here, if a call to moveApplication and a call to removeApplicationAttempt for the same attempt are processed in short succession, the application attempt still holds a queue reference but has already been removed from the queue's list of applications.

2. This can happen when [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1908] removes the appAttempt from the queue and [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L707] also tries to remove the same appAttempt from the queue.

3. On further checking, I could see that before [moveApplication|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1779] the writeLock on the appAttempt is taken, whereas for [removeApplicationAttempt|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L665] I don't see any writeLock being taken, which can result in a race condition if the same appAttempt is being processed.

4. Additionally, as mentioned in the comment here, when such a scenario occurs we ideally should not take down the RM.
[jira] [Commented] (YARN-11697) Fix fair scheduler race condition in removeApplicationAttempt and moveApplication
[ https://issues.apache.org/jira/browse/YARN-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848046#comment-17848046 ] Syed Shameerur Rahman commented on YARN-11697: --

[~wilfreds] any thoughts on this?
[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN
[ https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated YARN-11664:
-
Description:
In principle, Hadoop YARN is independent of HDFS: it can work with any filesystem. Currently, however, there is some code dependency from YARN on HDFS, and this dependency requires YARN to bring some of the HDFS binaries/jars onto its class path. The idea behind this jira is to remove that dependency so that YARN can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered

*Out of scope*
1. All other test classes in the YARN module are not considered

A quick search in the YARN module revealed the following HDFS dependencies:
1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
2. Exception
{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}

Both YARN and HDFS depend on the *hadoop-common* module, so:
* Constant variables and utility classes can be moved to *hadoop-common*
* Instead of DSQuotaExceededException, use its parent exception ClusterStorageCapacityExceeded

was: (the previous revision of the same description, differing only in {code} block formatting)

> Remove HDFS Binaries/Jars Dependency From YARN
> ----------------------------------------------
>
>                 Key: YARN-11664
>                 URL: https://issues.apache.org/jira/browse/YARN-11664
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
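The second bullet of the proposal can be sketched in isolation. The classes below are simplified, hypothetical stand-ins for the real HDFS exception hierarchy (in Hadoop, DSQuotaExceededException ultimately extends a capacity-exceeded parent that the jira proposes catching instead); the point is only that handler code written against the parent type needs no org.apache.hadoop.hdfs import:

```java
// Simplified stand-ins for the HDFS exception hierarchy (illustrative only;
// the real classes live in org.apache.hadoop.hdfs.protocol and hadoop-common).
class ClusterStorageCapacityExceededException extends RuntimeException {
    ClusterStorageCapacityExceededException(String msg) { super(msg); }
}

class DSQuotaExceededException extends ClusterStorageCapacityExceededException {
    DSQuotaExceededException(String msg) { super(msg); }
}

public class QuotaHandlingSketch {
    // YARN-side code written against the parent type only: the concrete
    // HDFS subclass never appears in this handler's source.
    static String handle(Runnable storageOp) {
        try {
            storageOp.run();
            return "ok";
        } catch (ClusterStorageCapacityExceededException e) {
            return "capacity-exceeded: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An HDFS-backed operation may still throw the subclass,
        // but the handler catches it via the parent type.
        System.out.println(handle(() -> { throw new DSQuotaExceededException("disk quota"); }));
        System.out.println(handle(() -> { /* succeeds */ }));
    }
}
```

Existing throw sites need no change; only the catch clauses (and their imports) move up the hierarchy.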
[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN
[ https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated YARN-11664:
-
Description:
In principle, Hadoop YARN is independent of HDFS: it can work with any filesystem. Currently, however, there is some code dependency from YARN on HDFS, and this dependency requires YARN to bring some of the HDFS binaries/jars onto its class path. The idea behind this jira is to remove that dependency so that YARN can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered

*Out of scope*
1. All other test classes in the YARN module are not considered

A quick search in the YARN module revealed the following HDFS dependencies:
1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
2. Exception
{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}

Both YARN and HDFS depend on the *hadoop-common* module, so:
* Constant variables and utility classes can be moved to *hadoop-common*
* Instead of DSQuotaExceededException, use its parent exception ClusterStorageCapacityExceeded

was:
In principle, Hadoop YARN is independent of HDFS: it can work with any filesystem. Currently, however, there is some code dependency from YARN on HDFS, and this dependency requires YARN to bring some of the HDFS binaries/jars onto its class path. The idea behind this jira is to remove that dependency so that YARN can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered

*Out of scope*
1. All other test classes in the YARN module are not considered

A quick search in the YARN module revealed the following HDFS dependencies:
1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
2. Exception
{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException; (Comes as a transitive dependency from DSQuotaExceededException){code}
3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}

Both YARN and HDFS depend on the *hadoop-common* module; one straightforward approach is to move all these dependencies to *hadoop-common* so that both HDFS and YARN can pick them up from there.
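The constants half of the proposal can be sketched the same way. CommonConfigKeysSketch below is a hypothetical holder class, not a real Hadoop class; the key name mirrors the kind of constant YARN currently pulls in via org.apache.hadoop.hdfs.DFSConfigKeys, and the idea is that such a holder would live in hadoop-common so neither side imports the other:

```java
// Hypothetical sketch of a constants holder that could live in hadoop-common.
// The class name is illustrative; the key string mirrors the kind of entry
// YARN currently reaches into org.apache.hadoop.hdfs.DFSConfigKeys for.
public final class CommonConfigKeysSketch {
    private CommonConfigKeysSketch() {} // non-instantiable constants holder

    // YARN code would reference this constant instead of importing
    // DFSConfigKeys from the hadoop-hdfs jar.
    public static final String DFS_ADMIN = "dfs.cluster.administrators";

    public static void main(String[] args) {
        System.out.println(DFS_ADMIN);
    }
}
```

Since the key strings themselves are wire/config-compatible, moving the Java-level constant changes no on-disk or on-the-wire behavior, only which jar the symbol resolves from.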
[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN
[ https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated YARN-11664:
-
Summary: Remove HDFS Binaries/Jars Dependency From YARN (was: Remove HDFS Binaries/Jars Dependency From Yarn)
[jira] [Commented] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn
[ https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827682#comment-17827682 ] Syed Shameerur Rahman commented on YARN-11664:
--
[~zuston] [~ste...@apache.org] Any thoughts on this?
[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn
[ https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated YARN-11664:
-
Description:
In principle, Hadoop YARN is independent of HDFS: it can work with any filesystem. Currently, however, there is some code dependency from YARN on HDFS, and this dependency requires YARN to bring some of the HDFS binaries/jars onto its class path. The idea behind this jira is to remove that dependency so that YARN can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered

*Out of scope*
1. All other test classes in the YARN module are not considered

A quick search in the YARN module revealed the following HDFS dependencies:
1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
2. Exception
{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException; (Comes as a transitive dependency from DSQuotaExceededException){code}
3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}

Both YARN and HDFS depend on the hadoop-common module; one straightforward approach is to move all these dependencies to hadoop-common so that both HDFS and YARN can pick up these imports from there.

was: (the previous revision of the same description)
[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn
[ https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated YARN-11664:
-
Description:
In principle, Hadoop YARN is independent of HDFS: it can work with any filesystem. Currently, however, there is some code dependency from YARN on HDFS, and this dependency requires YARN to bring some of the HDFS binaries/jars onto its class path. The idea behind this jira is to remove that dependency so that YARN can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered

*Out of scope*
1. All other test classes in the YARN module are not considered

A quick search in the YARN module revealed the following HDFS dependencies:
1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
2. Exception
{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException; (Comes as a transitive dependency from DSQuotaExceededException){code}
3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}

Both YARN and HDFS depend on the hadoop-common module; one straightforward approach is to move all these dependencies to hadoop-common so that both HDFS and YARN can pick up these imports from there.

was: (the previous revision of the same description)
[jira] [Created] (YARN-11664) Remove HDFS Binaries/Jars Dependency From Yarn
Syed Shameerur Rahman created YARN-11664:
Summary: Remove HDFS Binaries/Jars Dependency From Yarn
Key: YARN-11664
URL: https://issues.apache.org/jira/browse/YARN-11664
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Reporter: Syed Shameerur Rahman

In principle, Hadoop YARN is independent of HDFS: it can work with any filesystem. Currently, however, there is some code dependency from YARN on HDFS, and this dependency requires YARN to bring some of the HDFS binaries/jars onto its class path. The idea behind this jira is to remove that dependency so that YARN can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered

*Out of scope*
1. All other test classes in the YARN module are not considered

A quick search in the YARN module revealed the following HDFS dependencies:
1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
2. Exception
{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException; (Comes as a transitive dependency from DSQuotaExceededException){code}
3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}

Both YARN and HDFS depend on the hadoop-common module; one straightforward approach is to move all these dependencies to hadoop-common so that both HDFS and YARN can pick up these imports from there.