[jira] [Updated] (YARN-11683) RM crash due to RELEASE_CONTAINER NPE
    [ https://issues.apache.org/jira/browse/YARN-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11683:
----------------------------
    Affects Version/s: 3.4.0

> RM crash due to RELEASE_CONTAINER NPE
> -------------------------------------
>
> Key: YARN-11683
> URL: https://issues.apache.org/jira/browse/YARN-11683
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
>
> We enabled scheduleAsynchronously in our prod environment, and the RM crashed with the exception stack below:
> {code:java}
> // error stack
> ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type RELEASE_CONTAINER to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:811)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:770)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2271)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:177)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:90)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> I found similar issues already reported: YARN-11488 and YARN-10204.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11683) RM crash due to RELEASE_CONTAINER NPE
    [ https://issues.apache.org/jira/browse/YARN-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11683:
----------------------------
    Description: 
We enabled scheduleAsynchronously in our prod environment, and the RM crashed with the exception stack below:
{code:java}
// error stack
ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type RELEASE_CONTAINER to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:811)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:770)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2271)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:177)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:90)
    at java.base/java.lang.Thread.run(Thread.java:834)
{code}
I found similar issues already reported: YARN-11488 and YARN-10204.

> RM crash due to RELEASE_CONTAINER NPE
> -------------------------------------
>
> Key: YARN-11683
> URL: https://issues.apache.org/jira/browse/YARN-11683
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Yuan Luo
> Priority: Major
>
> We enabled scheduleAsynchronously in our prod environment, and the RM crashed with the exception stack below:
> {code:java}
> // error stack
> ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type RELEASE_CONTAINER to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:811)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:770)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2271)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:177)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:90)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> I found similar issues already reported: YARN-11488 and YARN-10204.
[jira] [Created] (YARN-11683) RM crash due to RELEASE_CONTAINER NPE
Yuan Luo created YARN-11683:
-------------------------------

    Summary: RM crash due to RELEASE_CONTAINER NPE
    Key: YARN-11683
    URL: https://issues.apache.org/jira/browse/YARN-11683
    Project: Hadoop YARN
    Issue Type: Bug
    Components: yarn
    Reporter: Yuan Luo
[jira] [Comment Edited] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827323#comment-17827323 ]

Yuan Luo edited comment on YARN-11663 at 3/15/24 2:51 AM:
----------------------------------------------------------

[~slfan1989] Thanks for your reply.

Our cache expiry is set to 60 seconds, and memory reclamation is triggered by full GC. The number of cached objects keeps growing.

!image-2024-03-15-10-50-32-860.png!

In my tests, neither Ehcache nor Guava cache cleans up expired keys immediately; cleanup is only triggered on access. After a job ends, its cache key is never accessed again.


was (Author: luoyuan):
Our cache expiry is set to 60 seconds, and memory reclamation is triggered by full GC. The number of cached objects keeps growing.

!image-2024-03-15-10-50-32-860.png!

In my tests, neither Ehcache nor Guava cache cleans up expired keys immediately; cleanup is only triggered on access. After a job ends, its cache key is never accessed again.

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod environment, I found that the Router's memory keeps growing over time. This is because, once jobs have finished, the expired keys are never accessed again, so the cleanup mechanism is never triggered. Would it be better to add a limit on the maximum number of cache entries?
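The behaviour described in the comment above (expired entries linger until something touches them, so a cap on the number of entries bounds the growth) can be sketched in plain Java. This is only a minimal illustration under stated assumptions: `BoundedLazyExpiryCache`, its methods, the `application_N` keys and the `subcluster-1` value are all hypothetical names, and the class is not the Router's cache nor the Ehcache or Guava implementation. Time is passed in explicitly so the sketch stays deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration only: a tiny cache whose expired entries are
 * removed lazily on access, with a cap on the number of entries.
 * It is not the Router's cache and not the Ehcache or Guava implementation.
 */
final class BoundedLazyExpiryCache<K, V> {

  /** A cached value together with its expiry timestamp. */
  private static final class Timed<T> {
    final T value;
    final long expiresAtMillis;

    Timed(T value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final long ttlMillis;
  private final Map<K, Timed<V>> entries;

  BoundedLazyExpiryCache(long ttlMillis, final int maxEntries) {
    this.ttlMillis = ttlMillis;
    // Insertion-ordered map that drops its eldest entry once the cap is exceeded.
    this.entries = new LinkedHashMap<K, Timed<V>>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
        return size() > maxEntries;
      }
    };
  }

  synchronized void put(K key, V value, long nowMillis) {
    entries.put(key, new Timed<>(value, nowMillis + ttlMillis));
  }

  /** Returns the live value or null; an expired entry is removed only here. */
  synchronized V get(K key, long nowMillis) {
    Timed<V> timed = entries.get(key);
    if (timed == null) {
      return null;
    }
    if (timed.expiresAtMillis <= nowMillis) {
      entries.remove(key);
      return null;
    }
    return timed.value;
  }

  synchronized int size() {
    return entries.size();
  }

  /** Inserts 1000 keys that are never read again; returns {uncapped size, capped size}. */
  static int[] demo() {
    BoundedLazyExpiryCache<String, String> uncapped =
        new BoundedLazyExpiryCache<>(60_000L, Integer.MAX_VALUE);
    BoundedLazyExpiryCache<String, String> capped =
        new BoundedLazyExpiryCache<>(60_000L, 100);
    for (int i = 0; i < 1000; i++) {
      uncapped.put("application_" + i, "subcluster-1", 0L);
      capped.put("application_" + i, "subcluster-1", 0L);
    }
    return new int[] {uncapped.size(), capped.size()};
  }

  public static void main(String[] args) {
    int[] sizes = demo();
    System.out.println("uncapped entries: " + sizes[0]);
    System.out.println("capped entries: " + sizes[1]);
  }
}
```

In this sketch the uncapped instance keeps every entry because nothing ever calls `get` on the finished applications' keys, while the capped instance stays at its limit regardless of access patterns. Guava's `CacheBuilder.maximumSize` provides a comparable bound; whether that is the right fix for the Router is for the maintainers to decide.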
[jira] [Commented] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827323#comment-17827323 ]

Yuan Luo commented on YARN-11663:
---------------------------------

Our cache expiry is set to 60 seconds, and memory reclamation is triggered by full GC. The number of cached objects keeps growing.

!image-2024-03-15-10-50-32-860.png!

In my tests, neither Ehcache nor Guava cache cleans up expired keys immediately; cleanup is only triggered on access. After a job ends, its cache key is never accessed again.

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod environment, I found that the Router's memory keeps growing over time. This is because, once jobs have finished, the expired keys are never accessed again, so the cleanup mechanism is never triggered. Would it be better to add a limit on the maximum number of cache entries?
[jira] [Updated] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11663:
----------------------------
    Attachment: image-2024-03-15-10-50-32-860.png

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod environment, I found that the Router's memory keeps growing over time. This is because, once jobs have finished, the expired keys are never accessed again, so the cleanup mechanism is never triggered. Would it be better to add a limit on the maximum number of cache entries?
[jira] [Updated] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11663:
----------------------------
    Description: 
!image-2024-03-14-18-12-28-426.png!

!image-2024-03-14-18-12-49-950.png!

Hi [~slfan1989], after applying this feature to our prod environment, I found that the Router's memory keeps growing over time. This is because, once jobs have finished, the expired keys are never accessed again, so the cleanup mechanism is never triggered. Would it be better to add a limit on the maximum number of cache entries?

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, image-2024-03-14-18-12-49-950.png
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod environment, I found that the Router's memory keeps growing over time. This is because, once jobs have finished, the expired keys are never accessed again, so the cleanup mechanism is never triggered. Would it be better to add a limit on the maximum number of cache entries?
[jira] [Updated] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11663:
----------------------------
    Affects Version/s: 3.4.0
        (was: 3.3.6)

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
[jira] [Updated] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11663:
----------------------------
    Attachment: image-2024-03-14-18-12-28-426.png

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, image-2024-03-14-18-12-49-950.png
[jira] [Updated] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11663:
----------------------------
    Attachment: image-2024-03-14-18-12-49-950.png

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.4.0
> Reporter: Yuan Luo
> Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, image-2024-03-14-18-12-49-950.png
[jira] [Updated] (YARN-11663) Router cache expansion issue
    [ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11663:
----------------------------
    Component/s: federation

> Router cache expansion issue
> ----------------------------
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, yarn
> Affects Versions: 3.3.6
> Reporter: Yuan Luo
> Priority: Major
[jira] [Created] (YARN-11663) Router cache expansion issue
Yuan Luo created YARN-11663:
-------------------------------

    Summary: Router cache expansion issue
    Key: YARN-11663
    URL: https://issues.apache.org/jira/browse/YARN-11663
    Project: Hadoop YARN
    Issue Type: Bug
    Components: yarn
    Affects Versions: 3.3.6
    Reporter: Yuan Luo
[jira] [Commented] (YARN-10885) Make FederationStateStoreFacade#getApplicationHomeSubCluster use JCache
    [ https://issues.apache.org/jira/browse/YARN-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827033#comment-17827033 ]

Yuan Luo commented on YARN-10885:
---------------------------------

!image-2024-03-14-18-02-21-278.png!

!image-2024-03-14-18-02-49-146.png!

Hi [~slfan1989] [~chaosju], after applying this feature to our prod environment, I found that the Router's memory keeps growing over time. This is because, once jobs have finished, the expired keys are never accessed again, so the cleanup mechanism is never triggered. Would it be better to add a limit on the maximum number of cache entries?

> Make FederationStateStoreFacade#getApplicationHomeSubCluster use JCache
> -----------------------------------------------------------------------
>
> Key: YARN-10885
> URL: https://issues.apache.org/jira/browse/YARN-10885
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: federation
> Affects Versions: 3.4.0
> Reporter: chaosju
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> The YARN client's getApplicationReport function may produce a large number of ZooKeeper operations. It is important to use JCache to cache the mapping between application and sub-cluster ID.
[jira] [Commented] (YARN-11191) Global Scheduler refreshQueue cause deadLock
    [ https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578409#comment-17578409 ]

Yuan Luo commented on YARN-11191:
---------------------------------

{code:java}
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ThreadLockTest {
  private static DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

  public static void BeforeFixDeadLock() throws InterruptedException {
    System.out.println(
        "BeforeFixDeadLock test start, this will cause deadlock..");
    ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock queueReadLock = queueLock.readLock();
    ReentrantReadWriteLock.WriteLock queueWriteLock = queueLock.writeLock();
    ReentrantReadWriteLock premmptionLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock premmptionReadLock =
        premmptionLock.readLock();
    ReentrantReadWriteLock.WriteLock premmptionWriteLock =
        premmptionLock.writeLock();

    Thread schedulerThread = new Thread(() -> {
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread start!");
      //hold: csqueue.readLock
      queueReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread get queueReadLock!");
      try {
        Thread.sleep(1000 * 15);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      //require: PremmptionManager.readLock
      premmptionReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread get premmptionReadLock!");
      premmptionReadLock.unlock();
      queueReadLock.unlock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread finish!");
    });

    Thread refreshQueueThread = new Thread(() -> {
      System.out.println("current time: " + sdf.format(new Date())
          + ", refreshQueueThread start!");
      //hold: PremmptionManager.writeLock
      premmptionWriteLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", refreshQueueThread get premmptionWriteLock!");
      try {
        Thread.sleep(1000 * 10);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      //require: csqueue.readLock
      queueReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", refreshQueueThread get queueReadLock!");
      queueReadLock.unlock();
      premmptionWriteLock.unlock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", refreshQueueThread finish!");
    });

    Thread otherThread = new Thread(() -> {
      // Make otherThread request the queue write lock after the scheduler
      // thread holds the queue read lock, and before the refresh thread
      // tries to get the queue read lock.
      try {
        Thread.sleep(1000 * 5);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      System.out.println(
          "current time: " + sdf.format(new Date()) + ", otherThread start!");
      queueWriteLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", otherThread get queueWriteLock!");
      queueWriteLock.unlock();
      System.out.println(
          "current time: " + sdf.format(new Date()) + ", otherThread finish!");
    });

    schedulerThread.start();
    refreshQueueThread.start();
    otherThread.start();
    refreshQueueThread.join();
    schedulerThread.join();
    otherThread.join();
  }

  public static void AfterFixDeadLock() throws InterruptedException {
    System.out.println("AfterFixDeadLock test start..");
    ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock queueReadLock = queueLock.readLock();
    ReentrantReadWriteLock.WriteLock queueWriteLock = queueLock.writeLock();
    ReentrantReadWriteLock premmptionLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock premmptionReadLock =
        premmptionLock.readLock();
    ReentrantReadWriteLock.WriteLock premmptionWriteLock =
        premmptionLock.writeLock();

    Thread schedulerThread = new Thread(() -> {
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread start!");
      //hold: csqueue.readLock
      queueReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread get queueReadLock!");
      try {
        Thread.sleep(1000 * 15);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      //require: PremmptionManager.readLock
      premmptionReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date())
          + ", schedulerThread get premmptionReadLock!");
      premmptionReadLock.unlock();
      queueReadLock.unlock();
      System.out.println("current time: " +
[jira] [Commented] (YARN-11191) Global Scheduler refreshQueue cause deadLock
    [ https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577784#comment-17577784 ]

Yuan Luo commented on YARN-11191:
---------------------------------

I want to ask two questions:

+  @Override
+  public List getChildQueuesByTryLock() {
+    try {
+      while (!readLock.tryLock()){
+        LockSupport.parkNanos(1);
+      }
+      return new ArrayList(childQueues);
+    } finally {
+      readLock.unlock();
+    }
+  }

1. Although you use tryLock and park, so that the refresh-queue thread switches to the blocked state, this thread still holds the PreemptionManager lock, so the scheduler thread still can't allocate new containers. Is that right?
2. Is this issue related to the global scheduler, or just to the preemption function?

Looking forward to your reply, thanks!

> Global Scheduler refreshQueue cause deadLock
> ---------------------------------------------
>
> Key: YARN-11191
> URL: https://issues.apache.org/jira/browse/YARN-11191
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.9.0, 3.0.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
> Reporter: ben yang
> Priority: Major
> Labels: pull-request-available
> Attachments: 1.jstack, YARN-11191.001.patch
>
> This is a potential bug that may impact all clusters with preemption enabled. In our current version, with preemption enabled, the CapacityScheduler calls the refreshQueue method of the PreemptionManager when it refreshes queues. This process holds the PreemptionManager write lock and requires the csqueue read lock. Meanwhile, ParentQueue.canAssignToThisQueue holds the csqueue read lock and requires the PreemptionManager read lock.
> This makes a deadlock possible, because under the non-fair policy the read lock follows one rule: when the lock is already held by a reader and the first request in the lock wait queue is a write-lock request, other read-lock requests cannot acquire the lock.
> So the potential deadlock is:
> {code:java}
> CapacityScheduler.refreshQueue: hold: PremmptionManager.writeLock
>                                 require: csqueue.readLock
> CapacityScheduler.schedule: hold: csqueue.readLock
>                             require: PremmptionManager.readLock
> other thread(completeContainer,release Resource,etc.): require: csqueue.writeLock
> {code}
> The jstack logs at the time were as follows
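The read-lock rule quoted in the description above, and the reason the patch snippet spins on the untimed `tryLock()`, can be reproduced with a small standalone sketch. It assumes the default (non-fair) `ReentrantReadWriteLock`; the class name `QueuedWriterBlocksReadersDemo` and the thread roles are hypothetical and are not taken from Hadoop. One reader holds the lock, a writer is parked in the wait queue, and a second reader then tries a timed `tryLock` (which honours the queue) followed by an untimed `tryLock()` (which the JDK documents as barging).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo class; names are illustrative and not taken from Hadoop. */
final class QueuedWriterBlocksReadersDemo {

  /**
   * Returns {timedAcquired, untimedAcquired} for a second reader that arrives
   * while one reader holds the lock and a writer is parked in the wait queue.
   */
  static boolean[] run() {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
    final boolean[] result = new boolean[2];

    lock.readLock().lock(); // calling thread is the first reader (the "scheduler" role)

    final Thread writer = new Thread(() -> {
      lock.writeLock().lock(); // parks until every reader has released
      lock.writeLock().unlock();
    });

    final Thread secondReader = new Thread(() -> {
      try {
        // The timed variant respects the queued writer, so it is expected to time out.
        result[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (result[0]) {
          lock.readLock().unlock();
        }
        // The untimed variant only checks whether the write lock is held.
        result[1] = lock.readLock().tryLock();
        if (result[1]) {
          lock.readLock().unlock();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    try {
      writer.start();
      // Wait until the writer is actually parked behind the first reader.
      while (writer.getState() != Thread.State.WAITING) {
        Thread.sleep(1);
      }
      secondReader.start();
      secondReader.join();
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while running the demo", e);
    } finally {
      lock.readLock().unlock(); // lets the parked writer proceed and exit
    }
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println("timed read tryLock acquired:   " + r[0]);
    System.out.println("untimed read tryLock acquired: " + r[1]);
  }
}
```

Under these assumptions the timed attempt should fail and the untimed attempt should succeed, which matches the blocking rule in the description and explains why a `tryLock()` loop can get past a queued writer. It does not by itself answer whether the refresh thread should keep holding the PreemptionManager write lock while it spins, which is the first question raised in the comment.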
[jira] [Commented] (YARN-11191) Global Scheduler refreshQueue cause deadLock
    [ https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577178#comment-17577178 ]

Yuan Luo commented on YARN-11191:
---------------------------------

We hit the same issue when refreshing YARN queues; the refresh thread is stuck in the call below:
{code:java}
preemptionManager.refreshQueues(null, this.getRootQueue())
{code}
Could you help take a look at this issue? [~elgoiri] [~aajisaka]

> Global Scheduler refreshQueue cause deadLock
> ---------------------------------------------
>
> Key: YARN-11191
> URL: https://issues.apache.org/jira/browse/YARN-11191
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.9.0, 3.0.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
> Reporter: ben yang
> Priority: Major
> Attachments: 1.jstack, YARN-11191.001.patch
>
> This is a potential bug that may impact all clusters with preemption enabled. In our current version, with preemption enabled, the CapacityScheduler calls the refreshQueue method of the PreemptionManager when it refreshes queues. This process holds the PreemptionManager write lock and requires the csqueue read lock. Meanwhile, ParentQueue.canAssignToThisQueue holds the csqueue read lock and requires the PreemptionManager read lock.
> This makes a deadlock possible, because under the non-fair policy the read lock follows one rule: when the lock is already held by a reader and the first request in the lock wait queue is a write-lock request, other read-lock requests cannot acquire the lock.
> So the potential deadlock is:
> {code:java}
> CapacityScheduler.refreshQueue: hold: PremmptionManager.writeLock
>                                 require: csqueue.readLock
> CapacityScheduler.schedule: hold: csqueue.readLock
>                             require: PremmptionManager.readLock
> other thread(completeContainer,release Resource,etc.): require: csqueue.writeLock
> {code}
> The jstack logs at the time were as follows
[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler
    [ https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539406#comment-17539406 ]

Yuan Luo commented on YARN-11115:
---------------------------------

Any update on this ticket? [~zuston]

> Add configuration to disable AM preemption for capacity scheduler
> -----------------------------------------------------------------
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Yuan Luo
> Assignee: Junfan Zhang
> Priority: Major
>
> I think it's necessary to add a configuration option to disable AM preemption for the capacity scheduler, like the fair-scheduler feature in YARN-9537.
[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler
    [ https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528508#comment-17528508 ]

Yuan Luo commented on YARN-11115:
---------------------------------

[~zuston] If you can help with that, it would be great.

> Add configuration to disable AM preemption for capacity scheduler
> -----------------------------------------------------------------
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Yuan Luo
> Priority: Major
>
> I think it's necessary to add a configuration option to disable AM preemption for the capacity scheduler, like the fair-scheduler feature in YARN-9537.
[jira] [Created] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler
Yuan Luo created YARN-11115:
-------------------------------

    Summary: Add configuration to disable AM preemption for capacity scheduler
    Key: YARN-11115
    URL: https://issues.apache.org/jira/browse/YARN-11115
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: yarn
    Reporter: Yuan Luo


I think it's necessary to add a configuration option to disable AM preemption for the capacity scheduler, like the fair-scheduler feature in YARN-9537.
[jira] [Commented] (YARN-10934) LeafQueue activateApplications NPE
    [ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493631#comment-17493631 ]

Yuan Luo commented on YARN-10934:
---------------------------------

After applying this patch to our cluster, the problem was fixed. Thank you very much! [~bteke]

> LeafQueue activateApplications NPE
> ----------------------------------
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 3.3.1
> Reporter: Yuan Luo
> Assignee: Benjamin Teke
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Our prod YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator to DominantResourceCalculator and restarted the RM, after which the RM crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it.
> {code:java}
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page
    [ https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11012:
----------------------------
    Description: 
The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a 'ServerInfo' tab to show this information.

!image-2021-11-22-18-53-32-007.png|width=726,height=159!

  was:
The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a 'ServerInfo' tab to show this information.

!image-2021-11-22-18-53-32-007.png!

> Add version and process startup time on Router web page
> -------------------------------------------------------
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 3.3.1
> Reporter: Yuan Luo
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2021-11-22-18-53-32-007.png
>
> The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a 'ServerInfo' tab to show this information.
>
> !image-2021-11-22-18-53-32-007.png|width=726,height=159!
[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page
    [ https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan Luo updated YARN-11012:
----------------------------
    Description: 
The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a 'ServerInfo' tab to show this information.

!image-2021-11-22-18-53-32-007.png!

  was:
The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show this information.

> Add version and process startup time on Router web page
> -------------------------------------------------------
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 3.3.1
> Reporter: Yuan Luo
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2021-11-22-18-53-32-007.png
>
> The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a 'ServerInfo' tab to show this information.
>
> !image-2021-11-22-18-53-32-007.png!
[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page
[ https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11012: Attachment: image-2021-11-22-18-53-32-007.png > Add version and process startup time on Router web page > --- > > Key: YARN-11012 > URL: https://issues.apache.org/jira/browse/YARN-11012 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Fix For: 3.4.0 > > Attachments: image-2021-11-22-18-53-32-007.png > > > The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show this information.
[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page
[ https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11012: Description: The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show these infos > Add version and process startup time on Router web page > --- > > Key: YARN-11012 > URL: https://issues.apache.org/jira/browse/YARN-11012 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Fix For: 3.4.0 > > > The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show these infos
[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page
[ https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11012: Description: The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show this information. was: The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show these infos > Add version and process startup time on Router web page > --- > > Key: YARN-11012 > URL: https://issues.apache.org/jira/browse/YARN-11012 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Fix For: 3.4.0 > > > The Router web page should show the Hadoop version and startup time like other YARN components, so we can add a tab to show this information.
[jira] [Created] (YARN-11012) Add version and process startup time on Router web page
Yuan Luo created YARN-11012: --- Summary: Add version and process startup time on Router web page Key: YARN-11012 URL: https://issues.apache.org/jira/browse/YARN-11012 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.3.1 Reporter: Yuan Luo Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Fix Version/s: 3.4.0 > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: router-error.png > > Time Spent: 10m > Remaining Estimate: 0h > > When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. > The Router always throws the following non-obvious error to the client: > Exception in thread "main" org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException: No active SubCluster available to submit the request. > To make it easy for the user to locate the cause of a job submission failure, the Router needs to throw the true cause of the error to the client clearly.
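Editorial illustration of the request in YARN-11011 above: a minimal Java sketch, not the actual Router code or the patch attached to the issue. It assumes the Router tries each sub-cluster in turn and, when all of them reject the job, could attach the collected rejection reasons to the exception it throws instead of only logging them. Every name here (RouterErrorSketch, SubCluster, SubmissionFailedException, submit) is hypothetical and stands in for the real classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the YARN Router code base. */
public class RouterErrorSketch {

  /** Hypothetical stand-in for the exception the Router throws to the client. */
  static class SubmissionFailedException extends Exception {
    SubmissionFailedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Hypothetical stand-in for a per-sub-cluster submit call. */
  interface SubCluster {
    void submit(String jobId) throws Exception;
  }

  /**
   * Tries each sub-cluster in order and returns the id of the first one that
   * accepts the job. If every sub-cluster rejects it, the thrown exception
   * carries each rejection reason in its message and the last failure as cause.
   */
  static String submit(Map<String, SubCluster> subClusters, String jobId)
      throws SubmissionFailedException {
    List<String> reasons = new ArrayList<>();
    Exception lastFailure = null;
    for (Map.Entry<String, SubCluster> entry : subClusters.entrySet()) {
      try {
        entry.getValue().submit(jobId);
        return entry.getKey();
      } catch (Exception ex) {
        // Keep the real reason instead of only writing it to the log.
        reasons.add(entry.getKey() + ": " + ex.getMessage());
        lastFailure = ex;
      }
    }
    throw new SubmissionFailedException(
        "No active SubCluster available to submit the request. Causes: " + reasons,
        lastFailure);
  }

  public static void main(String[] args) {
    Map<String, SubCluster> clusters = new LinkedHashMap<>();
    clusters.put("sc1", jobId -> {
      throw new IllegalStateException("queue root.a does not exist");
    });
    try {
      submit(clusters, "job_1");
    } catch (SubmissionFailedException e) {
      // The client now sees why the sub-cluster rejected the job.
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape the original message text is kept as a prefix, so existing log searches for it would still match, while the client also receives the underlying reason.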
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. The Router always throws the following non-obvious error to the client: Exception in thread "main" org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException: No active SubCluster available to submit the request. To make it easy for the user to locate the cause of a job submission failure, the Router needs to throw the true cause of the error to the client clearly. was: When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. The Router always throws the following non-obvious error to the client: Exception in thread "main" org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException: No active SubCluster available to submit the request. > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Attachments: router-error.png > > > When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. > The Router always throws the following non-obvious error to the client: > Exception in thread "main" org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException: No active SubCluster available to submit the request. > To make it easy for the user to locate the cause of a job submission failure, the Router needs to throw the true cause of the error to the client clearly.
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. The Router always throws the following non-obvious error to the client: Exception in thread "main" org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException: No active SubCluster available to submit the request. was: When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. The Router always throws the following non-obvious error to the client: No active SubCluster available to submit the request. > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Attachments: router-error.png > > > When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. > The Router always throws the following non-obvious error to the client: > Exception in thread "main" org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException: No active SubCluster available to submit the request.
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. The Router always throws the following non-obvious error to the client: No active SubCluster available to submit the request. was: When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. The Router always throws the following non-obvious error: > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Attachments: router-error.png > > > When the Router submits a job to YARN, the job may be rejected by YARN for some reason. The Router then only writes the exception to its log and does not return enough information to the client. > The Router always throws the following non-obvious error to the client: > No active SubCluster available to submit the request.
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: router-error.png > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: router-error.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > The router always throws the following non-obvious error: -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. The router always throws the following non-obvious error: was: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: router-error.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > The router always throws the following non-obvious error: -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. The router always throws the following non-obvious error: was: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. The router always throws the following non-obvious error: > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: router-error.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > The router always throws the following non-obvious error: -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: router-error.png > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: (was: router-error.png) > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. was: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. was: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. *Before repair: * *Client output:* !screenshot-1.png! *Router log:* !screenshot-2.png! > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: (was: screenshot-1.png) > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > *Before repair: * > *Client output:* > !screenshot-1.png! > *Router log:* > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: (was: screenshot-2.png) > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > *Before repair: * > *Client output:* > !screenshot-1.png! > *Router log:* > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: screenshot-1.png > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > *Before repair: * > *Client output:* > !screenshot-1.png|thumbnail! > *Router log:* > !screenshot-2.png|thumbnail! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. *Before repair: * *Client output:* !screenshot-1.png|thumbnail! *Router log:* !screenshot-2.png|thumbnail! > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > *Before repair: * > *Client output:* > !screenshot-1.png|thumbnail! > *Router log:* > !screenshot-2.png|thumbnail! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Description: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. *Before repair: * *Client output:* !screenshot-1.png! *Router log:* !screenshot-2.png! was: When router submits job to yarn, the job is rejected by yarn for some reason. Yarn router just throw exception to log and return not enough info to client. *Before repair: * *Client output:* !screenshot-1.png|thumbnail! *Router log:* !screenshot-2.png|thumbnail! > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > *Before repair: * > *Client output:* > !screenshot-1.png! > *Router log:* > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly
[ https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-11011: Attachment: screenshot-2.png > Make YARN Router throw Exception to client clearly > -- > > Key: YARN-11011 > URL: https://issues.apache.org/jira/browse/YARN-11011 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > When router submits job to yarn, the job is rejected by yarn for some reason. > Yarn router just throw exception to log and return not enough info to client. > *Before repair: * > *Client output:* > !screenshot-1.png|thumbnail! > *Router log:* > !screenshot-2.png|thumbnail! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11011) Make YARN Router throw Exception to client clearly
Yuan Luo created YARN-11011: --- Summary: Make YARN Router throw Exception to client clearly Key: YARN-11011 URL: https://issues.apache.org/jira/browse/YARN-11011 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 3.3.1 Reporter: Yuan Luo -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411636#comment-17411636 ] Yuan Luo edited comment on YARN-10934 at 9/8/21, 7:26 AM: -- [~snemeth] Thanks for your reply. I have fixed the title; it is an NPE. I have added some YARN configuration files as attachments. We use DefaultResourceCalculator and the vcore count configured for the queue is 0. I suspect the crash is related to this, but I have not found the problem in the code. was (Author: luoyuan): [~snemeth] Thanks for your reply. I have fixed the title; it is an NPE. I will add some information in the attachment. > LeafQueue activateApplications NPE > -- > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml > > > Our production YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator to DominantResourceCalculator and restarted the RM, and the RM then crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834)
[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-10934: Attachment: (was: RM-capacity-scheduler.xml) > LeafQueue activateApplications NPE > -- > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml > > > Our production YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator to DominantResourceCalculator and restarted the RM, and the RM then crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834)
[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-10934: Attachment: RM-capacity-scheduler.xml > LeafQueue activateApplications NPE > -- > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM > Affects Versions: 3.3.1 > Reporter: Yuan Luo > Priority: Major > Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml > > > Our production YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator to DominantResourceCalculator and restarted the RM, and the RM then crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834)
[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-10934: Attachment: (was: RM-capacity-scheduler.xml) > LeafQueue activateApplications NPE > -- > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: RM-yarn-site.xml > > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional 
commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Luo updated YARN-10934: Attachment: RM-capacity-scheduler.xml RM-yarn-site.xml > LeafQueue activateApplications NPE > -- > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan Luo >Priority: Major > Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml > > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan LUO updated YARN-10934: Summary: LeafQueue activateApplications NPE (was: activateApplications NPE) > LeafQueue activateApplications NPE > -- > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan LUO >Priority: Major > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional 
commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10934) activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411636#comment-17411636 ] Yuan LUO commented on YARN-10934: - [~snemeth] Thanks for your reply. I have fixed the title; it is an NPE. I will add more information in the attachments. > activateApplications NPE > > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan LUO >Priority: Major > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, 
e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10934) activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan LUO updated YARN-10934: Summary: activateApplications NPE (was: activateApplications NPL) > activateApplications NPE > > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan LUO >Priority: Major > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10934) activateApplications NPL
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411200#comment-17411200 ] Yuan LUO commented on YARN-10934: - Hi [~zhuqi] [~gandras] [~bteke] [~taoyang] Could you have a look at this issue, thanks! > activateApplications NPL > > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan LUO >Priority: Major > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. > 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10934) activateApplications NPL
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan LUO updated YARN-10934: Description: Our prod Yarn cluster is hadoop version 3.3.1 , we changed DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then our RM crashed, the Exception stack like below. I think this is a serious bug and hope someone can follow up and fix it. 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) at java.base/java.lang.Thread.run(Thread.java:834) was: Our prod Yarn cluster is hadoop version 3.3.1 , we changed DefaultResourceCalculator -> DominantResourceCalculator, then our RM crashed, the Exception stack like below: 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) at java.base/java.lang.Thread.run(Thread.java:834) > activateApplications NPL > > > Key: YARN-10934 > URL: https://issues.apache.org/jira/browse/YARN-10934 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Affects Versions: 3.3.1 >Reporter: Yuan LUO >Priority: Major > > Our prod Yarn cluster is hadoop version 3.3.1 , we changed > DefaultResourceCalculator -> DominantResourceCalculator and restart RM, then > our RM crashed, the Exception stack like below. I think this is a serious > bug and hope someone can follow up and fix it. 
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher > (MarkerIgnoringBase.java:error(159)) - Error in handling event type > APP_ATTEMPT_REMOVED to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands,
[jira] [Created] (YARN-10934) activateApplications NPL
Yuan LUO created YARN-10934: --- Summary: activateApplications NPL Key: YARN-10934 URL: https://issues.apache.org/jira/browse/YARN-10934 Project: Hadoop YARN Issue Type: Bug Components: RM Affects Versions: 3.3.1 Reporter: Yuan LUO Our prod Yarn cluster is hadoop version 3.3.1 , we changed DefaultResourceCalculator -> DominantResourceCalculator, then our RM crashed, the Exception stack like below: 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79) at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10774) Federation: Normalize the yarn federation queue name
[ https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan LUO updated YARN-10774: Attachment: YARN-10774.003.patch > Federation: Normalize the yarn federation queue name > > > Key: YARN-10774 > URL: https://issues.apache.org/jira/browse/YARN-10774 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 2.10.0, 3.3.0, 2.10.1 >Reporter: Yuan LUO >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10774.001.patch, YARN-10774.002.patch, > YARN-10774.003.patch > > > In YARN, root.abc is equivalent to the abc queue, so the routing > behavior of both should be consistent in YARN Federation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10774) Federation: Normalize the yarn federation queue name
[ https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347464#comment-17347464 ] Yuan LUO edited comment on YARN-10774 at 5/19/21, 9:49 AM: --- Thanks for your reply [~zhuqi] :) The patch [YARN-7621|https://issues.apache.org/jira/browse/YARN-7621] is used in our cluster so that CS supports this, but it has not been merged into the community version for a long time. I hope you can make the community version of CS support this feature as well; that would make it easier for users to switch from FS to CS. YARN Federation may need this feature too. was (Author: luoyuan): Thanks for your reply [~zhuqi] :) The patch [YARN-7621|https://issues.apache.org/jira/browse/YARN-7621] is used in our cluster so that CS supports this, but it has not been merged into the community version for a long time. I hope you can make the community version of CS support this feature as well; that would make it easier for users to switch from FS to CS. YARN Federation may need this feature too. YARN Federation also needs support in the long term. > Federation: Normalize the yarn federation queue name > > > Key: YARN-10774 > URL: https://issues.apache.org/jira/browse/YARN-10774 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Reporter: Yuan LUO >Priority: Major > Attachments: YARN-10774.001.patch > > > In YARN, root.abc is equivalent to the abc queue, so the routing > behavior of both should be consistent in YARN Federation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10774) Federation: Normalize the yarn federation queue name
[ https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347464#comment-17347464 ] Yuan LUO commented on YARN-10774: - Thanks for your reply [~zhuqi] :) The patch [YARN-7621|https://issues.apache.org/jira/browse/YARN-7621] is used in our cluster so that CS supports this, but it has not been merged into the community version for a long time. I hope you can make the community version of CS support this feature as well; that would make it easier for users to switch from FS to CS. YARN Federation may need this feature too. YARN Federation also needs support in the long term. > Federation: Normalize the yarn federation queue name > > > Key: YARN-10774 > URL: https://issues.apache.org/jira/browse/YARN-10774 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Reporter: Yuan LUO >Priority: Major > Attachments: YARN-10774.001.patch > > > In YARN, root.abc is equivalent to the abc queue, so the routing > behavior of both should be consistent in YARN Federation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10774) Federation: Normalize the yarn federation queue name
[ https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan LUO updated YARN-10774: Attachment: YARN-10774.001.patch > Federation: Normalize the yarn federation queue name > > > Key: YARN-10774 > URL: https://issues.apache.org/jira/browse/YARN-10774 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Reporter: Yuan LUO >Priority: Major > Attachments: YARN-10774.001.patch > > > In YARN, root.abc is equivalent to the abc queue, so the routing > behavior of both should be consistent in YARN Federation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10774) Federation: Normalize the yarn federation queue name
Yuan LUO created YARN-10774: --- Summary: Federation: Normalize the yarn federation queue name Key: YARN-10774 URL: https://issues.apache.org/jira/browse/YARN-10774 Project: Hadoop YARN Issue Type: Bug Components: federation, yarn Reporter: Yuan LUO In YARN, root.abc is equivalent to the abc queue, so the routing behavior of both should be consistent in YARN Federation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
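The normalization requested in YARN-10774 can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name QueueNameSketch and the method normalize are hypothetical, and the only assumption taken from the issue is that root.abc and abc name the same queue and should therefore resolve to the same routing key.

```java
// Hypothetical helper, not part of Hadoop or of the YARN-10774 patches.
final class QueueNameSketch {
    private static final String ROOT_PREFIX = "root.";

    private QueueNameSketch() {
    }

    /**
     * Strips one leading "root." so that "root.abc" and "abc" produce
     * the same key. A bare "root" and null are returned unchanged.
     */
    static String normalize(String queueName) {
        if (queueName == null) {
            return null;
        }
        String trimmed = queueName.trim();
        if (trimmed.startsWith(ROOT_PREFIX)
                && trimmed.length() > ROOT_PREFIX.length()) {
            return trimmed.substring(ROOT_PREFIX.length());
        }
        return trimmed;
    }
}
```

With this sketch, a router policy lookup keyed on QueueNameSketch.normalize(queue) would treat root.abc and abc identically, while nested names such as root.a.b would become a.b.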
[jira] [Commented] (YARN-10111) In Federation cluster Distributed Shell Application submission fails as YarnClient#getQueueInfo is not implemented
[ https://issues.apache.org/jira/browse/YARN-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346094#comment-17346094 ] Yuan LUO commented on YARN-10111: - Hi [~zhuqi] Is there any new discussion or progress on this patch? I have a question: in the case of YARN Federation, do the queue names of all subclusters need to be consistent? Looking forward to your reply, thanks! > In Federation cluster Distributed Shell Application submission fails as > YarnClient#getQueueInfo is not implemented > -- > > Key: YARN-10111 > URL: https://issues.apache.org/jira/browse/YARN-10111 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sushanta Sen >Assignee: Qi Zhu >Priority: Blocker > Attachments: YARN-10111.001.patch > > > In Federation cluster Distributed Shell Application submission fails as > YarnClient#getQueueInfo is not implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346090#comment-17346090 ] Yuan LUO edited comment on YARN-10465 at 5/17/21, 11:21 AM: Hi [~ayushsaxena] [~tangzhankun] [~zhuqi] Can you help review this,Thanks! was (Author: luoyuan): Hi [~ayushsaxena] [~tangzhankun] [~zhuqi]] Can you help review this,Thanks! > Support getClusterNodes, getNodeToLabels, getLabelsToNodes, > getClusterNodeLabels API's for Federation > - > > Key: YARN-10465 > URL: https://issues.apache.org/jira/browse/YARN-10465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Attachments: YARN-10465.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346090#comment-17346090 ] Yuan LUO commented on YARN-10465: - Hi [~ayushsaxena] [~tangzhankun] [~zhuqi]] Can you help review this,Thanks! > Support getClusterNodes, getNodeToLabels, getLabelsToNodes, > getClusterNodeLabels API's for Federation > - > > Key: YARN-10465 > URL: https://issues.apache.org/jira/browse/YARN-10465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Attachments: YARN-10465.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10487) Support getQueueUserAcls, listReservations, getApplicationAttempts, getContainerReport, getContainers, getResourceTypeInfo API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346079#comment-17346079 ] Yuan LUO commented on YARN-10487: - Hi [~dmmkr] I don't think these APIs need to query all subclusters; sending the request to the home subcluster should be enough. What do you think? > Support getQueueUserAcls, listReservations, getApplicationAttempts, > getContainerReport, getContainers, getResourceTypeInfo API's for Federation > --- > > Key: YARN-10487 > URL: https://issues.apache.org/jira/browse/YARN-10487 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Attachments: YARN-10487.001.patch > > > Support getQueueUserAcls, listReservations, getApplicationAttempts, > getContainerReport, getContainers, getResourceTypeInfo API's for Federation -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346073#comment-17346073 ] Yuan LUO commented on YARN-8173: Hi [~dmmkr] [~giovanni.fumarola] We want to run Livy on YARN Federation, but the current implementation relies on this interface and is blocked. Is there any new progress on this patch? Looking forward to your reply, thanks! > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: D M Murali Krishna Reddy >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch, YARN-8173.004.patch, YARN-8173.005.patch, > YARN-8173.006.patch, YARN-8173.007.patch, YARN-8173.008.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org