[jira] [Updated] (YARN-11683) RM crash due to RELEASE_CONTAINER NPE

2024-04-10 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11683:

Affects Version/s: 3.4.0

> RM crash due to RELEASE_CONTAINER NPE
> -
>
> Key: YARN-11683
> URL: https://issues.apache.org/jira/browse/YARN-11683
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>
> We enabled scheduleAsynchronously in our production environment, and the RM 
> crashed with the exception stack below:
> {code:java}
> // error stack
> ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type RELEASE_CONTAINER to the Event Dispatcher
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:811)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:770)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2271)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:177)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:90)
>         at java.base/java.lang.Thread.run(Thread.java:834) {code}
> I found that similar issues have been reported, e.g. YARN-11488 and YARN-10204.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11683) RM crash due to RELEASE_CONTAINER NPE

2024-04-10 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11683:

Description: 
We enabled scheduleAsynchronously in our production environment, and the RM 
crashed with the exception stack below:
{code:java}
// error stack
ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type RELEASE_CONTAINER to the Event Dispatcher
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:811)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:770)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2271)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:177)
        at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:90)
        at java.base/java.lang.Thread.run(Thread.java:834) {code}

I found that similar issues have been reported, e.g. YARN-11488 and YARN-10204.

> RM crash due to RELEASE_CONTAINER NPE
> -
>
> Key: YARN-11683
> URL: https://issues.apache.org/jira/browse/YARN-11683
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yuan Luo
>Priority: Major
>






[jira] [Created] (YARN-11683) RM crash due to RELEASE_CONTAINER NPE

2024-04-10 Thread Yuan Luo (Jira)
Yuan Luo created YARN-11683:
---

 Summary: RM crash due to RELEASE_CONTAINER NPE
 Key: YARN-11683
 URL: https://issues.apache.org/jira/browse/YARN-11683
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Yuan Luo









[jira] [Comment Edited] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827323#comment-17827323
 ] 

Yuan Luo edited comment on YARN-11663 at 3/15/24 2:51 AM:
--

[~slfan1989] Thanks for your reply.

Our cache expiry is set to 60 seconds, and memory is only reclaimed when a 
full GC runs. The number of cached objects keeps growing.

!image-2024-03-15-10-50-32-860.png!

In my tests, neither Ehcache nor Guava cache cleans up expired keys 
immediately; cleanup is only triggered when the cache is accessed. After a job 
ends, its cache key is never accessed again.
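
To make the behaviour described above concrete, here is a minimal sketch of an access-triggered expiry cache. It is an illustration only, not the Router's cache and not Ehcache or Guava code: the class `LazyExpiryCache`, its methods, and the injected clock are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical cache that drops an expired entry only when that key is read. */
public class LazyExpiryCache {

  private static final class Entry {
    final String value;
    final long expiresAtMillis;

    Entry(String value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock; // injected so the example needs no sleeping

  public LazyExpiryCache(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  public void put(String key, String value) {
    entries.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
  }

  /** Returns null if absent or expired; an expired entry is removed only here. */
  public String get(String key) {
    Entry e = entries.get(key);
    if (e == null) {
      return null;
    }
    if (clock.getAsLong() >= e.expiresAtMillis) {
      entries.remove(key);
      return null;
    }
    return e.value;
  }

  /** Explicit sweep, the kind of step a periodic cleanup task could run. */
  public int evictExpired() {
    long now = clock.getAsLong();
    int before = entries.size();
    entries.values().removeIf(e -> now >= e.expiresAtMillis);
    return before - entries.size();
  }

  public int size() {
    return entries.size();
  }
}
```

With a 60-second TTL, entries for finished applications stay in the map long after they expire unless something reads them or a sweep such as `evictExpired()` runs, which matches the growth seen in the heap screenshots.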

 

 



> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature in our production environment, I 
> found that the Router's memory keeps growing over time. This is because, once 
> a job has finished, its expired key is never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a maximum-size limit 
> to the cache?






[jira] [Commented] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827323#comment-17827323
 ] 

Yuan Luo commented on YARN-11663:
-

Our cache expiry is set to 60 seconds, and memory is only reclaimed when a 
full GC runs. The number of cached objects keeps growing.

!image-2024-03-15-10-50-32-860.png!

In my tests, neither Ehcache nor Guava cache cleans up expired keys 
immediately; cleanup is only triggered when the cache is accessed. After a job 
ends, its cache key is never accessed again.

 

 

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature in our production environment, I 
> found that the Router's memory keeps growing over time. This is because, once 
> a job has finished, its expired key is never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a maximum-size limit 
> to the cache?






[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11663:

Attachment: image-2024-03-15-10-50-32-860.png

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature in our production environment, I 
> found that the Router's memory keeps growing over time. This is because, once 
> a job has finished, its expired key is never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a maximum-size limit 
> to the cache?






[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11663:

Description: 
!image-2024-03-14-18-12-28-426.png!

!image-2024-03-14-18-12-49-950.png!

Hi [~slfan1989], after applying this feature in our production environment, I 
found that the Router's memory keeps growing over time. This is because, once a 
job has finished, its expired key is never accessed again, so the cleanup 
mechanism is never triggered. Would it be better to add a maximum-size limit to 
the cache?
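
As a rough illustration of what a maximum-entry limit could look like, the sketch below bounds a map with `LinkedHashMap.removeEldestEntry`. The class name `BoundedCache` and the limit value are assumptions for this example; it is not the Router's cache implementation, and it is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded LRU map; evicts the least recently used entry. */
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {

  private static final long serialVersionUID = 1L;

  private final int maxEntries;

  public BoundedCache(int maxEntries) {
    // accessOrder = true keeps entries ordered from least to most recently used
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called by put(); returning true drops the eldest entry
    return size() > maxEntries;
  }
}
```

With a bound like this, memory use is capped by the entry limit even when expired keys are never read again. Cache libraries usually expose an equivalent setting (for example, Guava's `CacheBuilder` has `maximumSize`), so a real change would more likely set that than add a new class.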

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png
>
>






[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11663:

Affects Version/s: 3.4.0
   (was: 3.3.6)

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>







[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11663:

Attachment: image-2024-03-14-18-12-28-426.png

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png
>
>







[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11663:

Attachment: image-2024-03-14-18-12-49-950.png

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png
>
>







[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11663:

Component/s: federation

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.3.6
>Reporter: Yuan Luo
>Priority: Major
>







[jira] [Created] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Yuan Luo (Jira)
Yuan Luo created YARN-11663:
---

 Summary: Router cache expansion issue
 Key: YARN-11663
 URL: https://issues.apache.org/jira/browse/YARN-11663
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.3.6
Reporter: Yuan Luo









[jira] [Commented] (YARN-10885) Make FederationStateStoreFacade#getApplicationHomeSubCluster use JCache

2024-03-14 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827033#comment-17827033
 ] 

Yuan Luo commented on YARN-10885:
-

!image-2024-03-14-18-02-21-278.png!

!image-2024-03-14-18-02-49-146.png!

Hi [~slfan1989] [~chaosju], after applying this feature in our production 
environment, I found that the Router's memory keeps growing over time. This is 
because, once a job has finished, its expired key is never accessed again, so 
the cleanup mechanism is never triggered. Would it be better to add a 
maximum-size limit to the cache?

 

 

> Make FederationStateStoreFacade#getApplicationHomeSubCluster use JCache
> ---
>
> Key: YARN-10885
> URL: https://issues.apache.org/jira/browse/YARN-10885
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The YARN client's getApplicationReport function may produce a large number of 
> ZooKeeper operations, so it is important to use JCache to cache the mapping 
> between application and subcluster ID. 
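
The idea in the description above, caching the application-to-subcluster mapping so repeated lookups do not reach the state store, can be sketched as follows. This is only an illustration using a plain `ConcurrentHashMap` rather than JCache; the class `HomeSubClusterLookup`, its methods, and the `stateStore` function are hypothetical stand-ins, not the FederationStateStoreFacade API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache in front of a slow state-store lookup. */
public class HomeSubClusterLookup {

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> stateStore; // stand-in for the ZooKeeper read
  private final AtomicInteger storeReads = new AtomicInteger();

  public HomeSubClusterLookup(Function<String, String> stateStore) {
    this.stateStore = stateStore;
  }

  /** Reads the store only on a cache miss for this application ID. */
  public String getHomeSubCluster(String appId) {
    return cache.computeIfAbsent(appId, id -> {
      storeReads.incrementAndGet();
      return stateStore.apply(id);
    });
  }

  /** Number of lookups that actually reached the backing store. */
  public int storeReads() {
    return storeReads.get();
  }
}
```

Repeated `getApplicationReport`-style lookups for the same application then cost one store read instead of one per call. Note that this sketch never evicts anything, so it needs an expiry or size bound before it is usable for long-running processes.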






[jira] [Commented] (YARN-11191) Global Scheduler refreshQueue cause deadLock

2022-08-11 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578409#comment-17578409
 ] 

Yuan Luo commented on YARN-11191:
-

 
{code:java}
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ThreadLockTest {

  private static DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

  public static void BeforeFixDeadLock() throws InterruptedException {
    System.out.println(
        "BeforeFixDeadLock test start, this will cause deadlock..");
    ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock queueReadLock = queueLock.readLock();
    ReentrantReadWriteLock.WriteLock queueWriteLock = queueLock.writeLock();

    ReentrantReadWriteLock preemptionLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock preemptionReadLock =
        preemptionLock.readLock();
    ReentrantReadWriteLock.WriteLock preemptionWriteLock =
        preemptionLock.writeLock();

    Thread schedulerThread = new Thread(() -> {
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread start!");
      // hold: csqueue.readLock
      queueReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread get queueReadLock!");
      try {
        Thread.sleep(1000 * 15);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }

      // require: PreemptionManager.readLock
      preemptionReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread get preemptionReadLock!");

      preemptionReadLock.unlock();
      queueReadLock.unlock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread finish!");
    });

    Thread refreshQueueThread = new Thread(() -> {
      System.out.println("current time: " + sdf.format(new Date()) +
          ", refreshQueueThread start!");
      // hold: PreemptionManager.writeLock
      preemptionWriteLock.lock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", refreshQueueThread get preemptionWriteLock!");

      try {
        Thread.sleep(1000 * 10);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }

      // require: csqueue.readLock
      queueReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", refreshQueueThread get queueReadLock!");

      queueReadLock.unlock();
      preemptionWriteLock.unlock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", refreshQueueThread finish!");
    });

    Thread otherThread = new Thread(() -> {
      // Make otherThread request the queue write lock after schedulerThread
      // holds the queue read lock and before refreshQueueThread requests the
      // queue read lock.
      try {
        Thread.sleep(1000 * 5);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      System.out.println(
          "current time: " + sdf.format(new Date()) + ", otherThread start!");
      queueWriteLock.lock();

      System.out.println("current time: " + sdf.format(new Date()) +
          ", otherThread get queueWriteLock!");
      queueWriteLock.unlock();
      System.out.println(
          "current time: " + sdf.format(new Date()) + ", otherThread finish!");
    });

    schedulerThread.start();
    refreshQueueThread.start();
    otherThread.start();

    refreshQueueThread.join();
    schedulerThread.join();
    otherThread.join();
  }

  public static void AfterFixDeadLock() throws InterruptedException {
    System.out.println("AfterFixDeadLock test start..");
    ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock queueReadLock = queueLock.readLock();
    ReentrantReadWriteLock.WriteLock queueWriteLock = queueLock.writeLock();

    ReentrantReadWriteLock preemptionLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock.ReadLock preemptionReadLock =
        preemptionLock.readLock();
    ReentrantReadWriteLock.WriteLock preemptionWriteLock =
        preemptionLock.writeLock();

    Thread schedulerThread = new Thread(() -> {
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread start!");
      // hold: csqueue.readLock
      queueReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread get queueReadLock!");
      try {
        Thread.sleep(1000 * 15);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }

      // require: PreemptionManager.readLock
      preemptionReadLock.lock();
      System.out.println("current time: " + sdf.format(new Date()) +
          ", schedulerThread get preemptionReadLock!");

      preemptionReadLock.unlock();
      queueReadLock.unlock();
      System.out.println("current time: " + 

[jira] [Commented] (YARN-11191) Global Scheduler refreshQueue cause deadLock

2022-08-10 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577784#comment-17577784
 ] 

Yuan Luo commented on YARN-11191:
-

I have two questions about this part of the patch:
+  @Override
+  public List getChildQueuesByTryLock() {
+    try {
+      while (!readLock.tryLock()){
+        LockSupport.parkNanos(1);
+      }
+      return new ArrayList(childQueues);
+    } finally {
+      readLock.unlock();
+    }
+  }

1. Although tryLock and park put the refresh queue thread into a waiting state, 
that thread still holds the PreemptionManager lock, so the scheduler thread 
still cannot allocate new containers. Is that right?

2. Is this issue related to the global scheduler, or only to the preemption 
function?
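
For reference while discussing question 1, the sketch below probes how a non-fair `ReentrantReadWriteLock` treats a new reader while a writer is queued. The class `ReadLockBargingDemo` and its `probe()` method are hypothetical names for this experiment, and the blocking of the timed attempt reflects OpenJDK's non-fair heuristic rather than a guarantee of the lock's specification. The untimed `tryLock()` is documented to acquire the read lock whenever no writer holds it, regardless of waiting threads.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical probe: a reader arriving while a writer is queued. */
public class ReadLockBargingDemo {

  public static final class Result {
    public final boolean timedAcquire;   // readLock().tryLock(timeout, unit)
    public final boolean bargingAcquire; // readLock().tryLock()

    Result(boolean timedAcquire, boolean bargingAcquire) {
      this.timedAcquire = timedAcquire;
      this.bargingAcquire = bargingAcquire;
    }
  }

  public static Result probe() throws InterruptedException {
    ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock(); // non-fair
    // The calling thread plays the scheduler: it holds the read lock throughout.
    queueLock.readLock().lock();

    // Plays the "other thread" that requests the write lock and has to queue.
    Thread writer = new Thread(() -> {
      queueLock.writeLock().lock();
      queueLock.writeLock().unlock();
    });
    writer.start();
    while (writer.getState() != Thread.State.WAITING) {
      Thread.sleep(10); // wait until the writer is parked in the lock's queue
    }

    boolean[] acquired = new boolean[2];
    // Plays the refresh thread: a second reader arriving after the writer.
    Thread reader = new Thread(() -> {
      try {
        // Queue-respecting attempt: expected to time out behind the writer.
        acquired[0] = queueLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired[0]) {
          queueLock.readLock().unlock();
        }
        // Barging attempt: ignores waiting threads.
        acquired[1] = queueLock.readLock().tryLock();
        if (acquired[1]) {
          queueLock.readLock().unlock();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();
    reader.join();

    queueLock.readLock().unlock(); // let the queued writer proceed
    writer.join();
    return new Result(acquired[0], acquired[1]);
  }
}
```

If the patch's `tryLock()` loop behaves like the barging attempt here, the refresh thread could obtain the csqueue read lock despite the queued writer, finish, and release the PreemptionManager write lock. Whether that holds for the real code paths is what question 1 asks.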

Looking forward to your reply, thanks!

> Global Scheduler refreshQueue cause deadLock 
> -
>
> Key: YARN-11191
> URL: https://issues.apache.org/jira/browse/YARN-11191
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
>Reporter: ben yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.jstack, YARN-11191.001.patch
>
>
> This is a potential bug that may impact every cluster with preemption 
> enabled. In our current version with preemption enabled, the 
> CapacityScheduler calls the refreshQueue method of the PreemptionManager when 
> it refreshes queues. This process holds the PreemptionManager write lock and 
> requires the csqueue read lock. Meanwhile, ParentQueue.canAssignToThisQueue 
> holds the csqueue read lock and requires the PreemptionManager read lock.
> A deadlock is possible here because of one rule of the read lock under the 
> non-fair policy: when the lock is already held by a reader and the first 
> request in the lock's wait queue is a write lock request, other read lock 
> requests cannot acquire the lock.
> So the potential deadlock is:
> {code:java}
> CapacityScheduler.refreshQueue: hold: PreemptionManager.writeLock
> require: csqueue.readLock
> CapacityScheduler.schedule: hold: csqueue.readLock
> require: PreemptionManager.readLock
> other thread(completeContainer,release Resource,etc.): require: csqueue.writeLock 
> {code}
> The jstack logs at the time were as follows






[jira] [Commented] (YARN-11191) Global Scheduler refreshQueue cause deadLock

2022-08-09 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577178#comment-17577178
 ] 

Yuan Luo commented on YARN-11191:
-

We hit the same issue when refreshing the YARN queues; the refresh thread gets 
stuck in the call below:

 
{code:java}
preemptionManager.refreshQueues(null, this.getRootQueue()); {code}
Could you please take a look at this issue? [~elgoiri] [~aajisaka] 

 

 

 

 

> Global Scheduler refreshQueue cause deadLock 
> -
>
> Key: YARN-11191
> URL: https://issues.apache.org/jira/browse/YARN-11191
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
>Reporter: ben yang
>Priority: Major
> Attachments: 1.jstack, YARN-11191.001.patch
>
>
> This is a potential bug that may impact every cluster with preemption 
> enabled. In our current version with preemption enabled, the 
> CapacityScheduler calls the refreshQueue method of the PreemptionManager when 
> it refreshes queues. This process holds the PreemptionManager write lock and 
> requires the csqueue read lock. Meanwhile, ParentQueue.canAssignToThisQueue 
> holds the csqueue read lock and requires the PreemptionManager read lock.
> A deadlock is possible here because of one rule of the read lock under the 
> non-fair policy: when the lock is already held by a reader and the first 
> request in the lock's wait queue is a write lock request, other read lock 
> requests cannot acquire the lock.
> So the potential deadlock is:
> {code:java}
> CapacityScheduler.refreshQueue: hold: PreemptionManager.writeLock
> require: csqueue.readLock
> CapacityScheduler.schedule: hold: csqueue.readLock
> require: PreemptionManager.readLock
> other thread(completeContainer,release Resource,etc.): require: csqueue.writeLock 
> {code}
> The jstack logs at the time were as follows






[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler

2022-05-19 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539406#comment-17539406
 ] 

Yuan Luo commented on YARN-11115:
-

Any update on this ticket? [~zuston] 

> Add configuration to disable AM preemption for capacity scheduler
> -
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yuan Luo
>Assignee: Junfan Zhang
>Priority: Major
>
> I think it is necessary to add a configuration option to disable AM 
> preemption for the capacity scheduler, similar to the fair scheduler feature 
> in YARN-9537.






[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler

2022-04-26 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528508#comment-17528508
 ] 

Yuan Luo commented on YARN-11115:
-

[~zuston] It would be great if you could help with that.

> Add configuration to disable AM preemption for capacity scheduler
> -
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yuan Luo
>Priority: Major
>
> I think it is necessary to add a configuration option to disable AM 
> preemption for the capacity scheduler, similar to the fair scheduler feature 
> in YARN-9537.






[jira] [Created] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler

2022-04-24 Thread Yuan Luo (Jira)
Yuan Luo created YARN-11115:
---

 Summary: Add configuration to disable AM preemption for capacity 
scheduler
 Key: YARN-11115
 URL: https://issues.apache.org/jira/browse/YARN-11115
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Yuan Luo


I think it is necessary to add a configuration option to disable AM preemption 
for the capacity scheduler, similar to the fair scheduler feature in YARN-9537.






[jira] [Commented] (YARN-10934) LeafQueue activateApplications NPE

2022-02-16 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493631#comment-17493631
 ] 

Yuan Luo commented on YARN-10934:
-

After applying this patch to our cluster, the problem was fixed. Thank you very 
much! [~bteke] 

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and the RM then crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> {code:java}
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> {code}






[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11012:

Description: 
The Router web page should show the Hadoop version and process startup time, 
like other YARN components, so we can add a 'ServerInfo' tab to show this 
information.

 

!image-2021-11-22-18-53-32-007.png|width=726,height=159!

  was:
The Router web page should show the Hadoop version and process startup time, 
like other YARN components, so we can add a 'ServerInfo' tab to show this 
information.

 

!image-2021-11-22-18-53-32-007.png!


> Add version and process startup time on Router web page
> ---
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2021-11-22-18-53-32-007.png
>
>
> The Router web page should show the Hadoop version and startup time like other YARN
> components, so we can add a 'ServerInfo' tab to show this information.
>  
> !image-2021-11-22-18-53-32-007.png|width=726,height=159!






[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11012:

Description: 
The Router web page should show the Hadoop version and startup time like other YARN
components, so we can add a 'ServerInfo' tab to show this information.

 

!image-2021-11-22-18-53-32-007.png!

  was:
The Router web page should show the Hadoop version and startup time like other YARN
components, so we can add a tab to show this information.



> Add version and process startup time on Router web page
> ---
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2021-11-22-18-53-32-007.png
>
>
> The Router web page should show the Hadoop version and startup time like other YARN
> components, so we can add a 'ServerInfo' tab to show this information.
>  
> !image-2021-11-22-18-53-32-007.png!






[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11012:

Attachment: image-2021-11-22-18-53-32-007.png

> Add version and process startup time on Router web page
> ---
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2021-11-22-18-53-32-007.png
>
>
> The Router web page should show the Hadoop version and startup time like other YARN
> components, so we can add a tab to show this information.






[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11012:

Description: 
The Router web page should show the Hadoop version and startup time like other YARN
components, so we can add a tab to show this info.


> Add version and process startup time on Router web page
> ---
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Fix For: 3.4.0
>
>
> The Router web page should show the Hadoop version and startup time like other YARN
> components, so we can add a tab to show this info.






[jira] [Updated] (YARN-11012) Add version and process startup time on Router web page

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11012:

Description: 
The Router web page should show the Hadoop version and startup time like other YARN
components, so we can add a tab to show this information.


  was:
The Router web page should show the Hadoop version and startup time like other YARN
components, so we can add a tab to show this info.



> Add version and process startup time on Router web page
> ---
>
> Key: YARN-11012
> URL: https://issues.apache.org/jira/browse/YARN-11012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Fix For: 3.4.0
>
>
> The Router web page should show the Hadoop version and startup time like other YARN
> components, so we can add a tab to show this information.






[jira] [Created] (YARN-11012) Add version and process startup time on Router web page

2021-11-22 Thread Yuan Luo (Jira)
Yuan Luo created YARN-11012:
---

 Summary: Add version and process startup time on Router web page
 Key: YARN-11012
 URL: https://issues.apache.org/jira/browse/YARN-11012
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.3.1
Reporter: Yuan Luo
 Fix For: 3.4.0









[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Fix Version/s: 3.4.0

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: router-error.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error to the client:
> Exception in thread "main" 
> org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException:
> No active SubCluster available to submit the request.
> To make it easy for users to locate the cause of a job submission failure, the
> Router should pass the true cause of the error on to the client.
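
The request in the description, that the Router pass the real reason for a rejected submission on to the client rather than only the generic "No active SubCluster available" message, might be sketched roughly as below. This is a minimal, self-contained illustration and not the actual YARN-11011 change: `RouterErrorSketch`, `SubmissionException`, `submitToSubCluster`, and `submitViaRouter` are hypothetical names standing in for the Router's real classes, and the rejection message is invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class RouterErrorSketch {

    /** Hypothetical stand-in for the exception the Router returns to the client. */
    static class SubmissionException extends Exception {
        SubmissionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical sub-cluster call; in this sketch every sub-cluster rejects the job. */
    static void submitToSubCluster(String subCluster) throws Exception {
        throw new IllegalStateException("job rejected by " + subCluster + ": queue is full");
    }

    /** Tries each sub-cluster in turn and keeps the last failure instead of discarding it. */
    static void submitViaRouter(List<String> subClusters) throws SubmissionException {
        Exception lastFailure = null;
        for (String subCluster : subClusters) {
            try {
                submitToSubCluster(subCluster);
                return; // submission accepted, nothing to report
            } catch (Exception e) {
                lastFailure = e; // remember why this sub-cluster refused the job
            }
        }
        String reason = (lastFailure == null)
                ? "no sub-cluster configured"
                : lastFailure.getMessage();
        // Keep the generic message, but append the reason and chain the original exception.
        throw new SubmissionException(
                "No active SubCluster available to submit the request. Cause: " + reason,
                lastFailure);
    }

    public static void main(String[] args) {
        try {
            submitViaRouter(Arrays.asList("SC-1", "SC-2"));
        } catch (SubmissionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the original exception is passed along as the cause and its message is included in what the client sees, so the failure no longer has to be looked up in the Router log.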






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error to the client:
Exception in thread "main" 
org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException:
No active SubCluster available to submit the request.

To make it easy for users to locate the cause of a job submission failure, the
Router should pass the true cause of the error on to the client.

  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error to the client:
Exception in thread "main" 
org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException:
No active SubCluster available to submit the request.



> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: router-error.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error to the client:
> Exception in thread "main" 
> org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException:
> No active SubCluster available to submit the request.
> To make it easy for users to locate the cause of a job submission failure, the
> Router should pass the true cause of the error on to the client.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error to the client:
Exception in thread "main" 
org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException:
No active SubCluster available to submit the request.


  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error to the client:

No active SubCluster available to submit the request.



> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: router-error.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error to the client:
> Exception in thread "main" 
> org.apache.hadoop.yarn.server.federation.policies.exceptions.FederationPolicyException:
> No active SubCluster available to submit the request.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error to the client:

No active SubCluster available to submit the request.


  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error:




> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: router-error.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error to the client:
> No active SubCluster available to submit the request.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: router-error.png

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: router-error.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error:






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error:




  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.





> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: router-error.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error:






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error:



  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

The Router always throws the following non-obvious error:





> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: router-error.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> The Router always throws the following non-obvious error:






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: router-error.png

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: (was: router-error.png)

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-22 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.




  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.



> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.


  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

*Before repair: *

*Client output:*
 !screenshot-1.png! 

*Router log:*
 !screenshot-2.png! 


> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: (was: screenshot-1.png)

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> *Before repair: *
> *Client output:*
>  !screenshot-1.png! 
> *Router log:*
>  !screenshot-2.png! 






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: (was: screenshot-2.png)

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> *Before repair: *
> *Client output:*
>  !screenshot-1.png! 
> *Router log:*
>  !screenshot-2.png! 






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: screenshot-1.png

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> *Before repair: *
> *Client output:*
>  !screenshot-1.png|thumbnail! 
> *Router log:*
>  !screenshot-2.png|thumbnail! 






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

*Before repair: *

*Client output:*
 !screenshot-1.png|thumbnail! 

*Router log:*
 !screenshot-2.png|thumbnail! 

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> *Before repair: *
> *Client output:*
>  !screenshot-1.png|thumbnail! 
> *Router log:*
>  !screenshot-2.png|thumbnail! 






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Description: 
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

*Before repair: *

*Client output:*
 !screenshot-1.png! 

*Router log:*
 !screenshot-2.png! 

  was:
When the Router submits a job to YARN and YARN rejects it for some reason, the
Router only logs the exception and does not return enough information to the client.

*Before repair: *

*Client output:*
 !screenshot-1.png|thumbnail! 

*Router log:*
 !screenshot-2.png|thumbnail! 


> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> *Before repair: *
> *Client output:*
>  !screenshot-1.png! 
> *Router log:*
>  !screenshot-2.png! 






[jira] [Updated] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-11011:

Attachment: screenshot-2.png

> Make YARN Router throw Exception to client clearly
> --
>
> Key: YARN-11011
> URL: https://issues.apache.org/jira/browse/YARN-11011
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When the Router submits a job to YARN and YARN rejects it for some reason, the
> Router only logs the exception and does not return enough information to the client.
> *Before repair: *
> *Client output:*
>  !screenshot-1.png|thumbnail! 
> *Router log:*
>  !screenshot-2.png|thumbnail! 






[jira] [Created] (YARN-11011) Make YARN Router throw Exception to client clearly

2021-11-21 Thread Yuan Luo (Jira)
Yuan Luo created YARN-11011:
---

 Summary: Make YARN Router throw Exception to client clearly
 Key: YARN-11011
 URL: https://issues.apache.org/jira/browse/YARN-11011
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.3.1
Reporter: Yuan Luo









[jira] [Comment Edited] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411636#comment-17411636
 ] 

Yuan Luo edited comment on YARN-10934 at 9/8/21, 7:26 AM:
--

[~snemeth] Thanks for your reply. I have fixed the title; it is an NPE. I have
added some of our YARN configuration as attachments. We use
DefaultResourceCalculator, and the number of vcores configured for the queue is 0.
I suspect the crash is related to this, but I have not found the problem in the code.


was (Author: luoyuan):
[~snemeth] Thanks for your reply. I have fixed the title; it is an NPE. I will
add some information as an attachment.

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource calculator
> from DefaultResourceCalculator to DominantResourceCalculator and restarted the RM,
> after which the RM crashed with the exception stack below. I think this is a
> serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: (was: RM-capacity-scheduler.xml)

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: RM-capacity-scheduler.xml

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: RM-capacity-scheduler.xml

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: (was: RM-capacity-scheduler.xml)

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-yarn-site.xml
>
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: RM-capacity-scheduler.xml
RM-yarn-site.xml

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-07 Thread Yuan LUO (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan LUO updated YARN-10934:

Summary: LeafQueue activateApplications NPE  (was: activateApplications NPE)

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan LUO
>Priority: Major
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Commented] (YARN-10934) activateApplications NPE

2021-09-07 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411636#comment-17411636
 ] 

Yuan LUO commented on YARN-10934:
-

[~snemeth] Thanks for your reply. I have fixed the title; it is an NPE. I will 
add more information in the attachments.

> activateApplications NPE
> 
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan LUO
>Priority: Major
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) activateApplications NPE

2021-09-07 Thread Yuan LUO (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan LUO updated YARN-10934:

Summary: activateApplications NPE  (was: activateApplications NPL)

> activateApplications NPE
> 
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan LUO
>Priority: Major
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Commented] (YARN-10934) activateApplications NPL

2021-09-07 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411200#comment-17411200
 ] 

Yuan LUO commented on YARN-10934:
-

Hi [~zhuqi] [~gandras] [~bteke] [~taoyang], could you have a look at this 
issue? Thanks!

> activateApplications NPL
> 
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan LUO
>Priority: Major
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10934) activateApplications NPL

2021-09-07 Thread Yuan LUO (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan LUO updated YARN-10934:

Description: 
Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
calculator from DefaultResourceCalculator to DominantResourceCalculator and 
restarted the RM, and then the RM crashed with the exception stack below. I 
think this is a serious bug and hope someone can follow up and fix it.

2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
    at java.base/java.lang.Thread.run(Thread.java:834)

  was:
Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
calculator from DefaultResourceCalculator to DominantResourceCalculator, and 
then the RM crashed with the exception stack below:

2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
    at java.base/java.lang.Thread.run(Thread.java:834)


> activateApplications NPL
> 
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan LUO
>Priority: Major
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
> calculator from DefaultResourceCalculator to DominantResourceCalculator and 
> restarted the RM, and then the RM crashed with the exception stack below. I 
> think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)




[jira] [Created] (YARN-10934) activateApplications NPL

2021-09-07 Thread Yuan LUO (Jira)
Yuan LUO created YARN-10934:
---

 Summary: activateApplications NPL
 Key: YARN-10934
 URL: https://issues.apache.org/jira/browse/YARN-10934
 Project: Hadoop YARN
  Issue Type: Bug
  Components: RM
Affects Versions: 3.3.1
Reporter: Yuan LUO


Our production YARN cluster runs Hadoop 3.3.1. We changed the resource 
calculator from DefaultResourceCalculator to DominantResourceCalculator, and 
then the RM crashed with the exception stack below:

2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
    at java.base/java.lang.Thread.run(Thread.java:834)






[jira] [Updated] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-19 Thread Yuan LUO (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan LUO updated YARN-10774:

Attachment: YARN-10774.003.patch

> Federation: Normalize the yarn federation queue name
> 
>
> Key: YARN-10774
> URL: https://issues.apache.org/jira/browse/YARN-10774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 2.10.0, 3.3.0, 2.10.1
>Reporter: Yuan LUO
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10774.001.patch, YARN-10774.002.patch, 
> YARN-10774.003.patch
>
>
> In YARN, root.abc is equivalent to the abc queue, so the routing 
> behavior of both names should be consistent in YARN Federation.






[jira] [Comment Edited] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-19 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347464#comment-17347464
 ] 

Yuan LUO edited comment on YARN-10774 at 5/19/21, 9:49 AM:
---

Thanks for your reply [~zhuqi] :)

The patch [YARN-7621|https://issues.apache.org/jira/browse/YARN-7621] is used 
in our cluster to make CS support this, but it has not been merged into the 
community version for a long time.

I hope you can make the community version of CS support this feature as 
well; it will make it easier for users to switch from FS to CS. 
YARN Federation may need this feature too.


was (Author: luoyuan):
Thanks for your reply [~zhuqi] :)

The patch [YARN-7621|https://issues.apache.org/jira/browse/YARN-7621] is used 
in our cluster to make CS support this, but it has not been merged into the 
community version for a long time.

I hope you can make the community version of CS support this feature as 
well; it will make it easier for users to switch from FS to CS. 
YARN Federation may need this feature too.

Yarn Federation also needs support in the long term

> Federation: Normalize the yarn federation queue name
> 
>
> Key: YARN-10774
> URL: https://issues.apache.org/jira/browse/YARN-10774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Reporter: Yuan LUO
>Priority: Major
> Attachments: YARN-10774.001.patch
>
>
> In YARN, root.abc is equivalent to the abc queue, so the routing 
> behavior of both names should be consistent in YARN Federation.






[jira] [Commented] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-19 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347464#comment-17347464
 ] 

Yuan LUO commented on YARN-10774:
-

Thanks for your reply [~zhuqi] :)

The patch [YARN-7621|https://issues.apache.org/jira/browse/YARN-7621] is used 
in our cluster to make CS support this, but it has not been merged into the 
community version for a long time.

I hope you can make the community version of CS support this feature as 
well; it will make it easier for users to switch from FS to CS. 
YARN Federation may need this feature too.

Yarn Federation also needs support in the long term

> Federation: Normalize the yarn federation queue name
> 
>
> Key: YARN-10774
> URL: https://issues.apache.org/jira/browse/YARN-10774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Reporter: Yuan LUO
>Priority: Major
> Attachments: YARN-10774.001.patch
>
>
> In YARN, root.abc is equivalent to the abc queue, so the routing 
> behavior of both names should be consistent in YARN Federation.






[jira] [Updated] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-18 Thread Yuan LUO (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan LUO updated YARN-10774:

Attachment: YARN-10774.001.patch

> Federation: Normalize the yarn federation queue name
> 
>
> Key: YARN-10774
> URL: https://issues.apache.org/jira/browse/YARN-10774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Reporter: Yuan LUO
>Priority: Major
> Attachments: YARN-10774.001.patch
>
>
> In YARN, root.abc is equivalent to the abc queue, so the routing 
> behavior of both names should be consistent in YARN Federation.






[jira] [Created] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-18 Thread Yuan LUO (Jira)
Yuan LUO created YARN-10774:
---

 Summary: Federation: Normalize the yarn federation queue name
 Key: YARN-10774
 URL: https://issues.apache.org/jira/browse/YARN-10774
 Project: Hadoop YARN
  Issue Type: Bug
  Components: federation, yarn
Reporter: Yuan LUO


In YARN, root.abc is equivalent to the abc queue, so the routing behavior 
of both names should be consistent in YARN Federation.
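
The equivalence described above can be illustrated with a small sketch. It is 
not taken from the YARN-10774 patches: the class QueueNameNormalizer and its 
normalize method are hypothetical names, and the choice of the full path 
(root.abc) as the canonical form is an assumption made for the example. The 
idea is only that both spellings map to one key before a routing lookup.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of Hadoop): maps a queue name to one
 * canonical form so that "abc" and "root.abc" resolve to the same key.
 */
final class QueueNameNormalizer {
    private static final String ROOT = "root";
    private static final String ROOT_PREFIX = ROOT + ".";

    private QueueNameNormalizer() {
        // Static utility; no instances.
    }

    /** Returns the full-path form of the queue name, e.g. "abc" -> "root.abc". */
    static String normalize(String queue) {
        Objects.requireNonNull(queue, "queue");
        String trimmed = queue.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("queue name must not be empty");
        }
        // Already the root queue or a full path: keep as-is.
        if (trimmed.equals(ROOT) || trimmed.startsWith(ROOT_PREFIX)) {
            return trimmed;
        }
        // Short name: assume it is a direct child of root (assumption for this sketch).
        return ROOT_PREFIX + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(normalize("abc"));      // prints "root.abc"
        System.out.println(normalize("root.abc")); // prints "root.abc"
    }
}
```

With a helper like this, a routing policy could normalize the submitted queue 
name once and use the result as the lookup key, so both spellings are routed 
the same way.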






[jira] [Commented] (YARN-10111) In Federation cluster Distributed Shell Application submission fails as YarnClient#getQueueInfo is not implemented

2021-05-17 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346094#comment-17346094
 ] 

Yuan LUO commented on YARN-10111:
-

Hi [~zhuqi], is there any new progress on this patch? I have a question: 
in the case of YARN Federation, do the queue names of all subclusters need to 
be consistent? Looking forward to your reply, thanks!

> In Federation cluster Distributed Shell Application submission fails as 
> YarnClient#getQueueInfo is not implemented
> --
>
> Key: YARN-10111
> URL: https://issues.apache.org/jira/browse/YARN-10111
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sushanta Sen
>Assignee: Qi Zhu
>Priority: Blocker
> Attachments: YARN-10111.001.patch
>
>
> In Federation cluster Distributed Shell Application submission fails as 
> YarnClient#getQueueInfo is not implemented.






[jira] [Comment Edited] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation

2021-05-17 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346090#comment-17346090
 ] 

Yuan LUO edited comment on YARN-10465 at 5/17/21, 11:21 AM:


Hi [~ayushsaxena] [~tangzhankun] [~zhuqi], can you help review this? Thanks!


was (Author: luoyuan):
Hi [~ayushsaxena] [~tangzhankun]  [~zhuqi]]  Can you help review this,Thanks!

> Support getClusterNodes, getNodeToLabels, getLabelsToNodes, 
> getClusterNodeLabels API's for Federation
> -
>
> Key: YARN-10465
> URL: https://issues.apache.org/jira/browse/YARN-10465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10465.001.patch
>
>







[jira] [Commented] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation

2021-05-17 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346090#comment-17346090
 ] 

Yuan LUO commented on YARN-10465:
-

Hi [~ayushsaxena] [~tangzhankun]  [~zhuqi]]  Can you help review this,Thanks!

> Support getClusterNodes, getNodeToLabels, getLabelsToNodes, 
> getClusterNodeLabels API's for Federation
> -
>
> Key: YARN-10465
> URL: https://issues.apache.org/jira/browse/YARN-10465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10465.001.patch
>
>







[jira] [Commented] (YARN-10487) Support getQueueUserAcls, listReservations, getApplicationAttempts, getContainerReport, getContainers, getResourceTypeInfo API's for Federation

2021-05-17 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346079#comment-17346079
 ] 

Yuan LUO commented on YARN-10487:
-

Hi [~dmmkr], I don't think these APIs need to query all subclusters; 
you can just send the request to the home subcluster. What do you think?
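
To illustrate the suggestion, here is a minimal sketch of routing an application-scoped call to the home subcluster only, instead of fanning out to every subcluster. All names (HomeSubClusterRouter, SubClusterClient, and the in-memory maps standing in for the federation state store) are hypothetical placeholders, not the Router's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

final class HomeSubClusterRouter {

    /** Hypothetical stand-in for a client proxy to one subcluster's RM. */
    interface SubClusterClient {
        String getContainerReport(String applicationId, String containerId);
    }

    // applicationId -> home subcluster id (assumed to come from the state store)
    private final Map<String, String> homeSubClusters = new HashMap<>();
    // subcluster id -> client proxy for that subcluster
    private final Map<String, SubClusterClient> clients = new HashMap<>();

    void registerSubCluster(String subClusterId, SubClusterClient client) {
        clients.put(subClusterId, client);
    }

    void recordHome(String applicationId, String subClusterId) {
        homeSubClusters.put(applicationId, subClusterId);
    }

    // Sends the request to exactly one subcluster: the application's home.
    String getContainerReport(String applicationId, String containerId) {
        String home = homeSubClusters.get(applicationId);
        if (home == null) {
            throw new IllegalStateException(
                "No home subcluster recorded for " + applicationId);
        }
        SubClusterClient client = clients.get(home);
        if (client == null) {
            throw new IllegalStateException("Unknown subcluster " + home);
        }
        return client.getContainerReport(applicationId, containerId);
    }
}
```

This only applies to calls that carry an application (or container) identifier, since that is what makes the home subcluster lookup possible; calls with no such identifier would still need some other routing rule.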

> Support getQueueUserAcls, listReservations, getApplicationAttempts, 
> getContainerReport, getContainers, getResourceTypeInfo API's for Federation
> ---
>
> Key: YARN-10487
> URL: https://issues.apache.org/jira/browse/YARN-10487
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10487.001.patch
>
>
> Support getQueueUserAcls, listReservations, getApplicationAttempts, 
> getContainerReport, getContainers, getResourceTypeInfo API's for Federation






[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2021-05-17 Thread Yuan LUO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346073#comment-17346073
 ] 

Yuan LUO commented on YARN-8173:


Hi [~dmmkr] [~giovanni.fumarola], 
we want to run Livy on YARN Federation, but its current implementation relies 
on this interface, so we are blocked. Is there any new progress on this 
patch? Looking forward to your reply, thanks!

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch, YARN-8173.004.patch, YARN-8173.005.patch, 
> YARN-8173.006.patch, YARN-8173.007.patch, YARN-8173.008.patch
>
>
> Implement the methods that Oozie depends on:
> {code:java}
> getApplications()
> getDelegationToken()
> {code}
>  


